Projects | Uthayasanker Thayasivam

Multilingual Speech Emotion Recognition


Description

Speech is a powerful medium that carries not only linguistic content but also paralinguistic cues like emotion and speaker identity. While Speech Emotion Recognition (SER) systems have seen significant progress in high-resource, monolingual settings, their applicability in multilingual contexts—especially for low-resource languages—remains limited. Dravidian languages such as Tamil, Telugu, Kannada, and Malayalam are widely spoken but severely underrepresented in SER research. This lack of representation restricts the development of inclusive emotion-aware systems, particularly in regions where these languages are dominant.

Motivation and Research Objectives
The primary motivation behind this project is to bridge the resource and performance gap in SER for underrepresented languages, with a focus on Tamil. This is addressed through three key research objectives:

  1. Dataset Creation – To build a high-quality Tamil emotional speech dataset that includes core emotion classes like Happy, Sad, Angry, Fear, and Neutral.

  2. Multilingual Benchmarking – To conduct a large-scale survey of global SER datasets across the top 70 most spoken languages, identifying usable emotional speech corpora in 29 languages.

  3. Model Development – To design and train a multilingual SER model that supports Tamil and other Dravidian languages, while maintaining efficiency and cross-lingual generalization suitable for real-world, resource-constrained applications.

Result and Impact

Performance Summary
The experimental results indicate that KuralNet, the multilingual SER model developed in this project, generalizes well across languages and performs particularly well in low-resource Dravidian languages. Among the 13 supported languages, KuralNet achieved the highest macro F1-scores and weighted accuracy in Tamil, Kannada, and Malayalam, outperforming established baselines such as Emotion2Vec-Large, XGBoost, and Random Forest. The largest gain was in Kannada, where KuralNet's macro F1-score exceeded that of Emotion2Vec-Large by 0.55, suggesting that it can learn robust emotional patterns even when annotated data is limited.

In addition, while models like Emotion2Vec-Large performed well in some Indo-European languages such as Italian and Spanish, KuralNet maintained stable and competitive performance across all languages. This balance of accuracy and computational efficiency, especially when using the Whisper-small backbone, demonstrates the practicality of KuralNet for real-world multilingual applications.
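To make the two metrics reported above concrete, here is a minimal Python sketch that computes a macro F1-score and an overall accuracy on toy labels. The function names and the toy labels are illustrative assumptions, not the project's evaluation code. "Weighted accuracy" is assumed here to mean overall accuracy, where each class counts in proportion to its number of samples; this is a common convention in SER work, but the source does not define the term.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy (assumed meaning of 'weighted accuracy', see lead-in)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy, imbalanced example (hypothetical labels, not EmoTa data).
toy_true = ["happy", "happy", "happy", "happy", "sad", "angry"]
toy_pred = ["happy", "happy", "happy", "happy", "happy", "angry"]

toy_macro_f1 = macro_f1(toy_true, toy_pred)
toy_accuracy = weighted_accuracy(toy_true, toy_pred)
print(f"macro F1 = {toy_macro_f1:.3f}, accuracy = {toy_accuracy:.3f}")
```

In this toy example the single "sad" utterance is never predicted, so its F1 is 0 and the macro F1 drops well below the overall accuracy. This is why macro F1 is a useful companion metric when emotion classes are imbalanced.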

Dataset and Benchmark Contributions
This project introduced two major contributions to the research community:

  • EmoTa Dataset: The first structured and culturally grounded Tamil emotional speech corpus. With high inter-annotator agreement (Fleiss’ Kappa = 0.74), it provides a crucial resource for future emotion recognition research focused on Tamil speakers in Sri Lanka and beyond.

  • KuralHub Benchmark: A comprehensive survey and categorization of emotional speech datasets across 29 languages, enabling cross-lingual benchmarking and comparative studies in SER research. The summary table and dataset categorization improve accessibility for researchers working on multilingual or low-resource emotion recognition.
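The inter-annotator agreement figure quoted for EmoTa is a Fleiss' Kappa value. As a rough illustration of how such a statistic is computed, the sketch below implements the standard Fleiss' Kappa formula on a small table of rating counts. The function name and the toy counts are assumptions made for illustration; they are not the EmoTa annotations or the authors' annotation tooling.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j. Every item must be
    rated by the same number of annotators (at least two)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # Every rating fell in one category; kappa is undefined in that case.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy table: 4 utterances, 3 annotators, 3 emotion categories (hypothetical).
toy_counts = [
    [3, 0, 0],  # all three annotators agree
    [0, 3, 0],  # all three annotators agree
    [2, 1, 0],  # partial agreement
    [1, 1, 1],  # no agreement
]
toy_kappa = fleiss_kappa(toy_counts)
print(f"Fleiss' kappa on toy data = {toy_kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so a value near 1 indicates that annotators agree far more often than chance would predict, while a value near 0 indicates chance-level agreement.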

Broader Impact
The outcomes of this work have significant implications for both academic research and real-world applications:

  • For AI Researchers and Developers: KuralNet provides a scalable baseline for multilingual SER and sets a new standard for performance in Dravidian and other low-resource languages. It also encourages the use of hybrid feature fusion and adaptive attention techniques in emotion modeling.

  • For Industry Applications: By supporting languages like Tamil, Kannada, and Malayalam, KuralNet can be integrated into call centers, mental health tools, language learning platforms, and virtual assistants, enhancing their emotional intelligence and inclusivity.

  • For Future Work: The framework supports expansion to additional languages, with plans to scale from the current 13 to the 29 languages covered by the dataset survey. This positions KuralNet as a flexible foundation for emotion-aware systems across multilingual and multicultural settings.

Awards and Recognition

Publication at CHiPSAL 2025
The dataset paper from this research was accepted for publication at CHiPSAL 2025:

  • Title: EmoTa: A Tamil Emotional Speech Dataset

  • Authors: Jubeerathan Thevakumar, Luxshan Thavarasa, Thanikan Sivatheepan, Sajeev Kugarajah, and Uthayasanker Thayasivam

  • Conference: First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), co-located with COLING 2025

  • Date: January 19, 2025

  • Pages: 193–201

  • Published by: International Committee on Computational Linguistics

  • Paper link: aclanthology.org/2025.chipsal-1.19.pdf

  • Poster link: Google Drive Poster

Academic Recognition
The paper was presented to a global audience of researchers and practitioners working on computational linguistics for South Asian languages. The dataset and findings received positive feedback for addressing the underrepresentation of Tamil and other Dravidian languages in Speech Emotion Recognition (SER) research. This contribution was highlighted as a novel step toward inclusive AI systems.

Team Members

Luxshan Thavarasa

Focuses on Speech Emotion Recognition (SER) for low-resource languages, bridging deep learning with full-stack development.
luxshanlux2000@gmail.com
Google Scholar | LinkedIn | GitHub


Jubeerathan Thevakumar

Works on optimizing SER models. Skilled in backend engineering and inference systems.
jubeerathan@gmail.com
Google Scholar | LinkedIn | GitHub


Thanikan Sivatheepan

Handles preprocessing pipelines and designs dashboards for SER data visualization.
thanikansiv@gmail.com
Google Scholar | LinkedIn | GitHub