Description:
The Tamil Emotional Speech Dataset (EmoTa) consists of 936 utterances from 22 native Sri Lankan Tamil speakers (11 male, 11 female). Each speaker expresses five emotions: Anger, Happiness, Sadness, Fear, and Neutrality. The emotions are conveyed through 19 semantically neutral sentences, so the emotional cue lies in how a sentence is spoken rather than in its wording, which makes the dataset well suited to research in Speech Emotion Recognition (SER). The dataset also reflects the linguistic diversity of the Tamil dialects of Sri Lanka and, with its balanced speaker pool, covers both regional and gender-based variation.
Emotions Represented:
- Anger
- Happiness
- Sadness
- Fear
- Neutrality
Speakers:
- 22 native Sri Lankan Tamil speakers (11 male, 11 female)
Total Utterances:
- 936 utterances, with each emotion represented by multiple sentences per speaker.
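
As a quick sanity check on these figures, the sketch below works only from the counts stated in this description (22 speakers, 5 emotions, 19 sentences, 936 utterances). The constant and function names are illustrative and are not part of any EmoTa tooling. The arithmetic suggests the corpus is not a full speaker × emotion × sentence crossing, which would yield 2,090 recordings; the reason fewer are included is not stated here.

```python
# Counts taken from the dataset description; names are hypothetical.
SPEAKERS = 22
EMOTIONS = 5
SENTENCES = 19
TOTAL_UTTERANCES = 936


def corpus_stats(speakers=SPEAKERS, emotions=EMOTIONS,
                 sentences=SENTENCES, total=TOTAL_UTTERANCES):
    """Compare the stated utterance count with a full factorial design."""
    full_crossing = speakers * emotions * sentences
    return {
        "full_crossing": full_crossing,
        "coverage": total / full_crossing,
        "avg_per_speaker": total / speakers,
        "avg_per_speaker_emotion": total / (speakers * emotions),
    }


if __name__ == "__main__":
    for name, value in corpus_stats().items():
        print(f"{name}: {value:.2f}")
```

On average this works out to roughly 42.5 utterances per speaker and about 8.5 per speaker per emotion, so it is safer not to assume that every sentence is available for every speaker and emotion. Counting files per label after download would confirm the actual distribution.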
Linguistic Focus:
The dataset captures the varied Tamil dialects spoken across Sri Lanka, making it suitable for a broad range of speech emotion recognition tasks.
Conference & Workshop Information
- Conference: Co-located with the 31st International Conference on Computational Linguistics (COLING 2025)
- Workshop: CHiPSAL: Challenges in Processing South Asian Languages
Dataset Access & License
- GitHub Repository: EmoTa GitHub Repository
- License: The dataset is licensed under the EmoTa Academic-Commercial License (EACL), an extended version of the CC BY-NC 4.0 license. You can access the full license terms here.
- Dataset Download: here