1. Banking Domain based Speech Corpus:
Existing Sinhala speech corpora are not suited to modeling a conversation between a human and an agent. A new Sinhala speech corpus was therefore built with the intention of developing a Sinhala speech dialog system. The banking domain was selected, and a few conversations between a customer and a customer service assistant of a bank during the process of opening a new bank account were analyzed. From these, 14 basic intents that a customer would express during this common conversation were identified.
First, a crowdsourcing approach was used to identify the different inflections with which each intent would be uttered in spoken Sinhala. A Google form including the 14 predefined intents was distributed among 130 participants covering different age groups. They were requested to provide alternative ways in which people could express each of these intents in spoken Sinhala. In addition, participants were selected to capture the different dialects of Sinhala spoken in different regions of the country.
The data were analyzed and a finalized set of inflections corresponding to each intent was created with the help of language experts.
Voicer, a web/smartphone-based crowdsourcing tool, was used to collect speech samples. The tool was reconfigured to capture balanced amounts of speech clips for each intent. Multiple users can access the tool simultaneously and record their voice by uttering the inflections prompted by the tool. The data collection process was conducted under uncontrolled environmental conditions.
Using Voicer, we collected a total of 9650 speech clips for all inflections under the 14 intents. The data was collected from 120 speakers, of whom 60% were male and 40% female. 30% of the speakers were university students, and the rest were from the general community within the age group of 25 to 60 years. An individual recording is typically one to three seconds long. These speech clips were validated manually and subjected to noise removal. After removing all flawed clips (over-recorded or halfway-stopped clips and clips with high noise profiles), 8977 speech clips were shortlisted to build the corpus. The final Sinhala speech corpus is 4.15 hours long. The corpus was divided into training and testing sets in an 80%/20% ratio.
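The 80%/20% division described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the source does not say whether the split was stratified by intent, so the per-intent stratification, the function name `stratified_split`, the `(clip_id, intent)` input format, and the fixed seed are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(clips, test_ratio=0.2, seed=0):
    """Split (clip_id, intent) pairs into train and test lists.

    Assumption: the split is done separately inside each intent so that
    both sets keep roughly the same intent distribution.
    """
    # Group clip ids by their intent label.
    by_intent = defaultdict(list)
    for clip_id, intent in clips:
        by_intent[intent].append(clip_id)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for intent in sorted(by_intent):
        ids = by_intent[intent]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_ratio)
        test.extend((clip_id, intent) for clip_id in ids[:n_test])
        train.extend((clip_id, intent) for clip_id in ids[n_test:])
    return train, test


# Hypothetical input: 1400 placeholder clip ids spread evenly over 14 intents.
example_clips = [(f"clip_{i:04d}", i % 14 + 1) for i in range(1400)]
example_train, example_test = stratified_split(example_clips)
```

Splitting within each intent, rather than over the whole pool, avoids a test set that under-represents the smaller intents.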
Number of inflections and total speech clips per intent:
[1] Request to open a new bank account: 9 inflections, 739 clips
[2] Request a Savings Account: 8 inflections, 672 clips
[3] Request a Fixed Deposit: 8 inflections, 743 clips
[4] Ask for a Savings Account type: 4 inflections, 473 clips
[5] Making a choice from one to five: 5 inflections, 531 clips
[6] Request for a new attempt: 12 inflections, 724 clips
[7] Reject the next attempt: 12 inflections, 804 clips
[8] Give confirmation as correct details: 11 inflections, 842 clips
[9] Give confirmation as wrong details: 10 inflections, 726 clips
[10] Give confirmation to continue a task: 5 inflections, 502 clips
[11] Request to terminate a task: 8 inflections, 646 clips
[12] Selecting a fixed deposit period: 6 inflections, 585 clips
[13] Request interest at maturity: 5 inflections, 481 clips
[14] Request interest monthly: 5 inflections, 509 clips
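As a quick consistency check, the per-intent figures in the list above can be summed and compared with the corpus totals stated earlier. The sketch below reads the first number of each entry as the inflection count and the second as the clip count, which is an assumption about the list layout; the variable names are illustrative only.

```python
# (intent id, inflections, clips), transcribed from the list above.
# Assumption: first number = inflection count, second = clip count.
INTENT_STATS = [
    (1, 9, 739), (2, 8, 672), (3, 8, 743), (4, 4, 473), (5, 5, 531),
    (6, 12, 724), (7, 12, 804), (8, 11, 842), (9, 10, 726), (10, 5, 502),
    (11, 8, 646), (12, 6, 585), (13, 5, 481), (14, 5, 509),
]

total_inflections = sum(inflections for _, inflections, _ in INTENT_STATS)
total_clips = sum(clips for _, _, clips in INTENT_STATS)

# Mean clip duration implied by the stated corpus length of 4.15 hours.
CORPUS_HOURS = 4.15
mean_clip_seconds = CORPUS_HOURS * 3600 / total_clips

print(f"inflections: {total_inflections}")
print(f"clips: {total_clips}")
print(f"mean clip length: {mean_clip_seconds:.2f} s")
```

The clip counts add up to the 8977 shortlisted clips, and the implied mean clip length of about 1.66 seconds falls within the stated one to three second range.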
2. Banking Domain based Text Data Corpus:
To train the classification models, a separate text data corpus was collected.
A Google form containing the 14 identified intents was distributed among university students and some people representing the general community. The participants were asked to provide a way to interpret each intent. A total of 130 responses were received.
Associated Publication: -
Paper title: A Speech Command Classification System for Sinhala Language based on Automatic Speech Recognition
Published in: 2019 International Conference on Asian Language Processing (IALP), Shanghai
Date of Conference: 15-17 Nov. 2019
DOI: 10.1109/IALP48816.2019.9037648
Citation: -
T. Dinushika, L. Kavmini, P. Abeyawardhana, U. Thayasivam and S. Jayasena, "Speech Command Classification System for Sinhala Language based on Automatic Speech Recognition," 2019 International Conference on Asian Language Processing (IALP), 2019, pp. 205-210, doi: 10.1109/IALP48816.2019.9037648.