Each task in the speech domain (speaker identification, emotion detection, etc.) is advancing in isolation: research groups typically focus on one or two tasks, and each group generates its own speaker embeddings, starting from the base of the pipeline. This is time-consuming and inefficient. Since voice embedding is common to all of these tasks, it would be better to have a generic voice embedding that captures the characteristics of speech and for which the mapping between those characteristics and the encoded vector is known. In this work, we conduct an in-depth investigation of speaker embeddings and learn a set of high-level feature representations through deep learning.
Lakshika Sithamparanathan
Final-year Computer Science and Engineering undergraduate, highly motivated to keep growing professionally. Passionate about machine learning and artificial intelligence.
I’m working on the High-level voice feature extraction project.