Metaphor Detection for Low Resource Languages – Tamil
Description
This research addresses the detection and extraction of metaphors in Tamil, a low-resource language with little to no prior work in this area. Metaphors enrich language with deeper meaning, yet detecting them computationally remains a challenge—especially in languages like Tamil that lack annotated metaphor datasets. This project introduces the first metaphor detection approach for Tamil, beginning with corpus creation and leveraging both traditional machine learning and transformer-based models. The goal is to improve NLP applications such as translation, summarization, creative writing, and AI-generated content in Tamil.
Motivation and Research Objective
Motivation
Metaphor detection is crucial for natural language understanding but remains underexplored—especially in low-resource languages. Existing research focuses almost entirely on English due to the availability of annotated corpora. This project aims to expand metaphor detection into Tamil, enabling future research and development in Tamil NLP.
Objectives
- Corpus Creation: Build a small-scale Tamil metaphor dataset (~500 metaphorical and 500 non-metaphorical sentences) using Tamil song lyrics.
- Binary Classification: Develop a model to classify sentences as metaphorical or not.
- Target–Source Annotation: Create a dataset for extracting target and source metaphor components using a sequence-to-sequence model.
- Tool Development: Build a metaphor extraction tool for Tamil based on the created dataset.
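To make the first three objectives concrete, the following is a minimal sketch of what an annotated corpus record and a simple binary baseline could look like. It is not the project's actual pipeline: the project names "traditional machine learning" without specifying a method, so the character-trigram Naive Bayes classifier here is an assumption, as are the record fields (`is_metaphor`, `target`, `source`) and all function and class names. The four Tamil sentences are made up for illustration and are not taken from the project's dataset.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedSentence:
    """Hypothetical record layout for one corpus entry."""
    text: str
    is_metaphor: bool
    target: Optional[str] = None  # the thing being described
    source: Optional[str] = None  # the thing it is compared to


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Split a sentence into overlapping character n-grams (space-padded)."""
    padded = f" {text.strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesBaseline:
    """Multinomial Naive Bayes over character n-grams (assumed baseline)."""

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n
        self.alpha = alpha  # add-alpha smoothing constant
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, corpus: List[AnnotatedSentence]) -> "NaiveBayesBaseline":
        for item in corpus:
            grams = char_ngrams(item.text, self.n)
            self.counts[item.is_metaphor].update(grams)
            self.docs[item.is_metaphor] += 1
            self.vocab.update(grams)
        return self

    def log_score(self, text: str, label: bool) -> float:
        """Log prior plus smoothed log likelihood of the n-grams under a class."""
        total_docs = sum(self.docs.values())
        prior = math.log((self.docs[label] + self.alpha) / (total_docs + 2 * self.alpha))
        denom = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
        likelihood = sum(
            math.log((self.counts[label][g] + self.alpha) / denom)
            for g in char_ngrams(text, self.n)
        )
        return prior + likelihood

    def predict(self, text: str) -> bool:
        return self.log_score(text, True) > self.log_score(text, False)


# Illustrative, made-up examples (not from the project's corpus).
TOY_CORPUS = [
    AnnotatedSentence("அவள் கண்கள் நட்சத்திரங்கள்", True,
                      target="கண்கள்", source="நட்சத்திரங்கள்"),
    AnnotatedSentence("உன் சிரிப்பு நிலவு", True,
                      target="சிரிப்பு", source="நிலவு"),
    AnnotatedSentence("அவன் பள்ளிக்குச் சென்றான்", False),
    AnnotatedSentence("நான் தண்ணீர் குடித்தேன்", False),
]

if __name__ == "__main__":
    model = NaiveBayesBaseline().fit(TOY_CORPUS)
    for item in TOY_CORPUS:
        print(item.text, "->", model.predict(item.text))
```

Character n-grams are used here only because they need no Tamil tokenizer or morphological analyzer; a transformer-based classifier, as the project also plans, would replace `NaiveBayesBaseline` while the record layout could stay the same. The `target` and `source` fields indicate the kind of supervision a sequence-to-sequence extraction model would need.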
Impact
Our research will be the first significant contribution to metaphor detection in Tamil, a low-resource language with virtually no existing annotated corpora or models in this domain. By adapting techniques and lessons from English-language metaphor detection to pre-trained Tamil language models, we aim to achieve promising performance.
This work will:
- Lay the foundation for future research in metaphor detection for Tamil and other low-resource languages.
- Enable the development of NLP applications like Tamil metaphor-aware translation, summarization, and creative AI writing.
- Offer insights into the effectiveness of different Tamil language models, allowing for performance benchmarking and model selection in Tamil NLP tasks.
Ultimately, our work encourages broader linguistic inclusivity in metaphor detection and helps bring underrepresented languages like Tamil into mainstream NLP research.
Contributor
Krishan Chavinda
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.