The 54k government document dataset from the CSE department was used. The domain is Sri Lankan official government documents. Most of the Sinhala sentences are written in literary Sinhala. The total number of sentence pairs is 58,140.
Associated Publication
- Paper Title: English to Sinhala Neural Machine Translation
- Published in: 2020 International Conference on Asian Language Processing (IALP), Kuala Lumpur
- Date of Conference: 4-6 Dec. 2020
- DOI: 10.1109/IALP51396.2020.9310462
Citations
Cite the following paper in your publication:
@INPROCEEDINGS{9310462,
author={Fonseka, Thilakshi and Naranpanawa, Rashmini and Perera, Ravinga and Thayasivam, Uthayasanker},
booktitle={2020 International Conference on Asian Language Processing (IALP)},
title={English to Sinhala Neural Machine Translation},
year={2020},
volume={},
number={},
pages={305-309},
doi={10.1109/IALP51396.2020.9310462}}
Dataset Download
Request access to download the dataset here (Link).
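Once the data has been obtained, a quick sanity check is to load the sentence pairs and compare the count against the stated total. The distribution format is not described here, so the following is only a minimal sketch under the assumption that the corpus comes as two line-aligned UTF-8 text files, one sentence per line. The function name `load_sentence_pairs` and the file names in the usage comment are hypothetical.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]


def load_sentence_pairs(en_path, si_path):
    """Return a list of (English, Sinhala) sentence pairs.

    Assumes (not confirmed by the dataset description) that both files
    are line-aligned: line i of one file translates line i of the other.
    """
    en_lines = read_lines(en_path)
    si_lines = read_lines(si_path)
    if len(en_lines) != len(si_lines):
        raise ValueError(
            f"Files are not aligned: {len(en_lines)} English lines "
            f"vs {len(si_lines)} Sinhala lines"
        )
    # Strip surrounding whitespace so pairs compare cleanly.
    return [(en.strip(), si.strip()) for en, si in zip(en_lines, si_lines)]


# Hypothetical usage; replace the file names with the actual ones:
# pairs = load_sentence_pairs("train.en", "train.si")
# print(len(pairs))  # compare against the stated total of 58,140
```

If the files turn out to use a different layout (for example a single tab-separated file), only `load_sentence_pairs` would need to change; the alignment check is the part worth keeping.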