English to Sinhala Neural Machine Translation

The 54k government document dataset from the CSE department was used. Its domain is official Sri Lankan government documents, and most of the Sinhala sentences are written in the literary language. The dataset contains 58,140 sentence pairs in total.


Associated Publication: -

We used the 54k government document dataset from the CSE department.

  • Domain: Official Sri Lankan government documents. Most of the Sinhala sentences are written in the literary language.
  • Paper Title: English to Sinhala Neural Machine Translation
  • Published in: 2020 International Conference on Asian Language Processing (IALP), Kuala Lumpur
  • Date of Conference: 4-6 Dec. 2020
  • DOI: 10.1109/IALP51396.2020.9310462


Cite the following paper in your publications:

@INPROCEEDINGS{9310462,
  author={Fonseka, Thilakshi and Naranpanawa, Rashmini and Perera, Ravinga and Thayasivam, Uthayasanker},
  booktitle={2020 International Conference on Asian Language Processing (IALP)},
  title={English to Sinhala Neural Machine Translation},
  year={2020},
  doi={10.1109/IALP51396.2020.9310462}
}

Dataset Download
Request the dataset for download here (Link).