Ruvan Weerasinghe


Sinhala-Tamil Machine Translation: Towards better Translation Quality
Randil Pushpananda | Ruvan Weerasinghe | Mahesan Niranjan
Proceedings of the Australasian Language Technology Association Workshop 2014


Collaboratively Building Language Resources while Localising the Web
Asanka Wasala | Reinhard Schäler | Ruvan Weerasinghe | Chris Exton
Proceedings of the 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP

Proceedings of the 10th Workshop on Asian Language Resources
Ruvan Weerasinghe | Sarmad Hussain | Virach Sornlertlamvanich | Rachel Edita O. Roxas
Proceedings of the 10th Workshop on Asian Language Resources


Corpus-based Sinhala Lexicon
Ruvan Weerasinghe | Dulip Herath | Viraj Welgama
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)


NLP Applications of Sinhala: TTS & OCR
Ruvan Weerasinghe | Asanka Wasala | Dulip Herath | Viraj Welgama
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II


Sinhala Grapheme-to-Phoneme Conversion and Rules for Schwa Epenthesis
Asanka Wasala | Ruvan Weerasinghe | Kumudu Gamage
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions


A Rule Based Syllabification Algorithm for Sinhala
Ruvan Weerasinghe | Asanka Wasala | Kumudu Gamage
Second International Joint Conference on Natural Language Processing: Full Papers


Bootstrapping the lexicon building process for machine translation between ‘new’ languages
Ruvan Weerasinghe
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

The cumulative effort that has gone into developing linguistic resources over the past few decades, for tasks ranging from machine-readable dictionaries to translation systems, is enormous. Replicating such effort is prohibitively expensive for languages outside the (largely) European family. The possibility of building such resources automatically from electronic corpora of these languages is therefore of great interest to those studying these ‘new’ or ‘lesser-known’ languages. The main obstacle to applying data-driven techniques directly is that most of them require large corpora, which are rarely available for such ‘new’ languages. This paper describes an attempt to set up a bootstrapping agenda that exploits the scarce corpus resources a researcher working on such languages may have at the outset. In particular, it reports the results of an experiment using state-of-the-art data-driven techniques to build linguistic resources for Sinhala, a non-European language with virtually no electronic resources.