A. Seza Doğruöz


Open Machine Translation for Low Resource South American Languages (AmericasNLP 2021 Shared Task Contribution)
Shantipriya Parida | Subhadarshi Panda | Amulya Dash | Esau Villatoro-Tello | A. Seza Doğruöz | Rosa M. Ortega-Mendoza | Amadeo Hernández | Yashvardhan Sharma | Petr Motlicek
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas

This paper describes the submission of team “Tamalli” to the AmericasNLP2021 shared task on Open Machine Translation for low-resource South American languages. Our goal was to evaluate different Machine Translation (MT) techniques, statistical and neural-based, under several configuration settings. We obtained the second-best results for the language pairs “Spanish-Bribri”, “Spanish-Asháninka”, and “Spanish-Rarámuri” in the category “Development set not used for training”. Our experiments will serve as a point of reference for researchers working on MT with low-resource languages.

A Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies
A. Seza Doğruöz | Sunayana Sitaram | Barbara E. Bullock | Almeida Jacqueline Toribio
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The analysis of data in which multiple languages are represented has gained popularity among computational linguists in recent years. So far, much of this research focuses mainly on the improvement of computational methods and largely ignores linguistic and social aspects of C-S discussed across a wide range of languages within the long-established literature in linguistics. To fill this gap, we offer a survey of code-switching (C-S) covering the literature in linguistics with a reflection on the key issues in language technologies. From the linguistic perspective, we provide an overview of structural and functional patterns of C-S focusing on the literature from European and Indian contexts as highly multilingual areas. From the language technologies perspective, we discuss how massive language models fail to represent diverse C-S types due to lack of appropriate training data, lack of robust evaluation benchmarks for C-S (across multilingual situations and types of C-S) and lack of end-to-end systems that cover sociolinguistic aspects of C-S as well. Our survey will be a step towards an outcome of mutual benefit for computational scientists and linguists with a shared interest in multilingualism and C-S.

How “open” are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation
A. Seza Doğruöz | Gabriel Skantze
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Open-domain chatbots are supposed to converse freely with humans without being restricted to a topic, task or domain. However, the boundaries and/or contents of open-domain conversations are not clear. To clarify the boundaries of “openness”, we conduct two studies: First, we classify the types of “speech events” encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the “small talk” category and exclude the other speech event categories encountered in real life human-human communication. Second, we conduct a small-scale pilot study to generate online conversations covering a wider range of speech event categories between two humans vs. a human and a state-of-the-art chatbot (i.e., Blender by Facebook). A human evaluation of these generated conversations indicates a preference for human-human conversations, since the human-chatbot conversations lack coherence in most speech event categories. Based on these results, we suggest (a) using the term “small talk” instead of “open-domain” for the current chatbots which are not that “open” in terms of conversational abilities yet, and (b) revising the evaluation methods to test the chatbot conversations against other speech events.


Proceedings of the Second Workshop on NLP and Computational Social Science
Dirk Hovy | Svitlana Volkova | David Bamman | David Jurgens | Brendan O’Connor | Oren Tsur | A. Seza Doğruöz
Proceedings of the Second Workshop on NLP and Computational Social Science

Integrating Meaning into Quality Evaluation of Machine Translation
Osman Başkaya | Eray Yildiz | Doruk Tunaoğlu | Mustafa Tolga Eren | A. Seza Doğruöz
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Machine translation (MT) quality is evaluated through comparisons between MT outputs and human translations (HT). Traditionally, this evaluation relies on form-related features (e.g. lexicon and syntax) and ignores the transfer of meaning reflected in HT outputs. Instead, we evaluate the quality of MT outputs through meaning-related features (e.g. polarity, subjectivity) with two experiments. In the first experiment, the meaning-related features are compared to human rankings individually. In the second experiment, combinations of meaning-related features and other quality metrics are utilized to predict the same human rankings. The results of our experiments confirm the benefit of these features in predicting human evaluation of translation quality in addition to traditional metrics which focus mainly on form.


Proceedings of the First Workshop on NLP and Computational Social Science
David Bamman | A. Seza Doğruöz | Jacob Eisenstein | Dirk Hovy | David Jurgens | Brendan O’Connor | Alice Oh | Oren Tsur | Svitlana Volkova
Proceedings of the First Workshop on NLP and Computational Social Science

Survey: Computational Sociolinguistics: A Survey
Dong Nguyen | A. Seza Doğruöz | Carolyn P. Rosé | Franciska de Jong
Computational Linguistics, Volume 42, Issue 3 - September 2016


Predicting Code-switching in Multilingual Communication for Immigrant Communities
Evangelos Papalexakis | Dong Nguyen | A. Seza Doğruöz
Proceedings of the First Workshop on Computational Approaches to Code Switching

Predicting Dialect Variation in Immigrant Contexts Using Light Verb Constructions
A. Seza Doğruöz | Preslav Nakov
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Why Gender and Age Prediction from Tweets is Hard: Lessons from a Crowdsourcing Experiment
Dong Nguyen | Dolf Trieschnigg | A. Seza Doğruöz | Rilana Gravel | Mariët Theune | Theo Meder | Franciska de Jong
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Modeling the Use of Graffiti Style Features to Signal Social Relations within a Multi-Domain Learning Paradigm
Mario Piergallini | A. Seza Doğruöz | Phani Gadde | David Adamson | Carolyn Rosé
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics


Word Level Language Identification in Online Multilingual Communication
Dong Nguyen | A. Seza Doğruöz
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing