Shuly Wintner


2021

Machine Translation into Low-resource Language Varieties
Sachin Kumar | Antonios Anastasopoulos | Shuly Wintner | Yulia Tsvetkov
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

State-of-the-art machine translation (MT) systems are typically trained to generate “standard” target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language. Such varieties are often low-resource, and hence do not benefit from contemporary NLP solutions, MT included. We propose a general framework to rapidly adapt MT systems to generate language varieties that are close to, but different from, the standard target language, using no parallel (source–variety) data. This also includes adaptation of MT systems to low-resource typologically-related target languages. We experiment with adapting an English–Russian MT system to generate Ukrainian and Belarusian, an English–Norwegian Bokmål system to generate Nynorsk, and an English–Arabic system to generate four Arabic dialects, obtaining significant improvements over competitive baselines.

2019

Topics to Avoid: Demoting Latent Confounds in Text Classification
Sachin Kumar | Shuly Wintner | Noah A. Smith | Yulia Tsvetkov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation with respect to the task of native language identification. We find that standard text classifiers which perform well on the test set end up learning topical features which are confounds of the prediction task (e.g., if the input text mentions Sweden, the classifier predicts that the author’s native language is Swedish). We propose a method that represents the latent topical confounds and a model which “unlearns” confounding features by predicting both the label of the input text and the confound; but we train the two predictors adversarially in an alternating fashion to learn a text representation that predicts the correct label but is less prone to using information about the confound. We show that this model generalizes better and learns features that are indicative of the writing style rather than the content.

Automatic Detection of Translation Direction
Ilia Sominsky | Shuly Wintner
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Parallel corpora are crucial resources for NLP applications, most notably for machine translation. The direction of the (human) translation of parallel corpora has been shown to have significant implications for the quality of statistical machine translation systems that are trained with such corpora. We describe a method for determining the direction of the (manual) translation of parallel corpora at the sentence-pair level. Using several linguistically-motivated features, coupled with a neural network model, we obtain high accuracy on several language pairs. Furthermore, we demonstrate that the accuracy is correlated with the (typological) distance between the two languages.

2018

Native Language Cognate Effects on Second Language Lexical Choice
Ella Rabinovich | Yulia Tsvetkov | Shuly Wintner
Transactions of the Association for Computational Linguistics, Volume 6

We present a computational analysis of cognate effects on the spontaneous linguistic productions of advanced non-native speakers. Introducing a large corpus of highly competent non-native English speakers, and using a set of carefully selected lexical items, we show that the lexical choices of non-natives are affected by cognates in their native language. This effect is so powerful that we are able to reconstruct the phylogenetic language tree of the Indo-European language family solely from the frequencies of specific lexical items in the English of authors with various native languages. We quantitatively analyze non-native lexical choice, highlighting cognate facilitation as one of the important phenomena shaping the language of non-native speakers.

Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies
Anjalie Field | Doron Kliger | Shuly Wintner | Jennifer Pan | Dan Jurafsky | Yulia Tsvetkov
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Amidst growing concern over media manipulation, NLP attention has focused on overt strategies like censorship and “fake news”. Here, we draw on two concepts from political science literature to explore subtler strategies for government media manipulation: agenda-setting (selecting what topics to cover) and framing (deciding how topics are covered). We analyze 13 years (100K articles) of the Russian newspaper Izvestia and identify a strategy of distraction: articles mention the U.S. more frequently in the month directly following an economic downturn in Russia. We introduce embedding-based methods for cross-lingually projecting English frames to Russian, and discover that these articles emphasize U.S. moral failings and threats to the U.S. Our work offers new ways to identify subtle media manipulation strategies at the intersection of agenda-setting and framing.

Native Language Identification with User Generated Content
Gili Goldin | Ella Rabinovich | Shuly Wintner
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We address the task of native language identification in the context of social media content, where authors are highly-fluent, advanced nonnative speakers (of English). Using both linguistically-motivated features and the characteristics of the social media outlet, we obtain high accuracy on this challenging task. We provide a detailed analysis of the features that sheds light on differences between native and nonnative speakers, and among nonnative speakers with different backgrounds.

2017

Found in Translation: Reconstructing Phylogenetic Language Trees from Translations
Ella Rabinovich | Noam Ordan | Shuly Wintner
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Translation has played an important role in trade, law, commerce, politics, and literature for thousands of years. Translators have always tried to be invisible; ideal translations should look as if they were written originally in the target language. We show that traces of the source language remain in the translation product to the extent that it is possible to uncover the history of the source language by looking only at the translation. Specifically, we automatically reconstruct phylogenetic language trees from monolingual texts (translated from several source languages). The signal of the source language is so powerful that it is retained even after two phases of translation. This strongly indicates that source language interference is the most dominant characteristic of translated texts, overshadowing the more subtle signals of universal properties of translation.

Personalized Machine Translation: Preserving Original Author Traits
Ella Rabinovich | Raj Nath Patel | Shachar Mirkin | Lucia Specia | Shuly Wintner
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The language that we produce reflects our personality, and various personal and demographic characteristics can be detected in natural language texts. We focus on one particular personal trait of the author, gender, and study how it is manifested in original texts and in translations. We show that the author’s gender has a powerful, clear signal in original texts, but this signal is obfuscated in human and machine translation. We then propose simple domain-adaptation techniques that help retain the original gender traits in the translation, without harming the quality of the translation, thereby creating more personalized machine translation systems.

2016

On the Similarities Between Native, Non-native and Translated Texts
Ella Rabinovich | Sergiu Nisioi | Noam Ordan | Shuly Wintner
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Corpus of Native, Non-native and Translated Texts
Sergiu Nisioi | Ella Rabinovich | Liviu P. Dinu | Shuly Wintner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe a monolingual English corpus of original and (human) translated texts, with an accurate annotation of speaker properties, including the original language of the utterances and the speaker’s country of origin. We thus obtain three sub-corpora of texts reflecting native English, non-native English, and English translated from a variety of European languages. This dataset will facilitate the investigation of similarities and differences between these kinds of sub-languages. Moreover, it will facilitate a unified comparative study of translations and language produced by (highly fluent) non-native speakers, two closely-related phenomena that have only been studied in isolation so far.

Translationese: Between Human and Machine Translation
Shuly Wintner
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts

Translated texts, in any language, have unique characteristics that set them apart from texts originally written in the same language. Translation Studies is a research field that focuses on investigating these characteristics. Until recently, research in machine translation (MT) has been entirely divorced from translation studies. The main goal of this tutorial is to introduce some of the findings of translation studies to researchers interested mainly in machine translation, and to demonstrate that awareness of these findings can result in better, more accurate MT systems.

2015

Statistical Machine Translation with Automatic Identification of Translationese
Naama Twitto | Noam Ordan | Shuly Wintner
Proceedings of the Tenth Workshop on Statistical Machine Translation

Unsupervised Identification of Translationese
Ella Rabinovich | Shuly Wintner
Transactions of the Association for Computational Linguistics, Volume 3

Translated texts are distinctively different from original ones, to the extent that supervised text classification methods can distinguish between them with high accuracy. These differences were proven useful for statistical machine translation. However, it has been suggested that the accuracy of translation detection deteriorates when the classifier is evaluated outside the domain it was trained on. We show that this is indeed the case, in a variety of evaluation scenarios. We then show that unsupervised classification is highly accurate on this task. We suggest a method for determining the correct labels of the clustering outcomes, and then use the labels for voting, improving the accuracy even further. Moreover, we suggest a simple method for clustering in the challenging case of mixed-domain datasets, in spite of the dominance of domain-related features over translation-related ones. The result is an effective, fully-unsupervised method for distinguishing between original and translated texts that can be applied to new domains with reasonable accuracy.

2014

Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics
Shuly Wintner | Sharon Goldwater | Stefan Riezler
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
Shuly Wintner | Marko Tadić | Bogdan Babych
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics
Shuly Wintner | Desmond Elliott | Konstantina Garoufi | Douwe Kiela | Ivan Vulić
Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics

Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers
Shuly Wintner | Stefan Riezler | Sharon Goldwater
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Identification of Multiword Expressions by Combining Multiple Linguistic Information Sources
Yulia Tsvetkov | Shuly Wintner
Computational Linguistics, Volume 40, Issue 2 - June 2014

2013

Identifying the L1 of non-native writers: the CMU-Haifa system
Yulia Tsvetkov | Naama Twitto | Nathan Schneider | Noam Ordan | Manaal Faruqui | Victor Chahuneau | Shuly Wintner | Chris Dyer
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Improving Statistical Machine Translation by Adapting Translation Models to Translationese
Gennadi Lembersky | Noam Ordan | Shuly Wintner
Computational Linguistics, Volume 39, Issue 4 - December 2013

2012

Incorporating Linguistic Knowledge in Statistical Machine Translation: Translating Prepositions
Reshef Shilon | Hanna Fadida | Shuly Wintner
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

A Morphologically Annotated Hebrew CHILDES Corpus
Aviad Albert | Brian MacWhinney | Bracha Nir | Shuly Wintner
Proceedings of the Workshop on Computational Models of Language Acquisition and Loss

Language Models for Machine Translation: Original vs. Translated Texts
Gennadi Lembersky | Noam Ordan | Shuly Wintner
Computational Linguistics, Volume 38, Issue 4 - December 2012

Adapting Translation Models to Translationese Improves SMT
Gennadi Lembersky | Noam Ordan | Shuly Wintner
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Towards Modular Development of Typed Unification Grammars
Yael Sygal | Shuly Wintner
Computational Linguistics, Volume 37, Issue 1 - March 2011

Language Models for Machine Translation: Original vs. Translated Texts
Gennadi Lembersky | Noam Ordan | Shuly Wintner
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources
Yulia Tsvetkov | Shuly Wintner
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Machine Translation between Hebrew and Arabic: Needs, Challenges and Preliminary Solutions
Reshef Shilon | Nizar Habash | Alon Lavie | Shuly Wintner
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Student Research Workshop

Hebrew and Arabic are related but mutually incomprehensible languages with complex morphology and scarce parallel corpora. Machine translation between the two languages is therefore interesting and challenging. We discuss similarities and differences between Hebrew and Arabic, the benefits and challenges that they induce, respectively, and their implications for machine translation. We highlight the shortcomings of using English as a pivot language and advocate a direct, transfer-based and linguistically-informed (but still statistical, and hence scalable) approach. We report preliminary results of such a system that we are currently developing.

A General Method for Creating a Bilingual Transliteration Dictionary
Amit Kirschenbaum | Shuly Wintner
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Transliteration is the rendering in one language of terms from another language (and, possibly, another writing system), approximating spelling and/or phonetic equivalents between the two languages. A transliteration dictionary is a crucial resource for a variety of natural language applications, most notably machine translation. We describe a general method for creating bilingual transliteration dictionaries from Wikipedia article titles. The method can be applied to any language pair with Wikipedia presence, independently of the writing systems involved, and requires only a single simple resource that can be provided by any literate bilingual speaker. It was successfully applied to extract a Hebrew-English transliteration dictionary which, when incorporated in a machine translation system, indeed improved its performance.

Automatic Acquisition of Parallel Corpora from Websites with Dynamic Content
Yulia Tsvetkov | Shuly Wintner
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Parallel corpora are indispensable resources for a variety of multilingual natural language processing tasks. This paper presents a technique for fully automatic construction of constantly growing parallel corpora. We propose a simple and effective dictionary-based algorithm to extract parallel document pairs from a large collection of articles retrieved from the Internet, potentially containing manually translated texts. This algorithm was implemented and tested on Hebrew-English parallel texts. With properly selected thresholds, precision of 100% can be obtained.

A Morphologically-Analyzed CHILDES Corpus of Hebrew
Bracha Nir | Brian MacWhinney | Shuly Wintner
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a corpus of transcribed spoken Hebrew that forms an integral part of a comprehensive data system that has been developed to suit the specific needs and interests of child language researchers: CHILDES (Child Language Data Exchange System). We introduce a dedicated transcription scheme for the spoken Hebrew data that is aware both of the phonology and of the standard orthography of the language. We also introduce a morphological analyzer that was specifically developed for this corpus.

Identifying Multi-word Expressions by Leveraging Morphological and Syntactic Idiosyncrasy
Hassan Al-Haj | Shuly Wintner
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Extraction of Multi-word Expressions from Small Parallel Corpora
Yulia Tsvetkov | Shuly Wintner
Coling 2010: Posters

2009

Last Words: What Science Underlies Natural Language Engineering?
Shuly Wintner
Computational Linguistics, Volume 35, Number 4, December 2009

Lightly Supervised Transliteration for Machine Translation
Amit Kirschenbaum | Shuly Wintner
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages
Mike Rosner | Shuly Wintner
Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages

2008

Identifying Semitic Roots: Machine Learning with Linguistic Constraints
Ezra Daya | Dan Roth | Shuly Wintner
Computational Linguistics, Volume 34, Number 3, September 2008

2007

High-accuracy Annotation and Parsing of CHILDES Transcripts
Kenji Sagae | Eric Davis | Alon Lavie | Brian MacWhinney | Shuly Wintner
Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition

Cross Lingual and Semantic Retrieval for Cultural Heritage Appreciation
Idan Szpektor | Ido Dagan | Alon Lavie | Danny Shacham | Shuly Wintner
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)

Morphological Disambiguation of Hebrew: A Case Study in Classifier Combination
Danny Shacham | Shuly Wintner
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Finite-State Registered Automata for Non-Concatenative Morphology
Yael Cohen-Sygal | Shuly Wintner
Computational Linguistics, Volume 32, Number 1, March 2006

Partially Specified Signatures: A Vehicle for Grammar Modularity
Yael Cohen-Sygal | Shuly Wintner
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Highly Constrained Unification Grammars
Daniel Feinstein | Shuly Wintner
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

A Computational Lexicon of Contemporary Hebrew
Alon Itai | Shuly Wintner | Shlomo Yona
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Computational lexicons are among the most important resources for natural language processing (NLP). Their importance is even greater in languages with rich morphology, where the lexicon is expected to provide morphological analyzers with enough information to enable them to correctly process intricately inflected forms. We describe the Haifa Lexicon of Contemporary Hebrew, the broadest-coverage publicly available lexicon of Modern Hebrew, currently consisting of over 20,000 entries. While other lexical resources of Modern Hebrew have been developed in the past, this is the first publicly available large-scale lexicon of the language. In addition to supporting morphological processors (analyzers and generators), which was our primary objective, the lexicon is used as a research tool in Hebrew lexicography and lexical semantics. It is open for browsing on the web and several search tools and interfaces were developed which facilitate on-line access to its information. The lexicon is currently used for a variety of NLP applications.

11th Conference of the European Chapter of the Association for Computational Linguistics
Diana McCarthy | Shuly Wintner
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

A Finite-State Morphological Grammar of Hebrew
Shlomo Yona | Shuly Wintner
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

XFST2FSA: Comparing Two Finite-State Toolboxes
Yael Cohen-Sygal | Shuly Wintner
Proceedings of Workshop on Software

2004

Learning Hebrew Roots: Machine Learning with Linguistic Constraints
Ezra Daya | Dan Roth | Shuly Wintner
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

Rapid prototyping of a transfer-based Hebrew-to-English machine translation system
Alon Lavie | Erik Peterson | Katharina Probst | Shuly Wintner | Yaniv Eytani
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2003

Finite state technology and its applications to machine translation
Shuly Wintner
Proceedings of Machine Translation Summit IX: Tutorials

Resources for processing Israeli Hebrew
Shuly Wintner | Shlomo Yona
Workshop on Machine Translation for Semitic languages: issues and approaches

We describe work in progress whose main objective is to create a collection of resources and tools for processing Hebrew. These resources include corpora of written texts, some of them annotated in various degrees of detail; tools for collecting, expanding and maintaining corpora; tools for annotation; lexicons, both monolingual and bilingual; a rule-based, linguistically motivated morphological analyzer and generator; and a WordNet for Hebrew. We emphasize the methodological issue of well-defined standards for the resources to be developed. The design of the resources guarantees their reusability, such that the output of one system can naturally be the input to another.

2002

Formal Language Theory for Natural Language Processing
Shuly Wintner
Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics

Squibs and Discussions: A Note on Typing Feature Structures
Shuly Wintner | Anoop Sarkar
Computational Linguistics, Volume 28, Number 3, September 2002

Guaranteeing Parsing Termination of Unification Grammars
Efrat Jaeger | Nissim Francez | Shuly Wintner
COLING 2002: The 19th International Conference on Computational Linguistics

1999

Compositional Semantics for Linguistic Formalisms
Shuly Wintner
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1998

Towards a linguistically motivated computational grammar for Hebrew
Shuly Wintner
Computational Approaches to Semitic Languages

System Demonstration Natural Language Generation With Abstract Machine
Evgeniy Gabrilovich | Nissim Francez | Shuly Wintner
Natural Language Generation

1995

Parsing with Typed Feature Structures
Shuly Wintner | Nissim Francez
Proceedings of the Fourth International Workshop on Parsing Technologies

In this paper we provide for parsing with respect to grammars expressed in a general TFS-based formalism, a restriction of ALE ([2]). Because our motivation is the design of an abstract (WAM-like) machine for the formalism ([14]), we consider parsing as a computational process and use it as an operational semantics to guide the design of the control structures for the abstract machine. We emphasize the notion of abstract typed feature structures (AFSs) that encode the essential information of TFSs and define unification over AFSs rather than over TFSs. We then introduce an explicit construct of multi-rooted feature structures (MRSs) that naturally extend TFSs and use them to represent phrasal signs as well as grammar rules. We also employ abstractions of MRSs and give the mathematical foundations needed for manipulating them. We then present a simple bottom-up chart parser as a model for computation: grammars written in the TFS-based formalism are executed by the parser. Finally, we show that the parser is correct.