Noam Ordan


2022

A Second Wave of UD Hebrew Treebanking and Cross-Domain Parsing
Amir Zeldes | Nick Howell | Noam Ordan | Yifat Ben Moshe
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Foundational Hebrew NLP tasks, such as segmentation, tagging and parsing, have relied to date on various versions of the Hebrew Treebank (HTB, Sima’an et al., 2001). However, the data in HTB, a single-source newswire corpus, is now over 30 years old, and does not cover many aspects of contemporary Hebrew on the web. This paper presents a new, freely available UD treebank of Hebrew stratified from a range of topics selected from Hebrew Wikipedia. In addition to introducing the corpus and evaluating the quality of its annotations, we deploy automatic validation tools based on grew (Guillaume, 2021), and conduct the first cross-domain parsing experiments in Hebrew. We obtain new state-of-the-art (SOTA) results on UD NLP tasks, using a combination of the latest language modelling and some incremental improvements to existing transformer-based approaches. We also release a new version of the UD HTB matching annotation scheme updates from our new corpus.

2017

Found in Translation: Reconstructing Phylogenetic Language Trees from Translations
Ella Rabinovich | Noam Ordan | Shuly Wintner
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Translation has played an important role in trade, law, commerce, politics, and literature for thousands of years. Translators have always tried to be invisible; ideal translations should look as if they were written originally in the target language. We show that traces of the source language remain in the translation product to the extent that it is possible to uncover the history of the source language by looking only at the translation. Specifically, we automatically reconstruct phylogenetic language trees from monolingual texts (translated from several source languages). The signal of the source language is so powerful that it is retained even after two phases of translation. This strongly indicates that source language interference is the most dominant characteristic of translated texts, overshadowing the more subtle signals of universal properties of translation.

2016

On the Similarities Between Native, Non-native and Translated Texts
Ella Rabinovich | Sergiu Nisioi | Noam Ordan | Shuly Wintner
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

USAAR-CHRONOS: Crawling the Web for Temporal Annotations
Liling Tan | Noam Ordan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Statistical Machine Translation with Automatic Identification of Translationese
Naama Twitto | Noam Ordan | Shuly Wintner
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers
Stefania Degaetano-Ortlieb | Peter Fankhauser | Hannah Kermes | Ekaterina Lapshinova-Koltunski | Noam Ordan | Elke Teich
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SCITEX), which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based feature extraction with data mining techniques.

2013

Improving Statistical Machine Translation by Adapting Translation Models to Translationese
Gennadi Lembersky | Noam Ordan | Shuly Wintner
Computational Linguistics, Volume 39, Issue 4 - December 2013

Identifying the L1 of non-native writers: the CMU-Haifa system
Yulia Tsvetkov | Naama Twitto | Nathan Schneider | Noam Ordan | Manaal Faruqui | Victor Chahuneau | Shuly Wintner | Chris Dyer
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

2012

Language Models for Machine Translation: Original vs. Translated Texts
Gennadi Lembersky | Noam Ordan | Shuly Wintner
Computational Linguistics, Volume 38, Issue 4 - December 2012

Adapting Translation Models to Translationese Improves SMT
Gennadi Lembersky | Noam Ordan | Shuly Wintner
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Translationese and Its Dialects
Moshe Koppel | Noam Ordan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Language Models for Machine Translation: Original vs. Translated Texts
Gennadi Lembersky | Noam Ordan | Shuly Wintner
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing