Serge Sharoff


2021

Automatic Difficulty Classification of Arabic Sentences
Nouran Khallaf | Serge Sharoff
Proceedings of the Sixth Arabic Natural Language Processing Workshop

In this paper, we present a Modern Standard Arabic (MSA) sentence difficulty classifier, which predicts the difficulty of sentences for language learners using either the CEFR proficiency levels or a binary classification as simple or complex. We compare the use of sentence embeddings of different kinds (fastText, mBERT, XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. Our best results have been achieved using fine-tuned Arabic-BERT. Our 3-way CEFR classification reaches an F-1 of 0.80 and 0.75 for Arabic-BERT and XLM-R classification respectively, and a 0.71 Spearman correlation for regression. Our binary difficulty classifier reaches F-1 0.94, and the sentence-pair semantic similarity classifier reaches F-1 0.98.

2020

Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks
Yu Yuan | Serge Sharoff
Proceedings of the 12th Language Resources and Evaluation Conference

This paper explores the use of Deep Learning methods for automatic estimation of quality of human translations. Automatic estimation can provide useful feedback for translation teaching, examination and quality control. Conventional methods for solving this task rely on manually engineered features and external knowledge. This paper presents an end-to-end neural model without feature engineering, incorporating a cross attention mechanism to detect which parts in sentence pairs are most relevant for assessing quality. Another contribution concerns prediction of fine-grained scores for measuring different aspects of translation quality, such as terminological accuracy or idiomatic writing. Empirical results on a large human annotated dataset show that the neural model outperforms feature-based methods significantly. The dataset and the tools are available.

Know thy Corpus! Robust Methods for Digital Curation of Web corpora
Serge Sharoff
Proceedings of the 12th Language Resources and Evaluation Conference

This paper proposes a novel framework for digital curation of Web corpora in order to provide robust estimation of their parameters, such as their composition and the lexicon. In recent years language models pre-trained on large corpora emerged as clear winners in numerous NLP tasks, but no proper analysis of the corpora which led to their success has been conducted. The paper presents a procedure for robust frequency estimation, which helps in establishing the core lexicon for a given corpus, as well as a procedure for estimating the corpus composition via unsupervised topic models and via supervised genre classification of Web pages. The results of the digital curation study applied to several Web-derived corpora demonstrate their considerable differences. First, this concerns different frequency bursts which impact the core lexicon obtained from each corpus. Second, this concerns the kinds of texts they contain. For example, OpenWebText contains considerably more topical news and political argumentation in comparison to ukWac or Wikipedia. The tools and the results of analysis have been released.

Recognizing Semantic Relations by Combining Transformers and Fully Connected Models
Dmitri Roussinov | Serge Sharoff | Nadezhda Puchnina
Proceedings of the 12th Language Resources and Evaluation Conference

Automatically recognizing an existing semantic relation (e.g. “is a”, “part of”, “property of”, “opposite of” etc.) between two words (phrases, concepts, etc.) is an important task affecting many NLP applications and has been the subject of extensive experimentation and modeling. Current approaches to automatically telling if a relation exists between two given concepts X and Y can be grouped into two types: 1) those modeling word-paths connecting X and Y in text and 2) those modeling distributional properties of X and Y separately, not necessarily in proximity to each other. Here, we investigate how both types can be improved and combined. We suggest a distributional approach that is based on an attention-based transformer. We have also developed a novel word path model that combines useful properties of a convolutional network with a fully connected language model. While our transformer-based approach works better, both our models significantly outperform the state-of-the-art within their classes of approaches. We also demonstrate that combining the two approaches results in additional gains since they use somewhat different data sources.

Proceedings of the 13th Workshop on Building and Using Comparable Corpora
Reinhard Rapp | Pierre Zweigenbaum | Serge Sharoff
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

Overview of the Fourth BUCC Shared Task: Bilingual Dictionary Induction from Comparable Corpora
Reinhard Rapp | Pierre Zweigenbaum | Serge Sharoff
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

The shared task of the 13th Workshop on Building and Using Comparable Corpora was devoted to the induction of bilingual dictionaries from comparable rather than parallel corpora. In this task, for a number of language pairs involving Chinese, English, French, German, Russian and Spanish, the participants were supposed to determine automatically the target language translations of several thousand source language test words of three frequency ranges. We describe here some background, the task definition, the training and test data sets and the evaluation used for ranking the participating systems. We also summarize the approaches used and present the results of the evaluation. In conclusion, the outcome of the competition is a number of systems which provide surprisingly good solutions to the ambitious problem.

2019

Towards Functionally Similar Corpus Resources for Translation
Maria Kunilovskaya | Serge Sharoff
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The paper describes a computational approach to produce functionally comparable monolingual corpus resources for translation studies and contrastive analysis. We exploit a text-external approach, based on a set of Functional Text Dimensions to model text functions, so that each text can be represented as a vector in a multidimensional space of text functions. These vectors can be used to find reasonably homogeneous subsets of functionally similar texts across different corpora. Our models for predicting text functions are based on recurrent neural networks and traditional feature-based machine learning approaches. In addition to using the categories of the British National Corpus as our test case, we investigated the functional comparability of the English parts from the two parallel corpora: CroCo (English-German) and RusLTC (English-Russian) and applied our models to define functionally similar clusters in them. Our results show that the Functional Text Dimensions provide a useful description for text categories, while allowing a more flexible representation for texts with hybrid functions.

2018

Language adaptation experiments via cross-lingual embeddings for related languages
Serge Sharoff
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Investigating the Influence of Bilingual MWU on Trainee Translation Quality
Yu Yuan | Serge Sharoff
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Cross-lingual Terminology Extraction for Translation Quality Estimation
Yu Yuan | Yuze Gao | Yue Zhang | Serge Sharoff
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
Pierre Zweigenbaum | Serge Sharoff | Reinhard Rapp
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Toward Pan-Slavic NLP: Some Experiments with Language Adaptation
Serge Sharoff
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing

There is great variation in the amount of NLP resources available for Slavonic languages. For example, the Universal Dependency treebank (Nivre et al., 2016) has about 2 MW of training resources for Czech, more than 1 MW for Russian, while only 950 words for Ukrainian and nothing for Belorussian, Bosnian or Macedonian. Similarly, the Autodesk Machine Translation dataset only covers three Slavonic languages (Czech, Polish and Russian). In this talk I will discuss a general approach, which can be called Language Adaptation, similarly to Domain Adaptation. In this approach, a model for a particular language processing task is built by lexical transfer of cognate words and by learning a new feature representation for a lesser-resourced (recipient) language starting from a better-resourced (donor) language. More specifically, I will demonstrate how language adaptation works in such training scenarios as Translation Quality Estimation, Part-of-Speech tagging and Named Entity Recognition.

Proceedings of the 10th Workshop on Building and Using Comparable Corpora
Serge Sharoff | Pierre Zweigenbaum | Reinhard Rapp
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora
Pierre Zweigenbaum | Serge Sharoff | Reinhard Rapp
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

This paper presents the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. It recalls the design of the datasets, presents their final construction and statistics and the methods used to evaluate system results. 13 runs were submitted to the shared task by 4 teams, covering three of the four proposed language pairs: French-English (7 runs), German-English (3 runs), and Chinese-English (3 runs). The best F-scores as measured against the gold standard were 0.84 (German-English), 0.80 (French-English), and 0.43 (Chinese-English). Because of the design of the dataset, in which not all gold parallel sentence pairs are known, these are only minimum values. We examined manually a small sample of the false negative sentence pairs for the most precise French-English runs and estimated the number of parallel sentence pairs not yet in the provided gold standard. Adding them to the gold standard leads to revised estimates for the French-English F-scores of at most +1.5pt. This suggests that the BUCC 2017 datasets provide a reasonable approximate evaluation of the parallel sentence spotting task.

2016

MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment
Yu Yuan | Serge Sharoff | Bogdan Babych
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we introduce MoBiL, a hybrid Monolingual, Bilingual and Language modelling feature set and feature selection and evaluation framework. The set includes translation quality indicators that can be utilized to automatically predict the quality of human translations in terms of content adequacy and language fluency. We compare MoBiL with the QuEst baseline set by using them in classifiers trained with support vector machine and relevance vector machine learning algorithms on the same data set. We also report an experiment on feature selection to opt for fewer but more informative features from MoBiL. Our experiments show that classifiers trained on our feature set perform consistently better in predicting both adequacy and fluency than the classifiers trained on the baseline feature set. MoBiL also performs well when used with both support vector machine and relevance vector machine algorithms.

Genre classification for a corpus of academic webpages
Erika Dalan | Serge Sharoff
Proceedings of the 10th Web as Corpus Workshop

2015

Proceedings of the Eighth Workshop on Building and Using Comparable Corpora
Pierre Zweigenbaum | Serge Sharoff | Reinhard Rapp
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

Obtaining SMT dictionaries for related languages
Miguel Rios | Serge Sharoff
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

BUCC Shared Task: Cross-Language Document Similarity
Serge Sharoff | Pierre Zweigenbaum | Reinhard Rapp
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

Applying Multi-Dimensional Analysis to a Russian Webcorpus: Searching for Evidence of Genres
Anisya Katinskaya | Serge Sharoff
The 5th Workshop on Balto-Slavic Natural Language Processing

Large Scale Translation Quality Estimation
Miguel Angel Rios Gaona | Serge Sharoff
Proceedings of the 1st Deep Machine Translation Workshop

Book Reviews: Web Corpus Construction by Roland Schäfer and Felix Bildhauer
Serge Sharoff
Computational Linguistics, Volume 41, Issue 1 - March 2015

2014

Extracting Multiword Translations from Aligned Comparable Documents
Reinhard Rapp | Serge Sharoff
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

Semi-supervised Graph-based Genre Classification for Web Pages
Noushin Rezapour Asheghi | Katja Markert | Serge Sharoff
Proceedings of TextGraphs-9: the workshop on Graph-based Methods for Natural Language Processing

Evaluating Term Extraction Methods for Interpreters
Ran Xu | Serge Sharoff
Proceedings of the 4th International Workshop on Computational Terminology (Computerm)

Multiple views as aid to linguistic annotation error analysis
Marilena Di Bari | Serge Sharoff | Martin Thomas
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

Designing and Evaluating a Reliable Corpus of Web Genres via Crowd-Sourcing
Noushin Rezapour Asheghi | Serge Sharoff | Katja Markert
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Research in Natural Language Processing often relies on a large collection of manually annotated documents. However, currently there is no reliable genre-annotated corpus of web pages to be employed in Automatic Genre Identification (AGI). In AGI, documents are classified based on their genres rather than their topics or subjects. The major shortcoming of available web genre collections is their relatively low inter-coder agreement. Reliability of annotated data is an essential factor for reliability of the research result. In this paper, we present the first web genre corpus which is reliably annotated. We developed precise and consistent annotation guidelines which consist of well-defined and well-recognized categories. For annotating the corpus, we used crowd-sourcing which is a novel approach in genre annotation. We computed the overall as well as the individual categories’ chance-corrected inter-annotator agreement. The results show that the corpus has been annotated reliably.

2013

Proceedings of the Sixth Workshop on Building and Using Comparable Corpora
Serge Sharoff | Pierre Zweigenbaum | Reinhard Rapp
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

English-to-Russian MT evaluation campaign
Pavel Braslavski | Alexander Beloborodov | Maxim Khalilov | Serge Sharoff
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Design of a hybrid high quality machine translation system
Bogdan Babych | Kurt Eberle | Johanna Geiß | Mireia Ginestí-Rosell | Anthony Hartley | Reinhard Rapp | Serge Sharoff | Martin Thomas
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

Identifying Word Translations from Comparable Documents Without a Seed Lexicon
Reinhard Rapp | Serge Sharoff | Bogdan Babych
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The extraction of dictionaries from parallel text corpora is an established technique. However, as parallel corpora are a scarce resource, in recent years the extraction of dictionaries using comparable corpora has obtained increasing attention. In order to find a mapping between languages, almost all approaches suggested in the literature rely on a seed lexicon. The work described here achieves competitive results without requiring such a seed lexicon. Instead it presupposes mappings between comparable documents in different languages. For some common types of textual resources (e.g. encyclopedias or newspaper texts) such mappings are either readily available or can be established relatively easily. The current work is based on Wikipedias where the mappings between languages are determined by the authors of the articles. We describe a neural-network inspired algorithm which first characterizes each Wikipedia article by a number of keywords, and then considers the identification of word translations as a variant of word alignment in a noisy environment. We present results and evaluations for eight language pairs involving Germanic, Romanic, and Slavic languages as well as Chinese.

Beyond translation memories: finding similar documents in comparable corpora
Serge Sharoff
Proceedings of Translating and the Computer 34

2011

Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Pierre Zweigenbaum | Reinhard Rapp | Serge Sharoff
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

Cross Language POS Taggers (and other Tools) for Indian Languages: An Experiment with Kannada using Telugu Resources
Siva Reddy | Serge Sharoff
Proceedings of the Fifth International Workshop On Cross Lingual Information Access

2010

Advanced Corpus Solutions for Humanities Researchers
James Wilson | Anthony Hartley | Serge Sharoff | Paul Stephenson
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

Fine-Grained Genre Classification Using Structural Learning Algorithms
Zhili Wu | Katja Markert | Serge Sharoff
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

The Web Library of Babel: evaluating genre collections
Serge Sharoff | Zhili Wu | Katja Markert
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present experiments in automatic genre classification on web corpora, comparing a wide variety of features on several different genre-annotated datasets (HGC, I-EN, KI-04, KRYS-I, MGC and SANTINIS). We investigate the performance of several types of features (POS n-grams, character n-grams and word n-grams) and show that simple character n-grams perform best on current collections because of their ability to generalise both lexical and syntactic phenomena related to genres. However, we also show that these impressive results might not be transferable to the wider web due to the lack of comparability between different annotation labels (many webpages cannot be described in terms of the genre labels in individual collections), lack of representativeness of existing collections (many genres are represented by webpages coming from a small number of sources) as well as problems in the reliability of genre annotation (many pages from the web are difficult to interpret in terms of the labels available). This suggests that more research is needed to understand genres on the Web.

2009

Evaluation-Guided Pre-Editing of Source Text: Improving MT-Tractability of Light Verb Constructions
Bogdan Babych | Anthony Hartley | Serge Sharoff
Proceedings of the 13th Annual conference of the European Association for Machine Translation

2008

Generalising Lexical Translation Strategies for MT Using Comparable Corpora
Bogdan Babych | Serge Sharoff | Anthony Hartley
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We report on an ongoing research project aimed at increasing the range of translation equivalents which can be automatically discovered by MT systems. The methodology is based on semi-supervised learning of indirect translation strategies from large comparable corpora and applying them at run time to generate novel, previously unseen translation equivalents. This approach is different from methods based on parallel resources, which currently can reuse only individual translation equivalents. Instead it models translation strategies which generalise individual equivalents and can successfully generate an open class of new translation solutions. The task of the project is integration of the developed technology into open-source MT systems.

Cleaneval: a Competition for Cleaning Web Pages
Marco Baroni | Francis Chantree | Adam Kilgarriff | Serge Sharoff
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Cleaneval is a shared task and competitive evaluation on the topic of cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus for linguistic and language technology research and development. The first exercise took place in 2007. We describe how it was set up, results, and lessons learnt.

Designing and Evaluating a Russian Tagset
Serge Sharoff | Mikhail Kopotev | Tomaž Erjavec | Anna Feldman | Dagmar Divjak
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenomena, modifications of the core tagset, and its evaluation. The tagset is based on the MULTEXT-East framework, while the decisions in designing it were aimed at achieving a balance between parameters important for linguists and the possibility to detect and disambiguate them automatically. The final tagset contains about 500 tags and achieves about 95% accuracy on the disambiguated portion of the Russian National Corpus. We have also produced a test set that can be shared with other researchers.

Corpus-Based Tools for Computer-Assisted Acquisition of Reading Abilities in Cognate Languages
Svitlana Kurella | Serge Sharoff | Anthony Hartley
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents an approach to computer-assisted teaching of reading abilities using corpus data. The approach is supported by a set of tools for automatically selecting and classifying texts retrieved from the Internet. The approach is based on a linguistic model of textual cohesion which describes relations between larger textual units that go beyond the sentence level. We show that textual connectors that link such textual units reliably predict different types of texts, such as “information” and “opinion”: using only textual connectors as features, an SVM classifier achieves an F-score of between 0.85 and 0.93 for predicting these classes. The tools are used in our project on teaching reading skills in a foreign language (L3) which is cognate to a known foreign language (L2).

2007

A dynamic dictionary for discovering indirect translation equivalents
Bogdan Babych | Anthony Hartley | Serge Sharoff
Proceedings of Translating and the Computer 29

Translating from under-resourced languages: comparing direct transfer against pivot translation
Bogdan Babych | Anthony Hartley | Serge Sharoff
Proceedings of Machine Translation Summit XI: Papers

Assisting Translators in Indirect Lexical Transfer
Bogdan Babych | Anthony Hartley | Serge Sharoff | Olga Mudraya
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Using Comparable Corpora to Solve Problems Difficult for Human Translators
Serge Sharoff | Bogdan Babych | Anthony Hartley
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

A Uniform Interface to Large-Scale Linguistic Resources
Serge Sharoff
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In the paper we address two practical problems concerning the use of corpora in translation studies. The first stems from the limited resources available for targeted languages and genres within languages, whereas translation researchers and students need sufficiently large modern corpora, either reflecting general language or specific to a problem domain. The second problem concerns the lack of a uniform interface for accessing the resources, even when they exist. We deal with the first problem by developing a framework for semi-automatic acquisition of large corpora from the Internet for the languages relevant for our research and training needs. We outline the methodology used and discuss the composition of Internet-derived corpora. We deal with the second problem by developing a uniform interface to our corpora. In addition to standard options for choosing corpora and sorting concordance lines, the interface can compute the list of collocations and filter the results according to user-specified patterns in order to detect language-specific syntactic structures.

Using collocations from comparable corpora to find translation equivalents
Serge Sharoff | Bogdan Babych | Anthony Hartley
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present a tool for finding appropriate translation equivalents for words from the general lexicon using comparable corpora. For a phrase in the source language the tool suggests a range of possible expressions used in similar contexts in target language corpora. In the paper we discuss the method and present results of human evaluation of the performance of the tool.

Using Richly Annotated Trilingual Language Resources for Acquiring Reading Skills in a Foreign Language
Dragoş Ciobanu | Tony Hartley | Serge Sharoff
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In an age when demand for innovative and motivating language teaching methodologies is at a very high level, TREAT - the Trilingual REAding Tutor - combines the most advanced natural language processing (NLP) techniques with the latest second and third language acquisition (SLA/TLA) research in an intuitive and user-friendly environment that has been proven to help adult learners (native speakers of L1) acquire reading skills in an unknown L3 which is related to (cognate with) an L2 they know to some extent. This corpus-based methodology relies on existing linguistic resources, as well as materials that are easy to assemble, and can be adapted to support other pairs of related L2-L3 languages. A small evaluation study conducted at the Leeds University Centre for Translation Studies indicates that, when using TREAT, learners feel more motivated to study an unknown L3, acquire significant linguistic knowledge of both the L3 and L2 rapidly, and increase their performance when translating from L3 into L1.

ASSIST: Automated Semantic Assistance for Translators
Serge Sharoff | Bogdan Babych | Paul Rayson | Olga Mudraya | Scott Piao
Demonstrations

2004

What is at Stake: a Case Study of Russian Expressions Starting with a Preposition
Serge Sharoff
Proceedings of the Workshop on Multiword Expressions: Integrating Processing

Towards Basic Categories for Describing Properties of Texts in a Corpus
Serge Sharoff
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Meaning as use: exploitation of aligned corpora for the contrastive study of lexical semantics
Serge Sharoff
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Resources for Multilingual Text Generation in Three Slavic Languages
John Bateman | Elke Teich | Geert-Jan Kruijff | Ivana Kruijff-Korbayová | Serge Sharoff | Hana Skoumalová
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Multilinguality in a Text Generation System For Three Slavic Languages
Geert-Jan Kruijff | Elke Teich | John Bateman | Ivana Kruijff-Korbayova | Hana Skoumalova | Serge Sharoff | Lena Sokolova | Tony Hartley | Kamenka Staykova | Jiri Hana
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics