Anna Rogers - ACL Anthology

Anna Rogers

2025

DECAF: A Dynamically Extensible Corpus Analysis Framework
Max Müller-Eberstein | Rob Van Der Goot | Anna Rogers
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

The study of generalization in Language Models (LMs) requires controlled experiments that can precisely measure complex linguistic variations between training and testing datasets. We introduce DECAF, a framework that enables the analysis and filtering of linguistically-annotated datasets down to the character level. Rather than creating new resources for each experiment, DECAF starts from datasets with existing linguistic annotations, and leverages them to analyze, filter, and generate highly controlled and reproducible experimental settings targeting specific research questions. We demonstrate DECAF’s functionality by adding 28 morphosyntactic annotation layers to the 115M-word BabyLM corpus and indexing the resulting 1.1B annotations to analyze its internal domain variance, and to create a controlled training data curriculum for a small-scale gender bias study. We release DECAF as an open-source Python library, along with the parsed and indexed version of BabyLM, as resources for future generalization research.

Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models?
Arzu Burcu Güven | Anna Rogers | Rob Van Der Goot
Proceedings of the First BabyLM Workshop

We examine the syntactic properties of BabyLM corpus, and age-groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that the syntactic knowledge about the training data can be helpful in interpreting model performance on linguistic tasks. For curriculum learning, we explore developmental and several alternative cognitively inspired curriculum approaches. We find that some curricula help with reading tasks, but the main performance improvement come from using the subset of syntactically categorizable data, rather than the full noisy corpus.

Examining the Faithfulness of Deepseek R1’s Chain-of-Thought Reasoning
Chrisanna Cornish | Anna Rogers
Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)

Chain-of-Thought (CoT) ‘reasoning’ promises to enhance the performance and transparency of Large Language Models (LLMs). Models, such as Deepseek R1, are trained via reinforcement learning to automatically generate CoT explanations in their outputs. Their faithfulness, i.e. how well the explanations actually reflect their internal reasoning process, has been called into doubt by recent studies (Chen et al., 2025a; Chua and Evans, 2025). This paper extends previous work by probing Deepseek R1 with 445 logical puzzles under zero- and few-shot settings. We find that whilst the model explicitly acknowledges a strong harmful hint in 94.6% of cases, it reports less than 2% of helpful hints. Further analysis reveals implicit unfaithfulness as the model significantly reduces answer-rechecking behaviour for helpful hints (p<0.01) despite rarely mentioning them in its CoT, demonstrating a discrepancy between its reported and actual decision process. In line with prior reports for GPT, Claude, Gemini and other models, our results for DeepSeek raise concerns about the use of CoT as an explainability technique.

Research Community Perspectives on “Intelligence” and Large Language Models
Bertram Højer | Terne Sasha Thorn Jakobsen | Anna Rogers | Stefan Heinrich
Findings of the Association for Computational Linguistics: ACL 2025

Despite the widespread use of ‘artificial intelligence’ (AI) framing in Natural Language Processing (NLP) research, it is not clear what researchers mean by ”intelligence”. To that end, we present the results of a survey on the notion of ”intelligence” among researchers and its role in the research agenda. The survey elicited complete responses from 303 researchers from a variety of fields including NLP, Machine Learning (ML), Cognitive Science, Linguistics, and Neuroscience.We identify 3 criteria of intelligence that the community agrees on the most: generalization, adaptability, & reasoning.Our results suggests that the perception of the current NLP systems as ”intelligent” is a minority position (29%).Furthermore, only 16.2% of the respondents see developing intelligent systems as a research goal, and these respondents are more likely to consider the current systems intelligent.

Code Like Humans: A Multi-Agent Solution for Medical Coding
Andreas Geert Motzfeldt | Joakim Edin | Casper L. Christensen | Christian Hardmeier | Lars Maaløe | Anna Rogers
Findings of the Association for Computational Linguistics: EMNLP 2025

In medical coding, experts map unstructured clinical notes to alphanumeric codes for diagnoses and procedures. We introduce ‘Code Like Humans’: a new agentic framework for medical coding with large language models. It implements official coding guidelines for human experts, and it is the first solution that can support the full ICD-10 coding system (+70K labels). It achieves the best performance to date on rare diagnosis codes. Fine-tuned discriminative classifiers retain an advantage for high-frequency codes, to which they are limited. Towards future work, we also contribute an analysis of system performance and identify its ‘blind spots’ (codes that are systematically undercoded).

2024

AI ‘News’ Content Farms Are Easy to Make and Hard to Detect: A Case Study in Italian
Giovanni Puccetti | Anna Rogers | Chiara Alzetta | Felice Dell’Orletta | Andrea Esuli
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) are increasingly used as ‘content farm’ models (CFMs), to generate synthetic text that could pass for real news articles. This is already happening even for languages that do not have high-quality monolingual LLMs. We show that fine-tuning Llama (v1), mostly trained on English, on as little as 40K Italian news articles, is sufficient for producing news-like texts that native speakers of Italian struggle to identify as synthetic.We investigate three LLMs and three methods of detecting synthetic texts (log-likelihood, DetectGPT, and supervised classification), finding that they all perform better than human raters, but they are all impractical in the real world (requiring either access to token likelihood information or a large dataset of CFM texts). We also explore the possibility of creating a proxy CFM: an LLM fine-tuned on a similar dataset to one used by the real ‘content farm’. We find that even a small amount of fine-tuning data suffices for creating a successful detector, but we need to know which base LLM is used, which is a major challenge.Our results suggest that there are currently no practical methods for detecting synthetic news-like texts ‘in the wild’, while generating them is too easy. We highlight the urgency of more NLP research on this problem.

Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Shabnam Tafreshi | Arjun Akula | João Sedoc | Aleksandr Drozd | Anna Rogers | Anna Rumshisky
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP

NarrativeTime: Dense Temporal Annotation on a Timeline
Anna Rogers | Marzena Karpinska | Ankita Gupta | Vladislav Lialin | Gregory Smelkov | Anna Rumshisky
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

For the past decade, temporal annotation has been sparse: only a small portion of event pairs in a text was annotated. We present NarrativeTime, the first timeline-based annotation framework that achieves full coverage of all possible TLINKs. To compare with the previous SOTA in dense temporal annotation, we perform full re-annotation of the classic TimeBankDense corpus (American English), which shows comparable agreement with a signigicant increase in density. We contribute TimeBankNT corpus (with each text fully annotated by two expert annotators), extensive annotation guidelines, open-source tools for annotation and conversion to TimeML format, and baseline results.

2023

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Anna Rogers | Jordan Boyd-Graber | Naoaki Okazaki
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Program Chairs’ Report on Peer Review at ACL 2023
Anna Rogers | Marzena Karpinska | Jordan Boyd-Graber | Naoaki Okazaki
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a summary of the efforts to improve conference peer review that were implemented at ACL’23. This includes work with the goal of improving review quality, clearer workflow and decision support for the area chairs, as well as our efforts to improve paper-reviewer matching for various kinds of non- mainstream NLP work, and improve the overall incentives for all participants of the peer review process. We present analysis of the factors affecting peer review, identify the most problematic issues that the authors complained about, and provide suggestions for the future chairs. We hope that publishing such reports would (a) improve transparency in decision-making, (b) help the people new to the field to understand how the *ACL conferences work, (c) provide useful data for the future chairs and workshop organizers, and also academic work on peer review, and (d) provide useful context for the final program, as a source of information for meta-research on the structure and trajectory of the field of NLP.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Anna Rogers | Jordan Boyd-Graber | Naoaki Okazaki
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The ROOTS Search Tool: Data Transparency for LLMs
Aleksandra Piktus | Christopher Akiki | Paulo Villegas | Hugo Laurençon | Gérard Dupont | Sasha Luccioni | Yacine Jernite | Anna Rogers
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

ROOTS is a 1.6TB multilingual text corpus developed for the training of BLOOM, currently the largest language model explicitly accompanied by commensurate data governance efforts. In continuation of these efforts, we present the ROOTS Search Tool: a search engine over the entire ROOTS corpus offering both fuzzy and exact search capabilities. ROOTS is the largest corpus to date that can be investigated this way. The ROOTS Search Tool is open-sourced and available on Hugging Face Spaces: https://huggingface.co/spaces/bigscience-data/roots-search. We describe our implementation and the possible use cases of our tool.

Findings of the Association for Computational Linguistics: ACL 2023
Anna Rogers | Jordan Boyd-Graber | Naoaki Okazaki
Findings of the Association for Computational Linguistics: ACL 2023

Proceedings of the Fourth Workshop on Insights from Negative Results in NLP
Shabnam Tafreshi | Arjun Akula | João Sedoc | Aleksandr Drozd | Anna Rogers | Anna Rumshisky
Proceedings of the Fourth Workshop on Insights from Negative Results in NLP

Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)
Burcu Can | Maximilian Mozes | Samuel Cahyawijaya | Naomi Saphra | Nora Kassner | Shauli Ravfogel | Abhilasha Ravichander | Chen Zhao | Isabelle Augenstein | Anna Rogers | Kyunghyun Cho | Edward Grefenstette | Lena Voita
Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)

2022

Machine Reading, Fast and Slow: When Do Models “Understand” Language?
Sagnik Ray Choudhury | Anna Rogers | Isabelle Augenstein
Proceedings of the 29th International Conference on Computational Linguistics

Two of the most fundamental issues in Natural Language Understanding (NLU) at present are: (a) how it can established whether deep learning-based models score highly on NLU benchmarks for the ”right” reasons; and (b) what those reasons would even be. We investigate the behavior of reading comprehension models with respect to two linguistic ”skills”: coreference resolution and comparison. We propose a definition for the reasoning steps expected from a system that would be ”reading slowly”, and compare that with the behavior of five models of the BERT family of various sizes, observed through saliency scores and counterfactual explanations. We find that for comparison (but not coreference) the systems based on larger encoders are more likely to rely on the ”right” information, but even they struggle with generalization, suggesting that they still learn specific lexical patterns rather than the general principles of comparison.

Outlier Dimensions that Disrupt Transformers are Driven by Frequency
Giovanni Puccetti | Anna Rogers | Aleksandr Drozd | Felice Dell’Orletta
Findings of the Association for Computational Linguistics: EMNLP 2022

While Transformer-based language models are generally very robust to pruning, there is the recently discovered outlier phenomenon: disabling only 48 out of 110M parameters in BERT-base drops its performance by nearly 30% on MNLI. We replicate the original evidence for the outlier phenomenon and we link it to the geometry of the embedding space. We find that in both BERT and RoBERTa the magnitude of hidden state coefficients corresponding to outlier dimensions correlate with the frequencies of encoded tokens in pre-training data, and they also contribute to the “vertical” self-attention pattern enabling the model to focus on the special tokens. This explains the drop in performance from disabling the outliers, and it suggests that to decrease anisotopicity in future models we need pre-training schemas that would better take into account the skewed token distributions.

Proceedings of the Third Workshop on Insights from Negative Results in NLP
Shabnam Tafreshi | João Sedoc | Anna Rogers | Aleksandr Drozd | Anna Rumshisky | Arjun Akula
Proceedings of the Third Workshop on Insights from Negative Results in NLP

What Factors Should Paper-Reviewer Assignments Rely On? Community Perspectives on Issues and Ideals in Conference Peer-Review
Terne Thorn Jakobsen | Anna Rogers
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Both scientific progress and individual researcher careers depend on the quality of peer review, which in turn depends on paper-reviewer matching. Surprisingly, this problem has been mostly approached as an automated recommendation problem rather than as a matter where different stakeholders (area chairs, reviewers, authors) have accumulated experience worth taking into account. We present the results of the first survey of the NLP community, identifying common issues and perspectives on what factors should be considered by paper-reviewer matching systems. This study contributes actionable recommendations for improving future NLP conferences, and desiderata for interpretable peer review assignments.

Proceedings of the 7th Workshop on Representation Learning for NLP
Spandana Gella | He He | Bodhisattwa Prasad Majumder | Burcu Can | Eleonora Giunchiglia | Samuel Cahyawijaya | Sewon Min | Maximilian Mozes | Xiang Lorraine Li | Isabelle Augenstein | Anna Rogers | Kyunghyun Cho | Edward Grefenstette | Laura Rimell | Chris Dyer
Proceedings of the 7th Workshop on Representation Learning for NLP

2021

Changing the World by Changing the Data
Anna Rogers
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

NLP community is currently investing a lot more research and resources into development of deep learning models than training data. While we have made a lot of progress, it is now clear that our models learn all kinds of spurious patterns, social biases, and annotation artifacts. Algorithmic solutions have so far had limited success. An alternative that is being actively discussed is more careful design of datasets so as to deliver specific signals. This position paper maps out the arguments for and against data curation, and argues that fundamentally the point is moot: curation already is and will be happening, and it is changing the world. The question is only how much thought we want to invest into that process.

Reviewing Natural Language Processing Research
Kevin Cohen | Karën Fort | Margot Mieskes | Aurélie Névéol | Anna Rogers
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

The reviewing procedure has been identified as one of the major issues in the current situation of the NLP field. While it is implicitly assumed that junior researcher learn reviewing during their PhD project, this might not always be the case. Additionally, with the growing NLP community and the efforts in the context of widening the NLP community, researchers joining the field might not have the opportunity to practise reviewing. This tutorial fills in this gap by providing an opportunity to learn the basics of reviewing. Also more experienced researchers might find this tutorial interesting to revise their reviewing procedure.

On the Interaction of Belief Bias and Explanations
Ana Valeria González | Anna Rogers | Anders Søgaard
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

BERT Busters: Outlier Dimensions that Disrupt Transformers
Olga Kovaleva | Saurabh Kulshreshtha | Anna Rogers | Anna Rumshisky
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

‘Just What do You Think You’re Doing, Dave?’ A Checklist for Responsible Data Use in NLP
Anna Rogers | Timothy Baldwin | Kobi Leins
Findings of the Association for Computational Linguistics: EMNLP 2021

A key part of the NLP ethics movement is responsible use of data, but exactly what that means or how it can be best achieved remain unclear. This position paper discusses the core legal and ethical principles for collection and sharing of textual data, and the tensions between them. We propose a potential checklist for responsible data (re-)use that could both standardise the peer review of conference submissions, as well as enable a more in-depth view of published research across the community. Our proposal aims to contribute to the development of a consistent standard for data (re-)use, embraced across NLP conferences.

Proceedings of the Second Workshop on Insights from Negative Results in NLP
João Sedoc | Anna Rogers | Anna Rumshisky | Shabnam Tafreshi
Proceedings of the Second Workshop on Insights from Negative Results in NLP

Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Prajjwal Bhargava | Aleksandr Drozd | Anna Rogers
Proceedings of the Second Workshop on Insights from Negative Results in NLP

Much of recent progress in NLU was shown to be due to models’ learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) in a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report 2 successful and 3 unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.

Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Anna Rogers | Iacer Calixto | Ivan Vulić | Naomi Saphra | Nora Kassner | Oana-Maria Camburu | Trapit Bansal | Vered Shwartz
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

2020

A guide to the dataset explosion in QA, NLI, and commonsense reasoning
Anna Rogers | Anna Rumshisky
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

Question answering, natural language inference and commonsense reasoning are increasingly popular as general NLP system benchmarks, driving both modeling and dataset work. Only for question answering we already have over 100 datasets, with over 40 published after 2018. However, most new datasets get “solved” soon after publication, and this is largely due not to the verbal reasoning capabilities of our models, but to annotation artifacts and shallow cues in the data that they can exploit. This tutorial aims to (1) provide an up-to-date guide to the recent datasets, (2) survey the old and new methodological issues with dataset construction, and (3) outline the existing proposals for overcoming them. The target audience is the NLP practitioners who are lost in dozens of the recent datasets, and would like to know what these datasets are actually measuring. Our overview of the problems with the current datasets and the latest tips and tricks for overcoming them will also be useful to the researchers working on future benchmarks.

When BERT Plays the Lottery, All Tickets Are Winning
Sai Prasanna | Anna Rogers | Anna Rumshisky
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. Strikingly, with structured pruning even the worst possible subnetworks remain highly trainable, indicating that most pre-trained BERT weights are potentially useful. We also study the “good” subnetworks to see if their success can be attributed to superior linguistic knowledge, but find them unstable, and not explained by meaningful self-attention patterns.

What Can We Do to Improve Peer Review in NLP?
Anna Rogers | Isabelle Augenstein
Findings of the Association for Computational Linguistics: EMNLP 2020

Peer review is our best tool for judging the quality of conference submissions, but it is becoming increasingly spurious. We argue that a part of the problem is that the reviewers and area chairs face a poorly defined task forcing apples-to-oranges comparisons. There are several potential ways forward, but the key difficulty is creating the incentives and mechanisms for their consistent implementation in the NLP community.

Proceedings of the First Workshop on Insights from Negative Results in NLP
Anna Rogers | João Sedoc | Anna Rumshisky
Proceedings of the First Workshop on Insights from Negative Results in NLP

A Primer in BERTology: What We Know About How BERT Works
Anna Rogers | Olga Kovaleva | Anna Rumshisky
Transactions of the Association for Computational Linguistics, Volume 8

Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.

2019

Revealing the Dark Secrets of BERT
Olga Kovaleva | Alexey Romanov | Anna Rogers | Anna Rumshisky
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to its success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose the methodology and carry out a qualitative and quantitative analysis of the information encoded by the individual BERT’s heads. Our findings suggest that there is a limited set of attention patterns that are repeated across different heads, indicating the overall model overparametrization. While different heads consistently use the same attention patterns, they have varying impact on performance across different tasks. We show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models.

Calls to Action on Social Media: Detection, Social Impact, and Censorship Potential
Anna Rogers | Olga Kovaleva | Anna Rumshisky
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Calls to action on social media are known to be effective means of mobilization in social movements, and a frequent target of censorship. We investigate the possibility of their automatic detection and their potential for predicting real-world protest events, on historical data of Bolotnaya protests in Russia (2011-2013). We find that political calls to action can be annotated and detected with relatively high accuracy, and that in our sample their volume has a moderate positive correlation with rally attendance.

Adversarial Decomposition of Text Representation
Alexey Romanov | Anna Rumshisky | Anna Rogers | David Donahue
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we present a method for adversarial decomposition of text representation. This method can be used to decompose a representation of an input sentence into several independent vectors, each of them responsible for a specific aspect of the input sentence. We evaluate the proposed method on two case studies: the conversion between different social registers and diachronic language change. We show that the proposed method is capable of fine-grained controlled change of these aspects of the input sentence. It is also learning a continuous (rather than categorical) representation of the style of the sentence, which is more linguistically realistic. The model uses adversarial-motivational training and includes a special motivational loss, which acts opposite to the discriminator and encourages a better decomposition. Furthermore, we evaluate the obtained meaning embeddings on a downstream task of paraphrase detection and show that they significantly outperform the embeddings of a regular autoencoder.

Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
Anna Rogers | Aleksandr Drozd | Anna Rumshisky | Yoav Goldberg
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

2018

RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian
Anna Rogers | Alexey Romanov | Anna Rumshisky | Svitlana Volkova | Mikhail Gronas | Alex Gribov
Proceedings of the 27th International Conference on Computational Linguistics

This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 31,185 posts annotated with Fleiss’ kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing embeddings trained on 3.2B tokens of Russian VKontakte posts.

What’s in Your Embedding, And How It Predicts Task Performance
Anna Rogers | Shashwath Hosur Ananthakrishna | Anna Rumshisky
Proceedings of the 27th International Conference on Computational Linguistics

Attempts to find a single technique for general-purpose intrinsic evaluation of word embeddings have so far not been successful. We present a new approach based on scaled-up qualitative analysis of word vector neighborhoods that quantifies interpretable characteristics of a given model (e.g. its preference for synonyms or shared morphological forms as nearest neighbors). We analyze 21 such factors and show how they correlate with performance on 14 extrinsic and intrinsic task datasets (and also explain the lack of correlation between some of them). Our approach enables multi-faceted evaluation, parameter search, and generally – a more principled, hypothesis-driven approach to development of distributional semantic representations.

Subcharacter Information in Japanese Embeddings: When Is It Worth It?
Marzena Karpinska | Bofang Li | Anna Rogers | Aleksandr Drozd
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

Languages with logographic writing systems present a difficulty for traditional character-level models. Leveraging the subcharacter information was recently shown to be beneficial for a number of intrinsic and extrinsic tasks in Chinese. We examine whether the same strategies could be applied for Japanese, and contribute a new analogy dataset for this language.

2017

Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings
Bofang Li | Tao Liu | Zhe Zhao | Buzhou Tang | Aleksandr Drozd | Anna Rogers | Xiaoyong Du
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The number of word embedding models is growing every year. Most of them are based on the co-occurrence information of words and their contexts. However, it is still an open question what is the best definition of context. We provide a systematical investigation of 4 different syntactic context types and context representations for learning word embeddings. Comprehensive experiments are conducted to evaluate their effectiveness on 6 extrinsic and intrinsic tasks. We hope that this paper, along with the published code, would be helpful for choosing the best context type and representation for a given task.

The (too Many) Problems of Analogical Reasoning with Word Vectors
Anna Rogers | Aleksandr Drozd | Bofang Li
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

This paper explores the possibilities of analogical reasoning with vector space models. Given two pairs of words with the same relation (e.g. man:woman :: king:queen), it was proposed that the offset between one pair of the corresponding word vectors can be used to identify the unknown member of the other pair (king - man + woman = queen). We argue against such “linguistic regularities” as a model for linguistic relations in vector space models and as a benchmark, and we show that the vector offset (as well as two other, better-performing methods) suffers from dependence on vector similarity.

Co-authors

Olga Kovaleva 4

Naoaki Okazaki 4

Shabnam Tafreshi 4

Marzena Karpinska 3

Alexey Romanov 3

Samuel Cahyawijaya 2

Kyunghyun Cho 2

Felice Dell’Orletta 2

Rob Van Der Goot 2

Edward Grefenstette 2

Maximilian Mozes 2

Giovanni Puccetti 2

Terne Sasha Thorn Jakobsen 2

Christopher Akiki 1

Chiara Alzetta 1

Timothy Baldwin 1

Trapit Bansal 1

Prajjwal Bhargava 1

Iacer Calixto 1

Oana-Maria Camburu 1

Sagnik Ray Choudhury 1

Casper L. Christensen 1

K. Bretonnel Cohen 1

Chrisanna Cornish 1

David Donahue 1

Gérard Dupont 1

Spandana Gella 1

Eleonora Giunchiglia 1

Yoav Goldberg 1

Ana Valeria González 1

Mikhail Gronas 1

Arzu Burcu Güven 1

Christian Hardmeier 1

Stefan Heinrich 1

Shashwath Hosur Ananthakrishna 1

Bertram Højer 1

Yacine Jernite 1

Saurabh Kulshreshtha 1

Hugo Laurençon 1

Xiang Lorraine Li 1

Vladislav Lialin 1

Sasha Luccioni 1

Bodhisattwa Prasad Majumder 1

Margot Mieskes 1

Andreas Geert Motzfeldt 1

Max Müller-Eberstein 1

Aurelie Neveol 1

Aleksandra Piktus 1

Shauli Ravfogel 1

Abhilasha Ravichander 1

Vered Shwartz 1

Gregory Smelkov 1

Anders Søgaard 1

Paulo Villegas 1

Svitlana Volkova 1

Venues