Kenji Sagae - ACL Anthology

Kenji Sagae

2025

The Role of Abstract Representations and Observed Preferences in the Ordering of Binomials in Large Language Models
Zachary Nicholas Houghton | Kenji Sagae | Emily Morgan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

To what extent do large language models learn abstract representations as opposed to more superficial aspects of their very large training corpora? We examine this question in the context of binomial ordering preferences involving two conjoined nouns in English. When choosing a binomial ordering (radio and television vs television and radio), humans rely on more than simply the observed frequency of each option. Humans also rely on abstract ordering preferences (e.g., preferences for short words before long words). We investigate whether large language models simply rely on the observed preference in their training data, or whether they are capable of learning the abstract ordering preferences (i.e., abstract representations) that humans rely on. Our results suggest that both smaller and larger models’ ordering preferences are driven exclusively by their experience with that item in the training data. Our study provides further insights into differences between how large language models represent and use language and how humans do it, particularly with respect to the use of abstract representations versus observed preferences.

What data should I include in my POS tagging training set?
Zoey Liu | Masoud Jasbi | Christan Grant | Kenji Sagae | Emily Prud’hommeaux
Findings of the Association for Computational Linguistics: EMNLP 2025

Building an NLP training set for understudied languages, including Indigenous and endangered languages, often faces challenges due to varying degrees of resource limitations in the speaker communities. What are some reasonable approaches for training set construction in these cases? We address this question with POS tagging as the test case. Although many might consider POS tagging “a solved problem”, it remains a crucial task for descriptive linguistics and language documentation and requires laborious manual annotation. Drawing data from 12 language families, we compare in-context learning, active learning (AL), and random sampling. Our results suggest: (1) for communities whose language data can be ethically shared with an API, using only 1,000 randomly sampled tokens as prompt examples, the proprietary GPT-4.1-mini can deliver desirable performance (F1>0.83) on par with that from a training set of thousands of tokens in AL iterations; (2) in cases where communities prefer not to share data, 4,500-5,500 tokens selected from AL can yield reasonable results at a pace statistically significantly faster than random sampling, evidenced by growth curve modeling.

Proceedings of the 18th International Conference on Parsing Technologies (IWPT, SyntaxFest 2025)
Kenji Sagae | Stephan Oepen
Proceedings of the 18th International Conference on Parsing Technologies (IWPT, SyntaxFest 2025)

2021

Language Embeddings for Typology and Cross-lingual Transfer Learning
Dian Yu | Taiqi He | Kenji Sagae
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Cross-lingual language tasks typically require a substantial amount of annotated data or parallel translation data. We explore whether language representations that capture relationships among languages can be learned and subsequently leveraged in cross-lingual tasks without the use of parallel data. We generate dense embeddings for 29 languages using a denoising autoencoder, and evaluate the embeddings using the World Atlas of Language Structures (WALS) and two extrinsic tasks in a zero-shot setting: cross-lingual dependency parsing and cross-lingual natural language inference.

Automatically Exposing Problems with Neural Dialog Models
Dian Yu | Kenji Sagae
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Neural dialog models are known to suffer from problems such as generating unsafe and inconsistent responses. Even though these problems are crucial and prevalent, they are mostly manually identified by model designers through interactions. Recently, some research instructs crowdworkers to goad the bots into triggering such problems. However, humans leverage superficial clues such as hate speech, while leaving systematic problems undiscovered. In this paper, we propose two methods including reinforcement learning to automatically trigger a dialog model into generating problematic responses. We show the effect of our methods in exposing safety and contradiction issues with state-of-the-art dialog models.

Attribute Alignment: Controlling Text Generation from Pre-trained Language Models
Dian Yu | Zhou Yu | Kenji Sagae
Findings of the Association for Computational Linguistics: EMNLP 2021

Large language models benefit from training with a large amount of unlabeled text, which gives them increasingly fluent and diverse generation capabilities. However, using these models for text generation that takes into account target attributes, such as sentiment polarity or specific topics, remains a challenge. We propose a simple and flexible method for controlling text generation by aligning disentangled attribute representations. In contrast to recent efforts on training a discriminator to perturb the token level distribution for an attribute, we use the same data to learn an alignment function to guide the pre-trained, non-controlled language model to generate texts with the target attribute without changing the original language model parameters. We evaluate our method on sentiment- and topic-controlled generation, and show large performance gains over previous methods while retaining fluency and diversity.

Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)
Stephan Oepen | Kenji Sagae | Reut Tsarfaty | Gosse Bouma | Djamé Seddah | Daniel Zeman
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

2020

Tracking the Evolution of Written Language Competence in L2 Spanish Learners
Alessio Miaschi | Sam Davidson | Dominique Brunato | Felice Dell’Orletta | Kenji Sagae | Claudia Helena Sanchez-Gutierrez | Giulia Venturi
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we present an NLP-based approach for tracking the evolution of written language competence in L2 Spanish learners using a wide range of linguistic features automatically extracted from students’ written productions. Beyond reporting classification results for different scenarios, we explore the connection between the most predictive features and the teaching curriculum, finding that our set of linguistic features often reflects the explicit instructions that students receive during each course.

Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies
Gosse Bouma | Yuji Matsumoto | Stephan Oepen | Kenji Sagae | Djamé Seddah | Weiwei Sun | Anders Søgaard | Reut Tsarfaty | Dan Zeman
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

Developing NLP Tools with a New Corpus of Learner Spanish
Sam Davidson | Aaron Yamada | Paloma Fernandez Mira | Agustina Carando | Claudia H. Sanchez Gutierrez | Kenji Sagae
Proceedings of the Twelfth Language Resources and Evaluation Conference

The development of effective NLP tools for the L2 classroom depends largely on the availability of large annotated corpora of language learner text. While annotated learner corpora of English are widely available, large learner corpora of Spanish are less common. Those Spanish corpora that are available do not contain the annotations needed to facilitate the development of tools beneficial to language learners, such as grammatical error correction. As a result, the field has seen little research in NLP tools designed to benefit Spanish language learners and teachers. We introduce COWS-L2H, a freely available corpus of Spanish learner data which includes error annotations and parallel corrected text to help researchers better understand L2 development, to examine teaching practices empirically, and to develop NLP tools to better serve the Spanish teaching community. We demonstrate the utility of this corpus by developing a neural-network based grammatical error correction system for Spanish learner writing.

2019

UC Davis at SemEval-2019 Task 1: DAG Semantic Parsing with Attention-based Decoder
Dian Yu | Kenji Sagae
Proceedings of the 13th International Workshop on Semantic Evaluation

We present an encoder-decoder model for semantic parsing with UCCA SemEval 2019 Task 1. The encoder is a Bi-LSTM and the decoder uses recursive self-attention. The proposed model alleviates challenges and feature engineering in traditional transition-based and graph-based parsers. The resulting parser is simple and proved to be effective on the semantic parsing task.

2017

Proceedings of the 15th International Conference on Parsing Technologies
Yusuke Miyao | Kenji Sagae
Proceedings of the 15th International Conference on Parsing Technologies

2016

Supertagging With LSTMs
Ashish Vaswani | Yonatan Bisk | Kenji Sagae | Ryan Musa
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Efficient Structured Inference for Transition-Based Parsing with Neural Networks and Error States
Ashish Vaswani | Kenji Sagae
Transactions of the Association for Computational Linguistics, Volume 4

Transition-based approaches based on local classification are attractive for dependency parsing due to their simplicity and speed, despite producing results slightly below the state-of-the-art. In this paper, we propose a new approach for approximate structured inference for transition-based parsing that produces scores suitable for global scoring using local models. This is accomplished with the introduction of error states in local training, which add information about incorrect derivation paths typically left out completely in locally-trained models. Using neural networks for our local classifiers, our approach achieves 93.61% accuracy for transition-based dependency parsing in English.

2015

Combining Distributed Vector Representations for Words
Justin Garten | Kenji Sagae | Volkan Ustun | Morteza Dehghani
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

2014

Data-driven Measurement of Child Language Development with Simple Syntactic Templates
Shannon Lubetich | Kenji Sagae
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Improving Classification-Based Natural Language Understanding with Non-Expert Annotation
Fabrizio Morbini | Eric Forbell | Kenji Sagae
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

Verbal Behaviors and Persuasiveness in Online Multimedia Content
Moitreya Chatterjee | Sunghyun Park | Han Suk Shim | Kenji Sagae | Louis-Philippe Morency
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

2013

Roundtable: An Online Framework for Building Web-based Conversational Agents
Eric Forbell | Nicolai Kalisch | Fabrizio Morbini | Kelly Christoffersen | Kenji Sagae | David Traum | Albert A. Rizzo
Proceedings of the SIGDIAL 2013 Conference

Which ASR should I choose for my dialogue system?
Fabrizio Morbini | Kartik Audhkhasi | Kenji Sagae | Ron Artstein | Doğan Can | Panayiotis Georgiou | Shri Narayanan | Anton Leuski | David Traum
Proceedings of the SIGDIAL 2013 Conference

2012

Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems
Kallirroi Georgila | Alan Black | Kenji Sagae | David Traum
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The current practice in virtual human dialogue systems is to use professional human recordings or limited-domain speech synthesis. Both approaches lead to good performance but at a high cost. To determine the best trade-off between performance and cost, we perform a systematic evaluation of human and synthesized voices with regard to naturalness, conversational aspect, and likability. We vary the type (in-domain vs. out-of-domain), length, and content of utterances, and take into account the age and native language of raters as well as their familiarity with speech synthesis. We present detailed results from two studies, a pilot one and one run on Amazon's Mechanical Turk. Our results suggest that a professional human voice can supersede both an amateur human voice and synthesized voices. Also, a high-quality general-purpose voice or a good limited-domain voice can perform better than amateur human recordings. We do not find any significant differences between the performance of a high-quality general-purpose voice and a limited-domain voice, both trained with speech recorded by actors. As expected, the high-quality general-purpose voice is rated higher than the limited-domain voice for out-of-domain sentences and lower for in-domain sentences. There is also a trend for long or negative-content utterances to receive lower ratings.

A Mixed-Initiative Conversational Dialogue System for Healthcare
Fabrizio Morbini | Eric Forbell | David DeVault | Kenji Sagae | David Traum | Albert Rizzo
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2011

An Evaluation of Alternative Strategies for Implementing Dialogue Policies Using Statistical Classification and Hand-Authored Rules
David DeVault | Anton Leuski | Kenji Sagae
Proceedings of 5th International Joint Conference on Natural Language Processing

Joint Identification and Segmentation of Domain-Specific Dialogue Acts for Conversational Dialogue Systems
Fabrizio Morbini | Kenji Sagae
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Toward Learning and Evaluation of Dialogue Policies with Text Examples
David DeVault | Anton Leuski | Kenji Sagae
Proceedings of the SIGDIAL 2011 Conference

2010

Latent Mixture of Discriminative Experts for Multimodal Prediction Modeling
Derya Ozkan | Kenji Sagae | Louis-Philippe Morency
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Practical Evaluation of Speech Recognizers for Virtual Human Dialogue Systems
Xuchen Yao | Pravin Bhutada | Kallirroi Georgila | Kenji Sagae | Ron Artstein | David Traum
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We perform a large-scale evaluation of multiple off-the-shelf speech recognizers across diverse domains for virtual human dialogue systems. Our evaluation is aimed at speech recognition consumers and potential consumers with limited experience with readily available recognizers. We focus on practical factors to determine what levels of performance can be expected from different available recognizers in various projects featuring different types of conversational utterances. Our results show that there is no single recognizer that outperforms all other recognizers in all domains. The performance of each recognizer may vary significantly depending on the domain, the size and perplexity of the corpus, the out-of-vocabulary rate, and whether acoustic and language model adaptation has been used or not. We expect that our evaluation will prove useful to other speech recognition consumers, especially in the dialogue community, and will shed some light on the key problem in spoken dialogue systems of selecting the most suitable available speech recognition system for a particular application, and what impact training will have.

Interpretation of Partial Utterances in Virtual Human Dialogue Systems
Kenji Sagae | David DeVault | David Traum
Proceedings of the NAACL HLT 2010 Demonstration Session

Dynamic Programming for Linear-Time Incremental Parsing
Liang Huang | Kenji Sagae
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories
Matthew Gerber | Andrew Gordon | Kenji Sagae
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

Self-Training without Reranking for Parser Domain Adaptation and Its Impact on Semantic Role Labeling
Kenji Sagae
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

2009

Towards Natural Language Understanding of Partial Speech Recognition Results in Dialogue Systems
Kenji Sagae | Gwen Christian | David DeVault | David Traum
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Analysis of Discourse Structure with Syntactic Dependencies and Data-Driven Shift-Reduce Parsing
Kenji Sagae
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

Clustering Words by Syntactic Similarity improves Dependency Parsing of Predicate-argument Structures
Kenji Sagae | Andrew S. Gordon
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

Can I Finish? Learning When to Respond to Incremental Interpretation Results in Interactive Dialogue
David DeVault | Kenji Sagae | David Traum
Proceedings of the SIGDIAL 2009 Conference

2008

Shift-Reduce Dependency DAG Parsing
Kenji Sagae | Jun’ichi Tsujii
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

GENIA-GR: a Grammatical Relation Corpus for Parser Evaluation in the Biomedical Domain
Yuka Tateisi | Yusuke Miyao | Kenji Sagae | Jun’ichi Tsujii
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We report the construction of a corpus for parser evaluation in the biomedical domain. A 50-abstract subset (492 sentences) of the GENIA corpus (Kim et al., 2003) is annotated with labeled head-dependent relations using the grammatical relations (GR) evaluation scheme (Carroll et al., 1998), which has been used for parser evaluation in the newswire domain.

Task-oriented Evaluation of Syntactic Parsers and Their Representations
Yusuke Miyao | Rune Sætre | Kenji Sagae | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of ACL-08: HLT

Evaluating the Effects of Treebank Size in a Practical Application for Parsing
Kenji Sagae | Yusuke Miyao | Rune Saetre | Jun’ichi Tsujii
Software Engineering, Testing, and Quality Assurance for Natural Language Processing

2007

Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles
Kenji Sagae | Jun’ichi Tsujii
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

HPSG Parsing with Shallow Dependency Constraints
Kenji Sagae | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

High-accuracy Annotation and Parsing of CHILDES Transcripts
Kenji Sagae | Eric Davis | Alon Lavie | Brian MacWhinney | Shuly Wintner
Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition

2006

Parser Combination by Reparsing
Kenji Sagae | Alon Lavie
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

A Fast, Accurate Deterministic Parser for Chinese
Mengqiu Wang | Kenji Sagae | Teruko Mitamura
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

A Best-First Probabilistic Shift-Reduce Parser
Kenji Sagae | Alon Lavie
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

Automatic Measurement of Syntactic Development in Child Language
Kenji Sagae | Alon Lavie | Brian MacWhinney
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

A Classifier-Based Parser with Linear Run-Time Complexity
Kenji Sagae | Alon Lavie
Proceedings of the Ninth International Workshop on Parsing Technology

2004

The significance of recall in automatic metrics for MT evaluation
Alon Lavie | Kenji Sagae | Shyamsundar Jayaraman
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

Recent research has shown that a balanced harmonic mean (F1 measure) of unigram precision and recall outperforms the widely used BLEU and NIST metrics for Machine Translation evaluation in terms of correlation with human judgments of translation quality. We show that significantly better correlations can be achieved by placing more weight on recall than on precision. While this may seem unexpected, since BLEU and NIST focus on n-gram precision and disregard recall, our experiments show that correlation with human judgments is highest when almost all of the weight is assigned to recall. We also show that stemming is significantly beneficial not just to simpler unigram precision and recall based metrics, but also to BLEU and NIST.

Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs
Kenji Sagae | Brian MacWhinney | Alon Lavie
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Combining Rule-based and Data-driven Techniques for Grammatical Relation Extraction in Spoken Language
Kenji Sagae | Alon Lavie
Proceedings of the Eighth International Conference on Parsing Technologies

We investigate an aspect of the relationship between parsing and corpus-based methods in NLP that has received relatively little attention: coverage augmentation in rule-based parsers. In the specific task of determining grammatical relations (such as subjects and objects) in transcribed spoken language, we show that a combination of rule-based and corpus-based approaches, where a rule-based system is used as the teacher (or an automatic data annotator) to a corpus-based system, outperforms either system in isolation.

2001

Parsing the CHILDES Database: Methodology and Lessons Learned
Kenji Sagae | Alon Lavie | Brian MacWhinney
Proceedings of the Seventh International Workshop on Parsing Technologies

Co-authors

Fabrizio Morbini 5

Brian MacWhinney 4

Stephan Oepen 4

Kallirroi Georgila 2

Andrew Gordon 2

Louis-Philippe Morency 2

Albert A. Rizzo 2

Djamé Seddah 2

Reut Tsarfaty 2

Ashish Vaswani 2

Kartik Audhkhasi 1

Pravin Bhutada 1

Alan W. Black 1

Dominique Brunato 1

Agustina Carando 1

John A. Carroll 1

Moitreya Chatterjee 1

Gwen Christian 1

Kelly Christoffersen 1

Stephen Clark 1

Ann Copestake 1

Morteza Dehghani 1

Felice Dell’Orletta 1

Paloma Fernandez Mira 1

Dan Flickinger 1

Justin Garten 1

Panayiotis Georgiou 1

Matthew Gerber 1

Christan Grant 1

Julia Hockenmaier 1

Zachary Nicholas Houghton 1

Shyamsundar Jayaraman 1

Aravind Joshi 1

Nicolai Kalisch 1

Ronald M. Kaplan 1

Tracy Holloway King 1

Sandra Kübler 1

Shannon Lubetich 1

Jan Tore Lønning 1

Christopher D. Manning 1

Yuji Matsumoto 1

Takuya Matsuzaki 1

Alessio Miaschi 1

Teruko Mitamura 1

Shrikanth Narayanan 1

Sunghyun Park 1

Emily Prud’hommeaux 1

Claudia H. Sanchez Gutierrez 1

Claudia Helena Sanchez-Gutierrez 1

Anders Søgaard 1

Giulia Venturi 1

Shuly Wintner 1

Josef van Genabith 1
