Yevgeni Berzak


2024

pdf bib
Fine-Grained Prediction of Reading Comprehension from Eye Movements
Omer Shubi | Yoav Meiri | Cfir Avraham Hadar | Yevgeni Berzak
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Can human reading comprehension be assessed from eye movements in reading? In this work, we address this longstanding question using large-scale eyetracking data. We focus on a cardinal and largely unaddressed variant of this question: predicting reading comprehension of a single participant for a single question from their eye movements over a single paragraph. We tackle this task using a battery of recent models from the literature, and three new multimodal language models. We evaluate the models in two different reading regimes: ordinary reading and information seeking, and examine their generalization to new textual items, new participants, and the combination of both. The evaluations suggest that the task is highly challenging, and highlight the importance of benchmarking against a strong text-only baseline. While in some cases eye movements provide improvements over such a baseline, they tend to be small. This could be due to limitations of current modelling approaches, limitations of the data, or because eye movement behavior does not sufficiently pertain to fine-grained aspects of reading comprehension processes. Our study provides an infrastructure for making further progress on this question.

pdf bib
The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading
Keren Gruteke Klein | Yoav Meiri | Omer Shubi | Yevgeni Berzak
Proceedings of the 28th Conference on Computational Natural Language Learning

The effect of surprisal on processing difficulty has been a central topic of investigation in psycholinguistics. Here, we use eyetracking data to examine three language processing regimes that are common in daily life but have not been addressed with respect to this question: information seeking, repeated processing, and the combination of the two. Using standard regime-agnostic surprisal estimates we find that the prediction of surprisal theory regarding the presence of a linear effect of surprisal on processing times, extends to these regimes. However, when using surprisal estimates from regime-specific contexts that match the contexts and tasks given to humans, we find that in information seeking, such estimates do not improve the predictive power of processing times compared to standard surprisals. Further, regime-specific contexts yield near zero surprisal estimates with no predictive power for processing times in repeated reading. These findings point to misalignments of task and memory representations between humans and current language models, and question the extent to which such models can be used for estimating cognitively relevant quantities. We further discuss theoretical challenges posed by these results.

2022

pdf bib
The Aligned Multimodal Movie Treebank: An audio, video, dependency-parse treebank
Adam Yaari | Jan DeWitt | Henry Hu | Bennett Stankovits | Sue Felshin | Yevgeni Berzak | Helena Aparicio | Boris Katz | Ignacio Cases | Andrei Barbu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Treebanks have traditionally included only text and were derived from written sources such as newspapers or the web. We introduce the Aligned Multimodal Movie Treebank (AMMT), an English language treebank derived from dialog in Hollywood movies which includes transcriptions of the audio-visual streams with word-level alignment, as well as part of speech tags and dependency parses in the Universal Dependencies formalism. AMMT consists of 31,264 sentences and 218,090 words, that will amount to the 3rd largest UD English treebank and the only multimodal treebank in UD. To help with the web-based annotation effort, we also introduce the Efficient Audio Alignment Annotator (EAAA), a companion tool that enables annotators to significantly speed-up their annotation processes.

2021

pdf bib
Predicting Text Readability from Scrolling Interactions
Sian Gooding | Yevgeni Berzak | Tony Mak | Matt Sharifi
Proceedings of the 25th Conference on Computational Natural Language Learning

Judging the readability of text has many important applications, for instance when performing text simplification or when sourcing reading material for language learners. In this paper, we present a 518 participant study which investigates how scrolling behaviour relates to the readability of English texts. We make our dataset publicly available and show that (1) there are statistically significant differences in the way readers interact with text depending on the text level, (2) such measures can be used to predict the readability of text, and (3) the background of a reader impacts their reading interactions and the factors contributing to text difficulty.

2020

pdf bib
Proceedings of the Second Workshop on Computational Research in Linguistic Typology
Ekaterina Vylomova | Edoardo M. Ponti | Eitan Grossman | Arya D. McCarthy | Yevgeni Berzak | Haim Dubossarsky | Ivan Vulić | Roi Reichart | Anna Korhonen | Ryan Cotterell
Proceedings of the Second Workshop on Computational Research in Linguistic Typology

pdf bib
STARC: Structured Annotations for Reading Comprehension
Yevgeni Berzak | Jonathan Malmaud | Roger Levy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present STARC (Structured Annotations for Reading Comprehension), a new annotation framework for assessing reading comprehension with multiple choice questions. Our framework introduces a principled structure for the answer choices and ties them to textual span annotations. The framework is implemented in OneStopQA, a new high-quality dataset for evaluation and analysis of reading comprehension in English. We use this dataset to demonstrate that STARC can be leveraged for a key new application for the development of SAT-like reading comprehension materials: automatic annotation quality probing via span ablation experiments. We further show that it enables in-depth analyses and comparisons between machine and human reading comprehension behavior, including error distributions and guessing ability. Our experiments also reveal that the standard multiple choice dataset in NLP, RACE, is limited in its ability to measure reading comprehension. 47% of its questions can be guessed by machines without accessing the passage, and 18% are unanimously judged by humans as not having a unique correct answer. OneStopQA provides an alternative test set for reading comprehension which alleviates these shortcomings and has a substantially higher human ceiling performance.

pdf bib
Classifying Syntactic Errors in Learner Language
Leshem Choshen | Dmitry Nikolaev | Yevgeni Berzak | Omri Abend
Proceedings of the 24th Conference on Computational Natural Language Learning

We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence. The methodology builds on the established Universal Dependencies syntactic representation scheme, and provides complementary information to other error-classification systems. Unlike existing error classification methods, our method is applicable across languages, which we showcase by producing a detailed picture of syntactic errors in learner English and learner Russian. We further demonstrate the utility of the methodology for analyzing the outputs of leading Grammatical Error Correction (GEC) systems.

pdf bib
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension
Jonathan Malmaud | Roger Levy | Yevgeni Berzak
Proceedings of the 24th Conference on Computational Natural Language Learning

In this work, we analyze how human gaze during reading comprehension is conditioned on the given reading comprehension question, and whether this signal can be beneficial for machine reading comprehension. To this end, we collect a new eye-tracking dataset with a large number of participants engaging in a multiple choice reading comprehension task. Our analysis of this data reveals increased fixation times over parts of the text that are most relevant for answering the question. Motivated by this finding, we propose making automated reading comprehension more human-like by mimicking human information-seeking reading behavior during reading comprehension. We demonstrate that this approach leads to performance gains on multiple choice question answering in English for a state-of-the-art reading comprehension model.

2019

pdf bib
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Edoardo Maria Ponti | Helen O’Horan | Yevgeni Berzak | Ivan Vulić | Roi Reichart | Thierry Poibeau | Ekaterina Shutova | Anna Korhonen
Computational Linguistics, Volume 45, Issue 3 - September 2019

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.

pdf bib
Proceedings of TyP-NLP: The First Workshop on Typology for Polyglot NLP
Haim Dubossarsky | Arya D. McCarthy | Edoardo Maria Ponti | Ivan Vulić | Ekaterina Vylomova | Yevgeni Berzak | Ryan Cotterell | Manaal Faruqui | Anna Korhonen | Roi Reichart
Proceedings of TyP-NLP: The First Workshop on Typology for Polyglot NLP

2018

pdf bib
Assessing Language Proficiency from Eye Movements in Reading
Yevgeni Berzak | Boris Katz | Roger Levy
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present a novel approach for determining learners’ second language proficiency which utilizes behavioral traces of eye movements during reading. Our approach provides stand-alone eyetracking based English proficiency scores which reflect the extent to which the learner’s gaze patterns in reading are similar to those of native English speakers. We show that our scores correlate strongly with standardized English proficiency tests. We also demonstrate that gaze information can be used to accurately predict the outcomes of such tests. Our approach yields the strongest performance when the test taker is presented with a suite of sentences for which we have eyetracking data from other readers. However, it remains effective even using eyetracking with sentences for which eye movement data have not been previously collected. By deriving proficiency as an automatic byproduct of eye movements during ordinary reading, our approach offers a potentially valuable new tool for second language proficiency assessment. More broadly, our results open the door to future methods for inferring reader characteristics from the behavioral traces of reading.

pdf bib
Grounding language acquisition by training semantic parsers using captioned videos
Candace Ross | Andrei Barbu | Yevgeni Berzak | Battushig Myanganbayar | Boris Katz
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children where they observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. For this task, we collected a new dataset for grounded language acquisition. Learning a grounded semantic parser — turning sentences into logical forms using captioned videos — can significantly expand the range of data that parsers can be trained on, lower the effort of training a semantic parser, and ultimately lead to a better understanding of child language acquisition.

2017

pdf bib
Predicting Native Language from Gaze
Yevgeni Berzak | Chie Nakamura | Suzanne Flynn | Boris Katz
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A fundamental question in language learning concerns the role of a speaker’s first language in second language acquisition. We present a novel methodology for studying this question: analysis of eye-movement patterns in second language reading of free-form text. Using this methodology, we demonstrate for the first time that the native language of English learners can be predicted from their gaze fixations when reading English. We provide analysis of classifier uncertainty and learned features, which indicates that differences in English reading are likely to be rooted in linguistic divergences across native languages. The presented framework complements production studies and offers new ground for advancing research on multilingualism.

2016

pdf bib
Anchoring and Agreement in Syntactic Annotations
Yevgeni Berzak | Yan Huang | Andrei Barbu | Anna Korhonen | Boris Katz
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Survey on the Use of Typological Information in Natural Language Processing
Helen O’Horan | Yevgeni Berzak | Ivan Vulić | Roi Reichart | Anna Korhonen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In recent years linguistic typologies, which classify the world’s languages according to their functional and structural properties, have been widely used to support multilingual NLP. While the growing importance of typologies in supporting multilingual tasks has been recognised, no systematic survey of existing typological resources and their use in NLP has been published. This paper provides such a survey as well as discussion which we hope will both inform and inspire future work in the area.

pdf bib
Universal Dependencies for Learner English
Yevgeni Berzak | Jessica Kenney | Carolyn Spadine | Jing Xian Wang | Lucia Lam | Keiko Sophie Mori | Sebastian Garza | Boris Katz
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distributions in ESL
Yevgeni Berzak | Roi Reichart | Boris Katz
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
Yevgeni Berzak | Andrei Barbu | Daniel Harari | Boris Katz | Shimon Ullman
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
Reconstructing Native Language Typology from Foreign Language Usage
Yevgeni Berzak | Roi Reichart | Boris Katz
Proceedings of the Eighteenth Conference on Computational Natural Language Learning