2024
Difficult for Whom? A Study of Japanese Lexical Complexity
Adam Nohejl | Akio Hayakawa | Yusuke Ide | Taro Watanabe
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)
The tasks of lexical complexity prediction (LCP) and complex word identification (CWI) commonly presuppose that difficult-to-understand words are shared by the target population. Meanwhile, personalization methods have also been proposed to adapt models to individual needs. We verify that a recent Japanese LCP dataset is representative of its target population by partially replicating the annotation. By another reannotation we show that native Chinese speakers perceive the complexity differently due to Sino-Japanese vocabulary. To explore the possibilities of personalization, we compare competitive baselines trained on the group mean ratings and individual ratings in terms of performance for an individual. We show that the model trained on a group mean performs similarly to an individual model in the CWI task, while achieving good LCP performance for an individual is difficult. We also experiment with adapting a finetuned BERT model, which results only in marginal improvements across all settings.
An Extensible Massively Multilingual Lexical Simplification Pipeline Dataset using the MultiLS Framework
Matthew Shardlow | Fernando Alva-Manchego | Riza Batista-Navarro | Stefan Bott | Saul Calderon Ramirez | Rémi Cardon | Thomas François | Akio Hayakawa | Andrea Horbach | Anna Hülsing | Yusuke Ide | Joseph Marvin Imperial | Adam Nohejl | Kai North | Laura Occhipinti | Nelson Peréz Rojas | Nishat Raihan | Tharindu Ranasinghe | Martin Solis Salazar | Marcos Zampieri | Horacio Saggion
Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024
We present preliminary findings on the MultiLS dataset, developed in support of the 2024 Multilingual Lexical Simplification Pipeline (MLSP) Shared Task. This dataset currently comprises 300 instances of lexical complexity prediction and lexical simplification across 10 languages. In this paper, we (1) describe the annotation protocol in support of the contribution of future datasets and (2) present summary statistics on the existing data that we have gathered. Multilingual lexical simplification can be used to help low-ability readers engage with otherwise difficult texts in their native, often low-resourced, languages.
Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates
Yusuke Sakai | Adam Nohejl | Jiangnan Hang | Hidetaka Kamigaito | Taro Watanabe
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
The natural language understanding (NLU) performance of large language models (LLMs) has been evaluated across various tasks and datasets. The existing evaluation methods, however, do not take into account the variance in scores due to differences in prompts, which leads to unfair evaluation and comparison of NLU performance. Moreover, evaluation designed for specific prompts is inappropriate for instruction tuning, which aims to perform well with any prompt. It is therefore necessary to find a way to measure NLU performance in a fair manner, considering score variance between different instruction templates. In this study, we provide English and Japanese cross-lingual datasets for evaluating the NLU performance of LLMs, which include multiple instruction templates for fair evaluation of each task, along with regular expressions to constrain the output format. Furthermore, we propose the Sharpe score as an evaluation metric that takes into account the variance in scores between templates. Comprehensive analysis of English and Japanese LLMs reveals that the high variance among templates has a significant impact on the fair evaluation of LLMs.
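The idea of scoring a model by both its average performance and its spread across instruction templates can be sketched as follows. The abstract does not give the formula for the Sharpe score, so the penalized ratio below is an assumption modeled loosely on the Sharpe ratio from finance, not the paper's definition; the function name and the `alpha` weight are hypothetical.

```python
from statistics import mean, pstdev


def variance_penalized_score(scores, alpha=1.0):
    """Summarize per-template scores for one model on one task.

    ASSUMPTION: the penalized value mean / (1 + alpha * std) is an
    illustrative stand-in, not the formula from the paper.
    """
    if not scores:
        raise ValueError("need at least one per-template score")
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation across templates
    return {"mean": mu, "std": sigma, "penalized": mu / (1.0 + alpha * sigma)}


# Two hypothetical models with the same mean accuracy over three templates.
stable = variance_penalized_score([0.5, 0.5, 0.5])
unstable = variance_penalized_score([0.2, 0.5, 0.8])
```

Under this assumed form, the two models tie on mean accuracy, but the one whose scores swing between templates receives the lower penalized value.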
The BEA 2024 Shared Task on the Multilingual Lexical Simplification Pipeline
Matthew Shardlow | Fernando Alva-Manchego | Riza Batista-Navarro | Stefan Bott | Saul Calderon Ramirez | Rémi Cardon | Thomas François | Akio Hayakawa | Andrea Horbach | Anna Hülsing | Yusuke Ide | Joseph Marvin Imperial | Adam Nohejl | Kai North | Laura Occhipinti | Nelson Peréz Rojas | Nishat Raihan | Tharindu Ranasinghe | Martin Solis Salazar | Sanja Štajner | Marcos Zampieri | Horacio Saggion
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
We report the findings of the 2024 Multilingual Lexical Simplification Pipeline shared task. We released a new dataset comprising 5,927 instances of lexical complexity prediction and lexical simplification on common contexts across 10 languages, split into trial (300) and test (5,627). Ten teams participated across 2 tracks and 10 languages, with 233 runs evaluated across all systems. Five teams participated in all languages for the lexical complexity prediction task and four teams participated in all languages for the lexical simplification task. Teams employed a range of strategies, making use of open and closed source large language models for lexical simplification, as well as feature-based approaches for lexical complexity prediction. The highest scoring team on the combined multilingual data obtained a Pearson’s correlation of 0.6241 and an ACC@1@Top1 of 0.3772, both demonstrating that there is still room for improvement on two difficult sub-tasks of the lexical simplification pipeline.
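For readers unfamiliar with the lexical complexity prediction metric mentioned above, Pearson's correlation between predicted and gold complexity scores can be computed as in the following minimal sketch. The function name and the sample values are illustrative and are not taken from the shared task data or its official scorer.

```python
from math import sqrt


def pearson(predicted, gold):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(gold) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_g)


# Made-up complexity scores on a 0-1 scale, for illustration only.
r = pearson([0.10, 0.35, 0.60, 0.80], [0.05, 0.40, 0.55, 0.90])
```

A value near 1 means the system ranks and spaces words much like the annotators did; the reported 0.6241 indicates a moderate linear agreement.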
2023
Japanese Lexical Complexity for Non-Native Readers: A New Dataset
Yusuke Ide | Masato Mita | Adam Nohejl | Hiroki Ouchi | Taro Watanabe
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Lexical complexity prediction (LCP) is the task of predicting the complexity of words in a text on a continuous scale. It plays a vital role in simplifying or annotating complex words to assist readers. To study lexical complexity in Japanese, we construct the first Japanese LCP dataset. Our dataset provides separate complexity scores for Chinese/Korean annotators and others to address the readers’ L1-specific needs. In the baseline experiment, we demonstrate the effectiveness of a BERT-based system for Japanese LCP.
NAISTeacher: A Prompt and Rerank Approach to Generating Teacher Utterances in Educational Dialogues
Justin Vasselli | Christopher Vasselli | Adam Nohejl | Taro Watanabe
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
This paper presents our approach to the BEA 2023 shared task of generating teacher responses in educational dialogues, using the Teacher-Student Chatroom Corpus. Our system prompts GPT-3.5-turbo to generate initial suggestions, which are then subjected to reranking. We explore multiple strategies for candidate generation, including prompting for multiple candidates and employing iterative few-shot prompts with negative examples. We aggregate all candidate responses and rerank them based on DialogRPT scores. To handle consecutive turns in the dialogue data, we divide the task of generating teacher utterances into two components: teacher replies to the student and teacher continuations of previously sent messages. Through our proposed methodology, our system achieved the top score on both automated metrics and human evaluation, surpassing the reference human teachers on the latter.
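The aggregate-then-rerank step described above can be sketched in a few lines. This is only an outline under stated assumptions: `rerank` and `toy_score` are hypothetical names, and the word-count scorer is a placeholder standing in for a learned scorer such as DialogRPT, which is not called here.

```python
from typing import Callable, Iterable, List


def rerank(candidate_pools: Iterable[Iterable[str]],
           score: Callable[[str], float]) -> List[str]:
    """Merge candidates from several generation strategies and sort by score."""
    seen = set()
    merged = []
    for pool in candidate_pools:
        for candidate in pool:
            text = candidate.strip()
            # Skip empty strings and exact duplicates across pools.
            if text and text not in seen:
                seen.add(text)
                merged.append(text)
    # Highest-scoring candidate first; ties keep their original order.
    return sorted(merged, key=score, reverse=True)


def toy_score(text: str) -> float:
    """PLACEHOLDER scorer (word count); a real system would use a learned model."""
    return float(len(text.split()))


ranked = rerank(
    [["Good job!", "Can you try again with the past tense?"],
     ["Good job!", " "]],
    toy_score,
)
```

The first element of `ranked` would be the response returned to the student; swapping in a different scoring function changes the ranking without touching the aggregation logic.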