Naho Orita

2023

Theoretical Linguistics Rivals Embeddings in Language Clustering for Multilingual Named Entity Recognition
Sakura Imai | Daisuke Kawahara | Naho Orita | Hiromune Oda
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

While embedding-based methods have been dominant in language clustering for multilingual tasks, clustering based on linguistic features has not yet been explored much, as it remains baselines (Tan et al., 2019; Shaffer, 2021). This study investigates whether and how theoretical linguistics improves language clustering for multilingual named entity recognition (NER). We propose two types of language groupings: one based on morpho-syntactic features in a nominal domain and one based on a head parameter. Our NER experiments show that the proposed methods largely outperform a state-of-the-art embedding-based model, suggesting that theoretical linguistics plays a significant role in multilingual learning tasks.

pdf bib abs

Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5
Qiao Wang | Ralph Rose | Naho Orita | Ayaka Sugawara
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages

A common way of assessing language learners’ mastery of vocabulary is via multiple-choice cloze (i.e., fill-in-the-blank) questions. But the creation of test items can be laborious for individual teachers or in large-scale language programs. In this paper, we evaluate a new method for automatically generating these types of questions using large language models (LLM). The VocaTT (vocabulary teaching and training) engine is written in Python and comprises three basic steps: pre-processing target word lists, generating sentences and candidate word options using GPT, and finally selecting suitable word options. To test the efficiency of this system, 60 questions were generated targeting academic words. The generated items were reviewed by expert reviewers who judged the well-formedness of the sentences and word options, adding comments to items judged not well-formed. Results showed a 75% rate of well-formedness for sentences and 66.85% rate for suitable word options. This is a marked improvement over the generator used earlier in our research which did not take advantage of GPT’s capabilities. Post-hoc qualitative analysis reveals several points for improvement in future work including cross-referencing part-of-speech tagging, better sentence validation, and improving GPT prompts.

2017

pdf bib abs

Predicting Japanese scrambling in the wild
Naho Orita
Proceedings of the 7th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2017)

Japanese speakers have a choice between canonical SOV and scrambled OSV word order to express the same meaning. Although previous experiments examine the influence of one or two factors for scrambling in a controlled setting, it is not yet known what kinds of multiple effects contribute to scrambling. This study uses naturally distributed data to test the multiple effects on scrambling simultaneously. A regression analysis replicates the NP length effect and suggests the influence of noun types, but it provides no evidence for syntactic priming, given-new ordering, and the animacy effect. These findings only show evidence for sentence-internal factors, but we find no evidence that discourse level factors play a role.

2016

pdf bib

Toward the automatic extraction of knowledge of usable goods
Mei Uemura | Naho Orita | Naoaki Okazaki | Kentaro Inui
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

pdf bib abs

This study proposes a computational model of the discourse segments in lyrics to understand and to model the structure of lyrics. To test our hypothesis that discourse segmentations in lyrics strongly correlate with repeated patterns, we conduct the first large-scale corpus study on discourse segments in lyrics. Next, we propose the task to automatically identify segment boundaries in lyrics and train a logistic regression model for the task with the repeated pattern and textual features. The results of our empirical experiments illustrate the significance of capturing repeated patterns in predicting the boundaries of discourse segments in lyrics.

pdf bib

Incremental Prediction of Sentence-final Verbs: Humans versus Machines
Alvin Grissom II | Naho Orita | Jordan Boyd-Graber
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning