2023
Selective-LAMA: Selective Prediction for Confidence-Aware Evaluation of Language Models
Hiyori Yoshikawa | Naoaki Okazaki
Findings of the Association for Computational Linguistics: EACL 2023
Recent studies have suggested that neural language models learn and store large amounts of factual and commonsense knowledge from training data. The ability of language models to recall such knowledge is often evaluated via zero-shot cloze-style QA tasks. However, such evaluations rely only on prediction accuracy without penalizing the systems for their mistakes, e.g., simply guessing or hallucinating likely answers. Selective prediction is a more informative evaluation framework that takes the confidence of predictions into account. Under the selective prediction setting, a model is evaluated not only by the number of correct predictions, but also by its ability to filter out dubious predictions by estimating the confidence of individual predictions. Such confidence-aware evaluation is crucial for determining whether to trust the zero-shot predictions of language models. In this paper, we apply the selective prediction setting to an existing benchmark, the LAMA probe, and conduct extensive experiments with recent neural language models and different confidence functions. We empirically show that our Selective-LAMA evaluation is more robust to the effect of simple guesses than the conventional accuracy-based evaluation. Our evaluation reveals the importance of the choice of confidence function by showing that simply relying on token probabilities is not always the best choice. Further analysis shows that different confidence functions exhibit different preferences over predicted tokens for a given context.
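The selective prediction setting described in the abstract can be sketched as a simple threshold-based evaluation: the model abstains whenever its confidence falls below a threshold, and is scored on both coverage and accuracy over the answered subset. The function below is an illustrative sketch, not code from the paper; the confidence values would come from whichever confidence function is under evaluation.

```python
def selective_eval(predictions, confidences, gold, threshold):
    """Score predictions under selective prediction: any prediction whose
    confidence is below the threshold is abstained (not answered).

    Returns (coverage, selective accuracy over the answered subset)."""
    answered = [(p, g) for p, c, g in zip(predictions, confidences, gold)
                if c >= threshold]
    coverage = len(answered) / len(predictions)
    accuracy = (sum(p == g for p, g in answered) / len(answered)
                if answered else 0.0)
    return coverage, accuracy
```

Sweeping the threshold traces out a risk–coverage trade-off, which is what makes this evaluation robust to confident-sounding guesses: a model that hallucinates likely answers with high confidence is penalized on the answered subset.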
2021
Tell Me What You Read: Automatic Expertise-Based Annotator Assignment for Text Annotation in Expert Domains
Hiyori Yoshikawa | Tomoya Iwakura | Kimi Kaneko | Hiroaki Yoshida | Yasutaka Kumano | Kazutaka Shimada | Rafal Rzepka | Patrycja Swieczkowska
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
This paper investigates the effectiveness of automatic annotator assignment for text annotation in expert domains. In the task of creating high-quality annotated corpora, expert domains often cover multiple sub-domains (e.g. organic and inorganic chemistry in the chemistry domain), either explicitly or implicitly. It is therefore crucial to assign annotators to documents relevant to their fine-grained domain expertise. However, most existing methods for crowdsourcing estimate the reliability of each annotator or annotated instance only after the annotation process. To address this issue, we propose a method to estimate the domain expertise of each annotator before the annotation process, using information easily available from the annotators beforehand. We propose two measures to estimate annotator expertise: an explicit measure using predefined categories of sub-domains, and an implicit measure using distributed representations of the documents. Experimental results on chemical name annotation tasks show that annotation accuracy improves when the explicit and implicit measures for annotator assignment are combined.
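The implicit expertise measure described above can be illustrated with a small sketch: each annotator is represented by a profile vector (for example, the mean embedding of documents they report having read), and a document is assigned to the annotator whose profile is most similar. The function names and the use of cosine similarity here are illustrative assumptions, not the paper's exact formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_annotator(doc_vec, annotator_profiles):
    """Pick the annotator whose profile vector is closest to the document.

    annotator_profiles maps annotator name -> profile vector (e.g. the mean
    embedding of documents that annotator has read)."""
    return max(annotator_profiles,
               key=lambda name: cosine(doc_vec, annotator_profiles[name]))
```

The key point of the paper's setup is that these profiles are computable *before* annotation starts, unlike post-hoc reliability estimation.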
Evaluating Hierarchical Document Categorisation
Qian Sun | Aili Shen | Hiyori Yoshikawa | Chunpeng Ma | Daniel Beck | Tomoya Iwakura | Timothy Baldwin
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
Hierarchical document categorisation is a special case of multi-label document categorisation in which the labels are organised in a taxonomic hierarchy. While various approaches have been proposed for hierarchical document categorisation, there is no standard benchmark dataset, meaning that methods have been evaluated independently and there is no empirical consensus on which perform best. In this work, we examine different combinations of neural text encoders and hierarchical methods in an end-to-end framework, and evaluate over three datasets. We find that the performance of hierarchical document categorisation is determined not only by how the hierarchical information is modelled, but also by the structure of the label hierarchy and the class distribution.
On the (In)Effectiveness of Images for Text Classification
Chunpeng Ma | Aili Shen | Hiyori Yoshikawa | Tomoya Iwakura | Daniel Beck | Timothy Baldwin
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not. One confounding factor is that previous NLP research has generally focused on sophisticated tasks (in varying settings), typically applied only to English. We focus on text classification, in the context of assigning named entity classes to a given Wikipedia page, where images generally complement the text and the Wikipedia page can be in one of a number of different languages. Our experiments across a range of languages show that images complement NLP models (including BERT) trained without external pre-training, but contribute nothing when combined with BERT models pre-trained on large-scale external data.
2019
Detecting Chemical Reactions in Patents
Hiyori Yoshikawa | Dat Quoc Nguyen | Zenan Zhai | Christian Druckenbrodt | Camilo Thorne | Saber A. Akhondi | Timothy Baldwin | Karin Verspoor
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association
Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration. In this paper we introduce the novel task of detecting the textual spans that describe or refer to chemical reactions within patents. We formulate this task as a paragraph-level sequence tagging problem, where the system is required to return a sequence of paragraphs which contain a description of a reaction. To address this new task, we construct an annotated dataset from an existing proprietary database of chemical reactions manually extracted from patents. We introduce several baseline methods for the task and evaluate them over our dataset. Through error analysis, we discuss what makes the task complex and challenging, and suggest possible directions for future research.
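The paragraph-level sequence tagging formulation above can be made concrete with a small sketch that encodes gold reaction spans as tags over a document's paragraphs. The BIO encoding used here is an illustrative assumption (a standard choice for sequence tagging), not necessarily the paper's exact label set.

```python
def paragraphs_to_bio(num_paragraphs, reaction_spans):
    """Encode reaction spans, given as (start, end) paragraph indices
    (inclusive), as a BIO tag sequence over the document's paragraphs.

    "B" marks the first paragraph of a reaction description, "I" marks
    its continuation paragraphs, and "O" marks paragraphs outside any
    reaction description."""
    tags = ["O"] * num_paragraphs
    for start, end in reaction_spans:
        tags[start] = "B"
        for i in range(start + 1, end + 1):
            tags[i] = "I"
    return tags
```

A tagger trained on such sequences would then recover, for an unseen patent, which runs of paragraphs describe a reaction.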
2018
Model Transfer with Explicit Knowledge of the Relation between Class Definitions
Hiyori Yoshikawa | Tomoya Iwakura
Proceedings of the 22nd Conference on Computational Natural Language Learning
This paper investigates learning methods for multi-class classification using labeled data for the target classification scheme together with additional labeled data for a similar but different classification scheme (the support scheme). We show that if we have prior knowledge about the relation between the support and target classification schemes in the form of a class correspondence table, we can use it to improve model performance beyond a simple multi-task learning approach. Instead of learning individual classification layers for the support and target schemes, the proposed method converts the class label of each example in the support scheme into a set of candidate class labels in the target scheme via the class correspondence table, and then uses the candidate labels to learn the classification layer for the target scheme. We evaluate the proposed method on two NLP tasks. The experimental results show that our method effectively learns the target scheme, especially for classes that have a tight connection to certain support classes.
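The label-conversion step described above can be sketched as follows. The correspondence table contents and the candidate-label objective are illustrative assumptions: the table here is a made-up example, and the loss shown is one common way to learn from partial (candidate) labels, not necessarily the paper's exact objective.

```python
import math

# Hypothetical correspondence table: each support-scheme class maps to the
# set of target-scheme classes it may correspond to.
CORRESPONDENCE = {
    "PERSON": {"ARTIST", "POLITICIAN"},
    "LOCATION": {"CITY", "COUNTRY"},
}

def to_candidate_labels(support_label, table):
    """Convert a support-scheme label into its candidate target-scheme labels."""
    return table.get(support_label, set())

def candidate_label_loss(target_probs, candidates):
    """Negative log of the total probability mass the target classifier
    assigns to the candidate classes -- one standard objective for
    learning from partial labels."""
    mass = sum(p for label, p in target_probs.items() if label in candidates)
    return -math.log(max(mass, 1e-12))
```

Minimising this loss pushes probability mass onto the candidate set without committing to any single target class, which is what lets the support-scheme data supervise the target classification layer directly.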