Huichen Yang

2025

CSIRO LT at SemEval-2025 Task 8: Answering Questions over Tabular Data using LLMs
Tomas Turek | Shakila Mahjabin Tonni | Vincent Nguyen | Huichen Yang | Sarvnaz Karimi
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Question answering over large tables is challenging because it requires reasoning that links information from different parts of a table, such as headings and metadata, to the values in the table and to the information need. We investigate using Large Language Models (LLMs) for tabular reasoning, where, given a table and a question from the DataBench benchmark, the models generate answers. We experiment with three techniques that enable symbolic reasoning through code execution: a direct code prompting (DCP) approach, ‘DCP_Py’, which uses Python; multi-step code (MSC) prompting, ‘MSC_SQL+FS’, which uses SQL; and ReAct prompting, ‘MSR_Py+FS’, which combines multi-step reasoning (MSR), few-shot (FS) learning and Python tools. We also analyse the impact of answer types, data size, and multi-column dependencies on LLMs’ answer generation performance, including an assessment of the models’ limitations and the underlying challenges of tabular reasoning in LLMs.


Enhanced Table Structure Recognition with Multi-Modal Approach
Huichen Yang | Andrew D. Hellicar | Maciej Rybinski | Sarvnaz Karimi
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications

Tables are fundamental for presenting information in research articles, technical documents, manuals, and reports. One key challenge is accessing the information in tables that are embedded in Portable Document Format (PDF) files or scanned images, which requires accurately recognising table structures across diverse layouts and complex tables. The Table Structure Recognition (TSR) task aims to recognise the internal structure of table images and convert them into a machine-readable format. We propose a flexible multi-modal framework for image-based TSR. Our approach employs two-stream transformer encoders alongside task-specific decoders for table structure extraction and cell bounding box detection. Experiments on benchmark datasets demonstrate that our model achieves highly competitive results compared to strong baselines, gaining 5.4% over single-modality approaches on the FinTabNet dataset.

2024


Finding evidence for claims in the experimental results of scientific articles is difficult. The evidence is often presented in tables and figures, and correctly matching it to scientific claims is challenging to automate. The Context24 shared task was launched to support the development of systems that can verify claims by extracting supporting evidence from articles. We explore different facets of this shared task, modelled as a search problem and as an information extraction task. We experiment with a range of methods in each of these categories for the two sub-tasks of the Context24 shared task: evidence identification and grounding context identification.

2023


KDDIE at SemEval-2023 Task 2: External Knowledge Injection for Named Entity Recognition
Caleb Martin | Huichen Yang | William Hsu
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper introduces our system for the SemEval 2023 Task 2: Multilingual Complex Named Entity Recognition (MultiCoNER II) competition. Our team focused on the English Named Entity Recognition (NER) sub-task of the challenge. We used transfer learning, fine-tuning pre-trained language models (PLMs) on the competition dataset. Our approach combines a BERT-based PLM with external knowledge to provide additional context to the model. In this report, we present our findings and results.

2022


PIEKM: ML-based Procedural Information Extraction and Knowledge Management System for Materials Science Literature
Huichen Yang | Carlos Aguirre | William Hsu
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations

The published materials science literature contains abundant descriptive information about synthesis procedures that can help discover new material areas, deepen the study of materials synthesis, and accelerate its automated planning. Nevertheless, this information is expressed in unstructured text, and manually processing and assimilating useful information is expensive and time-consuming for researchers. To address this challenge, we develop a Machine Learning-based procedural information extraction and knowledge management system (PIEKM) that extracts procedural information (recipe steps, figures, and tables) from materials science articles, and provides information retrieval and statistics visualization functionality. Our system aims to help researchers gain insights and quickly understand the connections among massive data. Moreover, we demonstrate that the machine learning-based system performs well in low-resource scenarios (i.e., limited annotated data) for domain adaptation.


KDDIE at SemEval-2022 Task 11: Using DeBERTa for Named Entity Recognition
Caleb Martin | Huichen Yang | William Hsu
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this work, we introduce our system for the SemEval 2022 Task 11: Multilingual Complex Named Entity Recognition (MultiCoNER) competition. Our team (KDDIE) attempted the English Named Entity Recognition (NER) sub-task of the challenge. For this task, we use a transfer learning method: fine-tuning pre-trained language models (PLMs) on the competition dataset. Our two approaches are BERT-based PLMs and PLMs with an additional layer such as a Conditional Random Field. We present our findings and results in this report.

Co-authors

Carlos Alejandro Aguirre 1

Necva Bölücü 1

Andrew D. Hellicar 1

Roelien C. Timmer 1

Shakila Mahjabin Tonni 1

Tomas Turek 1

Stephen Wan 1

Venues

wasp 1
