Yao Sun


2024

Chinese UMR annotation: Can LLMs help?
Haibo Sun | Nianwen Xue | Jin Zhao | Liulu Yue | Yao Sun | Keer Xu | Jiawei Wu
Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024

We explore using LLMs, GPT-4 specifically, to generate draft sentence-level Chinese Uniform Meaning Representations (UMRs) that human annotators can revise to speed up the UMR annotation process. In this study, we use few-shot learning and Think-Aloud prompting to guide GPT-4 to generate sentence-level graphs of UMR. Our experimental results show that compared with annotating UMRs from scratch, using LLMs as a preprocessing step reduces the annotation time by two thirds on average. This indicates that there is great potential for integrating LLMs into the pipeline for complicated semantic annotation tasks.

What Are the Implications of Your Question? Non-Information Seeking Question-Type Identification in CNN Transcripts
Yao Sun | Anastasiia Tatlubaeva | Zhihan Li | Chester Palen-Michel
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Non-information seeking questions (NISQs) capture the subtle dynamics of human discourse. In this work, we utilize a dataset of over 1,500 information-seeking questions (ISQs) and NISQs to evaluate human and machine performance on classifying fine-grained NISQ types. We introduce the first publicly available corpus focused on annotating both ISQs and NISQs as an initial benchmark. Additionally, we establish competitive baselines by assessing diverse systems, including Generative Pre-Trained Transformer language models, on a new question classification task. Our results demonstrate the inherent complexity of making nuanced NISQ distinctions. The dataset is publicly available at https://github.com/YaoSun0422/NISQ_dataset.git