Yu-An Lu
Also published as: Yu-an Lu
2024
0x.Yuan at SemEval-2024 Task 2: Agents Debating can reach consensus and produce better outcomes in Medical NLI task
Yu-an Lu
|
Hung-yu Kao
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
In this paper, we introduce a multi-agent debating framework and evaluate it on SemEval-2024 Task 2. The system employs a collaborative approach in which expert agents from various medical fields analyze Clinical Trial Reports (CTRs). Our methodology emphasizes nuanced and comprehensive analysis by leveraging the diverse expertise of agents such as Biostatisticians and Medical Linguists. Results indicate that our collaborative model surpasses the performance of individual agents in terms of Macro F1-score. Additionally, our analysis suggests that while initial debates often mirror majority decisions, the debating process refines these outcomes, demonstrating the system’s capability for in-depth analysis beyond simple majority rule. This research highlights the potential of AI collaboration in specialized domains, particularly in medical text interpretation.
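As a rough illustration of the kind of debate loop described above, the Python sketch below has each expert persona argue over a CTR statement and then takes the final vote. The Biostatistician and Medical Linguist roles come from the abstract; the prompts, the round structure, and the call_llm() placeholder are assumptions for illustration, not the authors' implementation.

from collections import Counter

ROLES = ["Biostatistician", "Medical Linguist"]   # expert personas named in the abstract
LABELS = ("Entailment", "Contradiction")

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for any chat-style LLM call; swap in a real client here."""
    raise NotImplementedError

def debate(ctr_text: str, statement: str, rounds: int = 2) -> str:
    """Let each expert argue, show the others' arguments, then take the final vote."""
    arguments = {role: "" for role in ROLES}
    for _ in range(rounds):
        for role in ROLES:
            others = "\n".join(f"{r}: {a}" for r, a in arguments.items() if r != role and a)
            user = (f"Clinical Trial Report:\n{ctr_text}\n\nStatement:\n{statement}\n\n"
                    f"Other experts said:\n{others or '(no arguments yet)'}\n\n"
                    "Explain your reasoning, then end with one word: Entailment or Contradiction.")
            arguments[role] = call_llm(f"You are a {role} analysing clinical trial reports.", user)
    # Read each agent's final label from the end of its last argument and take the majority.
    votes = [lbl for arg in arguments.values() for lbl in LABELS if lbl.lower() in arg.lower()[-30:]]
    return Counter(votes).most_common(1)[0][0] if votes else "Entailment"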
0x.Yuan at SemEval-2024 Task 5: Enhancing Legal Argument Reasoning with Structured Prompts
Yu-an Lu
|
Hung-yu Kao
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
The intersection of legal reasoning and Natural Language Processing (NLP) technologies, particularly Large Language Models (LLMs), offers groundbreaking potential for augmenting human capabilities in the legal domain. This paper presents our approach and findings from participating in SemEval-2024 Task 5, which concerns argument reasoning in civil procedure, using legal reasoning prompts. We investigated the impact of structured legal reasoning methodologies, including TREACC, IRAC, IRAAC, and MIRAC, on guiding LLMs to analyze and evaluate legal arguments systematically. Our experimental setup involved crafting specific prompts based on these methodologies to instruct the LLM to dissect and scrutinize legal cases, aiming to discern the cogency of argumentative solutions within a zero-shot learning framework. The performance of our approach, as measured by F1 score and accuracy, demonstrates the efficacy of integrating structured legal reasoning into LLMs for legal analysis. The findings underscore the promise of LLMs, when equipped with legal reasoning prompts, in processing and reasoning through complex legal texts, contributing to the broader application of AI in legal studies and practice.
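To make the setup concrete, here is a minimal Python sketch of one such structured prompt applied zero-shot; the IRAC wording shown and the ask_llm() placeholder are illustrative assumptions, not the exact prompts used in the paper.

IRAC_PROMPT = (
    "Analyse the case using IRAC:\n"
    "1. Issue: state the legal question raised.\n"
    "2. Rule: identify the governing rule of civil procedure.\n"
    "3. Application: apply the rule to the facts and the candidate answer.\n"
    "4. Conclusion: state whether the candidate answer is a cogent solution.\n"
    "Finish with a single line: 'Label: 1' if cogent, 'Label: 0' otherwise."
)

def ask_llm(prompt: str) -> str:
    """Placeholder for a zero-shot chat completion call; swap in a real client here."""
    raise NotImplementedError

def classify_argument(question: str, candidate_answer: str) -> int:
    """Return 1 if the LLM judges the candidate answer cogent under IRAC, else 0."""
    prompt = f"{IRAC_PROMPT}\n\nQuestion:\n{question}\n\nCandidate answer:\n{candidate_answer}"
    reply = ask_llm(prompt)
    return 1 if "Label: 1" in reply else 0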
2023
Characterised LLMs Affect its Evaluation of Summary and Translation
Yu-An Lu
|
Yu-Ting Lin
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems
With the widespread use of Large Language Models (LLMs) today, there have been significant achievements in various text domains such as generating summaries and translations. However, there is still room for improvement in evaluating the outputs of LLMs. In this paper, we propose a scoring system that assesses the quality of summaries and translations using multiple metrics; we further enhance the LLM’s performance on scoring tasks by assigning it different roles, effectively making it act as an expert. We test four roles in the study: a teacher, a proofreader, a travel writer, and an internet troll, and compare the advantages and disadvantages of each role in the scoring task. Our results demonstrate that emphasizing the LLM’s multilingual capabilities and strict standards as part of its identity can effectively boost its performance. Additionally, giving the LLM a more critical identity enhances its performance on translation tasks compared to a milder identity. In summary, we show that assigning different identities to an LLM can influence its performance on scoring tasks. We believe this research will contribute to the use of LLMs for scoring purposes.
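A minimal Python sketch of this role-conditioned scoring idea follows; the teacher, proofreader, travel writer, and internet troll roles come from the abstract, but the role descriptions, the 1-5 metric scale, and the chat() placeholder are assumptions for illustration.

import re

ROLES = {
    "teacher": "a strict language teacher who grades against a rubric",
    "proofreader": "a meticulous proofreader focused on fluency and accuracy",
    "travel writer": "a multilingual travel writer with an eye for natural style",
    "internet troll": "a harsh internet troll who points out every flaw",
}

def chat(system: str, user: str) -> str:
    """Placeholder for a chat-style LLM call; swap in a real client here."""
    raise NotImplementedError

def score(role: str, source: str, output: str, metrics=("adequacy", "fluency")) -> dict:
    """Ask the role-conditioned LLM to rate a summary or translation on each metric (1-5)."""
    user = (
        f"Source text:\n{source}\n\nSystem output:\n{output}\n\n"
        + "\n".join(f"Rate {m} from 1 to 5, formatted as '{m}: <score>'." for m in metrics)
    )
    reply = chat(f"You are {ROLES[role]}.", user)
    scores = {}
    for m in metrics:
        match = re.search(rf"{m}:\s*([1-5])", reply, re.IGNORECASE)
        if match:
            scores[m] = int(match.group(1))
    return scores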