Sicong Huang
2024
Toward Faithful Dialogs: Evaluating and Improving the Faithfulness of Dialog Systems
Sicong Huang
Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
My primary research interests lie in evaluating and improving the faithfulness of language model-based text generation systems. Recent advances in large language models (LLMs) such as GPT-4 and Llama have enabled the wide adoption of LLMs across natural language processing (NLP). Despite their widespread use, LLMs still suffer from hallucination, which limits the practicality of deploying them in use cases where factuality and faithfulness are critical. My research specifically aims to evaluate and improve the faithfulness of text generation systems, i.e., the factual alignment between the generated text and a given context. By developing techniques to reliably evaluate, label, and improve generation faithfulness, we can enable wider adoption of dialog systems that must give human users accurate information.
2019
WTMED at MEDIQA 2019: A Hybrid Approach to Biomedical Natural Language Inference
Zhaofeng Wu | Yan Song | Sicong Huang | Yuanhe Tian | Fei Xia
Proceedings of the 18th BioNLP Workshop and Shared Task
Natural language inference (NLI) is challenging, especially when it is applied to technical domains such as biomedical settings. In this paper, we propose a hybrid approach to biomedical NLI that exploits different types of information. Our base model has a pre-trained text encoder as its core component, along with a syntax encoder and a feature encoder that capture syntactic and domain-specific information. We then combine the outputs of different base models to form more powerful ensemble models. Finally, we design two conflict resolution strategies for cases where the test data contain multiple (premise, hypothesis) pairs with the same premise. We train our models on the MedNLI dataset, achieving the best performance on the test set of MEDIQA 2019 Task 1.
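The abstract does not spell out how the ensemble is formed or what the two conflict resolution strategies are, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that base models are combined by averaging their class probabilities, and that pairs sharing a premise should each receive a different label, which is an assumption about the data made here for illustration. The function names `ensemble_average` and `resolve_conflicts` are hypothetical.

```python
from itertools import permutations

LABELS = ("entailment", "neutral", "contradiction")


def ensemble_average(model_probs):
    """Average class probabilities over base models (assumed ensembling rule).

    model_probs[m][i][c] is model m's probability of label c for example i.
    """
    n_models = len(model_probs)
    n_examples = len(model_probs[0])
    return [
        [sum(m[i][c] for m in model_probs) / n_models for c in range(len(LABELS))]
        for i in range(n_examples)
    ]


def resolve_conflicts(premises, probs):
    """Pick labels jointly for pairs that share a premise.

    Assumption (not from the abstract): when a premise has exactly one pair
    per label, each pair gets a distinct label. The assignment with the
    highest total probability wins. Other groups fall back to plain argmax.
    """
    groups = {}
    for idx, premise in enumerate(premises):
        groups.setdefault(premise, []).append(idx)

    predictions = [None] * len(premises)
    for indices in groups.values():
        if len(indices) != len(LABELS):
            # No one-label-per-pair constraint applies: use the argmax label.
            for i in indices:
                row = probs[i]
                predictions[i] = LABELS[row.index(max(row))]
            continue
        # Try every one-to-one assignment of labels to the pairs.
        best = max(
            permutations(range(len(LABELS))),
            key=lambda perm: sum(probs[i][c] for i, c in zip(indices, perm)),
        )
        for i, c in zip(indices, best):
            predictions[i] = LABELS[c]
    return predictions


# Toy example: two hypothetical base models, three pairs with one premise.
model_a = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]
model_b = [[0.8, 0.1, 0.1], [0.5, 0.3, 0.2], [0.2, 0.1, 0.7]]
averaged = ensemble_average([model_a, model_b])
resolved = resolve_conflicts(["p1", "p1", "p1"], averaged)
print(resolved)  # ['entailment', 'neutral', 'contradiction']
```

In the toy example, independent argmax would label the first two pairs both as entailment. The joint assignment resolves that conflict by moving the second pair to its next most likely label.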