Daniel Lee


2024

pdf bib
Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs
Simone Conia | Daniel Lee | Min Li | Umar Farooq Minhas | Saloni Potdar | Yunyao Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Translating text that contains entity names is a challenging task, as cultural-related references can vary significantly across languages. These variations may also be caused by transcreation, an adaptation process that entails more than transliteration and word-for-word translation. In this paper, we address the problem of cross-cultural translation on two fronts: (i) we introduce XC-Translate, the first large-scale, manually-created benchmark for machine translation that focuses on text that contains potentially culturally-nuanced entity names, and (ii) we propose KG-MT, a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model by leveraging a dense retrieval mechanism. Our experiments and analyses show that current machine translation systems and large language models still struggle to translate texts containing entity names, whereas KG-MT outperforms state-of-the-art approaches by a large margin, obtaining a 129% and 62% relative improvement compared to NLLB-200 and GPT-4, respectively.

pdf bib
RETAIN: Interactive Tool for Regression Testing Guided LLM Migration
Tanay Dixit | Daniel Lee | Sally Fang | Sai Sree Harsha | Anirudh Sureshan | Akash V Maharaj | Yunyao Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Large Language Models (LLMs) are increasingly integrated into diverse applications. The rapid evolution of LLMs presents opportunities for developers to enhance applications continuously. However, this constant adaptation can also lead to performance regressions during model migrations. While several interactive tools have been proposed to streamline the complexity of prompt engineering, few address the specific requirements of regression testing for LLM Migrations. To bridge this gap, we introduce RETAIN (REgression Testing guided LLM migrAtIoN), a tool designed explicitly for regression testing in LLM Migrations. RETAIN comprises two key components: an interactive interface tailored to regression testing needs during LLM migrations, and an error discovery module that facilitates understanding of differences in model behaviors. The error discovery module generates textual descriptions of various errors or differences between model outputs, providing actionable insights for prompt refinement. Our automatic evaluation and empirical user studies demonstrate that RETAIN, when compared to manual evaluation, enabled participants to identify twice as many errors, facilitated experimentation with 75% more prompts, and achieves 12% higher metric scores in a given time frame.

pdf bib
ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models
Ronak Pradeep | Daniel Lee | Ali Mousavi | Jeffrey Pound | Yisi Sang | Jimmy Lin | Ihab Ilyas | Saloni Potdar | Mostafa Arefiyan | Yunyao Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

The rapid evolution of Large Language Models (LLMs) and conversational assistants necessitates dynamic, scalable, and configurable conversational datasets for training and evaluation.These datasets must accommodate diverse user interaction modes, including text and voice, each presenting unique modeling challenges. Knowledge Graphs (KGs), with their structured and evolving nature, offer an ideal foundation for current and precise knowledge.Although human-curated KG-based conversational datasets exist, they struggle to keep pace with the rapidly changing user information needs.We present ConvKGYarn, a scalable method for generating up-to-date and configurable conversational KGQA datasets. Qualitative psychometric analyses demonstrate ConvKGYarn’s effectiveness in producing high-quality data comparable to popular conversational KGQA datasets across various metrics.ConvKGYarn excels in adhering to human interaction configurations and operating at a significantly larger scale.We showcase ConvKGYarn’s utility by testing LLMs on diverse conversations — exploring model behavior on conversational KGQA sets with different configurations grounded in the same KG fact set.Our results highlight the ability of ConvKGYarn to improve KGQA foundations and evaluate parametric knowledge of LLMs, thus offering a robust solution to the constantly evolving landscape of conversational assistants.

2023

pdf bib
Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs
Simone Conia | Min Li | Daniel Lee | Umar Minhas | Ihab Ilyas | Yunyao Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recent work in Natural Language Processing and Computer Vision has been using textual information – e.g., entity names and descriptions – available in knowledge graphs to ground neural models to high-quality structured data. However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Completion (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English languages. More specifically, we: i) bring to light the problem of increasing multilingual coverage and precision of entity names and descriptions in Wikidata; ii) demonstrate that state-of-the-art methods, namely, Machine Translation (MT), Web Search (WS), and Large Language Models (LLMs), struggle with this task; iii) present M-NTA, a novel unsupervised approach that combines MT, WS, and LLMs to generate high-quality textual information; and, iv) study the impact of increasing multilingual coverage and precision of non-English textual information in Entity Linking, Knowledge Graph Completion, and Question Answering. As part of our effort towards better multilingual knowledge graphs, we also introduce WikiKGE-10, the first human-curated benchmark to evaluate KGE approaches in 10 languages across 7 language families.

2021

pdf bib
Answering Chinese Elementary School Social Studies Multiple Choice Questions
Chao-Chun Liang | Daniel Lee | Meng-Tse Wu | Hsin-Min Wang | Keh-Yih Su
International Journal of Computational Linguistics & Chinese Language Processing, Volume 26, Number 2, December 2021

2020

pdf bib
Reference and Document Aware Semantic Evaluation Methods for Korean Language Summarization
Dongyub Lee | Myeong Cheol Shin | Taesun Whang | Seungwoo Cho | Byeongil Ko | Daniel Lee | EungGyun Kim | Jaechoon Jo
Proceedings of the 28th International Conference on Computational Linguistics

Text summarization refers to the process that generates a shorter form of text from the source document preserving salient information. Many existing works for text summarization are generally evaluated by using recall-oriented understudy for gisting evaluation (ROUGE) scores. However, as ROUGE scores are computed based on n-gram overlap, they do not reflect semantic meaning correspondences between generated and reference summaries. Because Korean is an agglutinative language that combines various morphemes into a word that express several meanings, ROUGE is not suitable for Korean summarization. In this paper, we propose evaluation metrics that reflect semantic meanings of a reference summary and the original document, Reference and Document Aware Semantic Score (RDASS). We then propose a method for improving the correlation of the metrics with human judgment. Evaluation results show that the correlation with human judgment is significantly higher for our evaluation metrics than for ROUGE scores.

2016

pdf bib
University of Houston at CL-SciSumm 2016: SVMs with tree kernels and Sentence Similarity
Luis Moraes | Shahryar Baki | Rakesh Verma | Daniel Lee
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)