2024
pdf
bib
abs
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments
Yu Gu
|
Yiheng Shu
|
Hao Yu
|
Xiao Liu
|
Yuxiao Dong
|
Jie Tang
|
Jayanth Srinivasa
|
Hugo Latapie
|
Yu Su
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The applications of large language models (LLMs) have expanded well beyond the confines of text processing, signaling a new era where LLMs are envisioned as generalist agents capable of operating within complex environments. These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research on extending the capabilities of LLMs with tools, we seek to investigate the intriguing potential of tools to augment LLMs in handling such complexity by introducing a novel class of tools, termed *middleware*, to aid in the proactive exploration within these massive environments. Such specialized tools can serve as a middleware layer shielding the LLM from environmental complexity. In two representative complex environments—knowledge bases (KBs) and databases—we demonstrate the significant potential of augmenting language agents with tools in complex environments. Notably, equipped with the middleware, GPT-4 achieves **2.8**X the performance of the best baseline in tasks requiring access to database content and **2.2**X in KB tasks. Our findings illuminate the path for advancing language agents in real-world applications.
pdf
bib
abs
Open-world Multi-label Text Classification with Extremely Weak Supervision
Xintong Li
|
Jinya Jiang
|
Ria Dharmani
|
Jayanth Srinivasa
|
Gaowen Liu
|
Jingbo Shang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We study open-world multi-label text classification under extremely weak supervision (XWS), where the user only provides a brief description for classification objectives without any labels or ground-truth label space. Similar single-label XWS settings have been explored recently, however, these methods cannot be easily adapted for multi-label. We observe that (1) most documents have a dominant class covering the majority of content and (2) long-tail labels would appear in some documents as a dominant class. Therefore, we first utilize the user description to prompt a large language model (LLM) for dominant keyphrases of a subset of raw documents, and then construct a (initial) label space via clustering. We further apply a zero-shot multi-label classifier to locate the documents with small top predicted scores, so we can revisit their dominant keyphrases for more long-tail labels. We iterate this process to discover a comprehensive label space and construct a multi-label classifier as a novel method, X-MLClass. X-MLClass exhibits a remarkable increase in ground-truth label space coverage on various datasets, for example, a 40% improvement on the AAPD dataset over topic modeling and keyword extraction methods. Moreover, X-MLClass achieves the best end-to-end multi-label classification accuracy.
pdf
bib
abs
Argument-Aware Approach To Event Linking
I-Hung Hsu
|
Zihan Xue
|
Nilay Pochhi
|
Sahil Bansal
|
Prem Natarajan
|
Jayanth Srinivasa
|
Nanyun Peng
Findings of the Association for Computational Linguistics: ACL 2024
Event linking connects event mentions in text with relevant nodes in a knowledge base (KB). Prior research in event linking has mainly borrowed methods from entity linking, overlooking the distinct features of events. Compared to the extensively explored entity linking task, events have more complex structures and can be more effectively distinguished by examining their associated arguments. Moreover, the information-rich nature of events leads to the scarcity of event KBs. This emphasizes the need for event linking models to identify and classify event mentions not in the KB as “out-of-KB,” an area that has received limited attention. In this work, we tackle these challenges by introducing an argument-aware approach. First, we improve event linking models by augmenting input text with tagged event argument information, facilitating the recognition of key information about event mentions. Subsequently, to help the model handle “out-of-KB” scenarios, we synthesize out-of-KB training examples from in-KB instances through controlled manipulation of event arguments. Our experiment across two test datasets showed significant enhancements in both in-KB and out-of-KB scenarios, with a notable 22% improvement in out-of-KB evaluations.
pdf
bib
abs
Answer is All You Need: Instruction-following Text Embedding via Answering the Question
Letian Peng
|
Yuwei Zhang
|
Zilong Wang
|
Jayanth Srinivasa
|
Gaowen Liu
|
Zihan Wang
|
Jingbo Shang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This work aims to build a text embedder that can capture characteristics of texts specified by user instructions clarifying the similarity criterion. While previous methods improve general task awareness by injecting the instruction information into encoding, they fail to be sensitive to clearer criteria like “evaluate similarity based on emotion”. We instead propose a different viewpoint, which treats the instruction as a “question” about the input text and encodes the expected answers to obtain the representation accordingly. Intuitively, texts with the same (implicit) semantics would share similar answers following the instruction, thus leading to more similar representations. Specifically, we propose InBedder that instantiates this learning-to-answer idea by only fine-tuning language models via abstractive question answering tasks. Despite its simplicity, InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to language models with large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of clustering outcomes, achieved by applying diverse instructions to the same unlabeled corpus, demonstrates a high degree of interpretability in the clusters formed.
pdf
bib
abs
Enhancing Large Language Models through Transforming Reasoning Problems into Classification Tasks
Tarun Raheja
|
Raunak Sinha
|
Advit Deepak
|
Will Healy
|
Jayanth Srinivasa
|
Myungjin Lee
|
Ramana Kompella
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In this paper, we introduce a novel approach for enhancing the reasoning capabilities of large language models (LLMs) for constraint satisfaction problems (CSPs), by converting reasoning problems into classification tasks. Our method leverages the LLM’s ability to decide when to call a function from a set of logical-linguistic primitives, each of which can interact with a local “scratchpad” memory and logical inference engine. Invocation of these primitives in the correct order writes the constraints to the scratchpad memory and enables the logical engine to verifiably solve the problem. We additionally propose a formal framework for exploring the “linguistic” hardness of CSP reasoning-problems for LLMs. Our experimental results demonstrate that under our proposed method, tasks with significant computational hardness can be converted to a form that is easier for LLMs to solve and yields a 40% improvement over baselines. This opens up new avenues for future research into hybrid cognitive models that integrate symbolic and neural approaches.
2023
pdf
bib
abs
Text-to-SQL Error Correction with Language Models of Code
Ziru Chen
|
Shijie Chen
|
Michael White
|
Raymond Mooney
|
Ali Payani
|
Jayanth Srinivasa
|
Yu Su
|
Huan Sun
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Despite recent progress in text-to-SQL parsing, current semantic parsers are still not accurate enough for practical use. In this paper, we investigate how to build automatic text-to-SQL error correction models. Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead. Besides, while most language models of code are not specifically pre-trained for SQL, they know common data structures and their operations in programming languages such as Python. Thus, we propose a novel representation for SQL queries and their edits that adheres more closely to the pre-training corpora of language models of code. Our error correction model improves the exact set match accuracy of different parsers by 2.4-6.5 and obtains up to 4.3 point absolute improvement over two strong baselines.