2024
pdf
bib
abs
Unleashing the Power of LLMs in Court View Generation by Stimulating Internal Knowledge and Incorporating External Knowledge
Yifei Liu
|
Yiquan Wu
|
Ang Li
|
Yating Zhang
|
Changlong Sun
|
Weiming Lu
|
Fei Wu
|
Kun Kuang
Findings of the Association for Computational Linguistics: NAACL 2024
Court View Generation (CVG) plays a vital role in the realm of legal artificial intelligence, which aims to support judges in crafting legal judgment documents. The court view consists of three essential judgment parts: the charge-related, law article-related, and prison term-related parts, each requiring specialized legal knowledge, rendering CVG a challenging task.Although Large Language Models (LLMs) have made remarkable strides in language generation, they encounter difficulties in the knowledge-intensive legal domain.Actually, there can be two types of knowledge: internal knowledge stored within LLMs’ parameters and external knowledge sourced from legal documents outside the models.In this paper, we decompose court views into different parts, stimulate internal knowledge, and incorporate external information to unleash the power of LLMs in the CVG task.To validate our method, we conduct a series of experiment results on two real-world datasets LAIC2021 and CJO2022. The experiments demonstrate that our method is capable of generating more accurate and reliable court views.
pdf
bib
abs
Multi-modal Stance Detection: New Datasets and Model
Bin Liang
|
Ang Li
|
Jingqian Zhao
|
Lin Gui
|
Min Yang
|
Yue Yu
|
Kam-Fai Wong
|
Ruifeng Xu
Findings of the Association for Computational Linguistics: ACL 2024
Stance detection is a challenging task that aims to identify public opinion from social media platforms with respect to specific targets. Previous work on stance detection largely focused on pure texts. In this paper, we study multi-modal stance detection for tweets consisting of texts and images, which are prevalent in today’s fast-growing social media platforms where people often post multi-modal messages. To this end, we create five new multi-modal stance detection datasets of different domains based on Twitter, in which each example consists of a text and an image. In addition, we propose a simple yet effective Targeted Multi-modal Prompt Tuning framework (TMPT), where target information is leveraged to learn multi-modal stance features from textual and visual modalities. Experimental results on our five benchmark datasets show that the proposed TMPT achieves state-of-the-art performance in multi-modal stance detection.
pdf
bib
abs
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
Qinzhuo Wu
|
Weikai Xu
|
Wei Liu
|
Tao Tan
|
Liujian Liujianfeng
|
Ang Li
|
Jian Luan
|
Bin Wang
|
Shuo Shang
Findings of the Association for Computational Linguistics: EMNLP 2024
Recently, mobile AI agents based on VLMs have been gaining increasing attention. These works typically utilize VLM as a foundation, fine-tuning it with instruction-based mobile datasets. However, these VLMs are typically pre-trained on general-domain data, which often results in a lack of fundamental capabilities specific to the mobile domain. Therefore, they may struggle to recognize specific UI elements and understand intra-UI fine-grained information. In addition, the current fine-tuning task focuses on interacting with the most relevant element for the given instruction. These fine-tuned VLMs may still ignore the relationships between UI pages, neglect the roles of elements in page transitions and lack inter-UI understanding. To address issues, we propose a VLM called MobileVLM, which includes two additional pre-training stages to enhance both intra- and inter-UI understanding. We defined four UI-based pre-training tasks, enabling the model to better perceive fine-grained elements and capture page transition actions. To address the lack of mobile pre-training data, we built a large Chinese mobile dataset Mobile3M from scratch, which contains 3 million UI pages, and real-world transition actions, forming a directed graph structure. Experimental results show MobileVLM excels on both our test set and public mobile benchmarks, outperforming existing VLMs.
pdf
bib
abs
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
Shihan Deng
|
Weikai Xu
|
Hongda Sun
|
Wei Liu
|
Tao Tan
|
Liujianfeng Liujianfeng
|
Ang Li
|
Jian Luan
|
Bin Wang
|
Rui Yan
|
Shuo Shang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction.However, there is a scarcity of benchmarks available for LLM-based mobile agents.Benchmarking these agents generally faces three main challenges:(1) The inefficiency of UI-only operations imposes limitations to task evaluation.(2) Specific instructions within a singular application lack adequacy for assessing the multi-dimensional reasoning and decision-making capacities of LLM mobile agents.(3) Current evaluation metrics are insufficient to accurately assess the process of sequential actions. To this end, we propose Mobile-Bench, a novel benchmark for evaluating the capabilities of LLM-based mobile agents.First, we expand conventional UI operations by incorporating 103 collected APIs to accelerate the efficiency of task completion.Subsequently, we collect evaluation data by combining real user queries with augmentation from LLMs.To better evaluate different levels of planning capabilities for mobile agents, our data is categorized into three distinct groups: SAST, SAMT, and MAMT, reflecting varying levels of task complexity. Mobile-Bench comprises 832 data entries, with more than 200 tasks specifically designed to evaluate multi-APP collaboration scenarios.Furthermore, we introduce a more accurate evaluation metric, named CheckPoint, to assess whether LLM-based mobile agents reach essential points during their planning and reasoning steps. Dataset and platform will be released in the future.
pdf
bib
abs
A Challenge Dataset and Effective Models for Conversational Stance Detection
Fuqiang Niu
|
Min Yang
|
Ang Li
|
Baoquan Zhang
|
Xiaojiang Peng
|
Bowen Zhang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Previous stance detection studies typically concentrate on evaluating stances within individual instances, thereby exhibiting limitations in effectively modeling multi-party discussions concerning the same specific topic, as naturally transpire in authentic social media interactions. This constraint arises primarily due to the scarcity of datasets that authentically replicate real social media contexts, hindering the research progress of conversational stance detection. In this paper, we introduce a new multi-turn conversation stance detection dataset (called
MT-CSD), which encompasses multiple targets for conversational stance detection. To derive stances from this challenging dataset, we propose a global-local attention network (
GLAN) to address both long and short-range dependencies inherent in conversational data. Notably, even state-of-the-art stance detection methods, exemplified by GLAN, exhibit an accuracy of only 50.47%, highlighting the persistent challenges in conversational stance detection. Furthermore, our MT-CSD dataset serves as a valuable resource to catalyze advancements in cross-domain stance detection, where a classifier is adapted from a different yet related target. We believe that MT-CSD will contribute to advancing real-world applications of stance detection research. Our source code, data, and models are available at
https://github.com/nfq729/MT-CSD.
pdf
bib
abs
Enhancing Court View Generation with Knowledge Injection and Guidance
Ang Li
|
Yiquan Wu
|
Yifei Liu
|
Kun Kuang
|
Fei Wu
|
Ming Cai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Court View Generation (CVG) is a challenging task in the field of Legal Artificial Intelligence (LegalAI), which aims to generate court views based on the plaintiff claims and the fact descriptions. While Pretrained Language Models (PLMs) have showcased their prowess in natural language generation, their application to the complex, knowledge-intensive domain of CVG often reveals inherent limitations. In this paper, we present a novel approach, named Knowledge Injection and Guidance (KIG), designed to bolster CVG using PLMs. To efficiently incorporate domain knowledge during the training stage, we introduce a knowledge-injected prompt encoder for prompt tuning, thereby reducing computational overhead. Moreover, to further enhance the model’s ability to utilize domain knowledge, we employ a generating navigator, which dynamically guides the text generation process in the inference stage without altering the model’s architecture, making it readily transferable. Comprehensive experiments on real-world data demonstrate the effectiveness of our approach compared to several established baselines, especially in the responsivity of claims, where it outperforms the best baseline by 11.87%.
pdf
bib
abs
From Graph to Word Bag: Introducing Domain Knowledge to Confusing Charge Prediction
Ang Li
|
Qiangchao Chen
|
Yiquan Wu
|
Xiang Zhou
|
Kun Kuang
|
Fei Wu
|
Ming Cai
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Confusing charge prediction is a challenging task in legal AI, which involves predicting confusing charges based on fact descriptions. While existing charge prediction methods have shown impressive performance, they face significant challenges when dealing with confusing charges, such as Snatch and Robbery. In the legal domain, constituent elements play a pivotal role in distinguishing confusing charges. Constituent elements are fundamental behaviors underlying criminal punishment and have subtle distinctions among charges. In this paper, we introduce a novel From Graph to Word Bag (FWGB) approach, which introduces domain knowledge regarding constituent elements to guide the model in making judgments on confusing charges, much like a judge’s reasoning process. Specifically, we first construct a legal knowledge graph containing constituent elements to help select keywords for each charge, forming a word bag. Subsequently, to guide the model’s attention towards the differentiating information for each charge within the context, we expand the attention mechanism and introduce a new loss function with attention supervision through words in the word bag. We construct the confusing charges dataset from real-world judicial documents. Experiments demonstrate the effectiveness of our method, especially in maintaining exceptional performance in imbalanced label distributions.
2023
pdf
bib
abs
Stance Detection on Social Media with Background Knowledge
Ang Li
|
Bin Liang
|
Jingqian Zhao
|
Bowen Zhang
|
Min Yang
|
Ruifeng Xu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Identifying users’ stances regarding specific targets/topics is a significant route to learning public opinion from social media platforms. Most existing studies of stance detection strive to learn stance information about specific targets from the context, in order to determine the user’s stance on the target. However, in real-world scenarios, we usually have a certain understanding of a target when we express our stance on it. In this paper, we investigate stance detection from a novel perspective, where the background knowledge of the targets is taken into account for better stance detection. To be specific, we categorize background knowledge into two categories: episodic knowledge and discourse knowledge, and propose a novel Knowledge-Augmented Stance Detection (KASD) framework. For episodic knowledge, we devise a heuristic retrieval algorithm based on the topic to retrieve the Wikipedia documents relevant to the sample. Further, we construct a prompt for ChatGPT to filter the Wikipedia documents to derive episodic knowledge. For discourse knowledge, we construct a prompt for ChatGPT to paraphrase the hashtags, references, etc., in the sample, thereby injecting discourse knowledge into the sample. Experimental results on four benchmark datasets demonstrate that our KASD achieves state-of-the-art performance in in-target and zero-shot stance detection.