Yue Jiang


2024

pdf bib
UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models
Yue Jiang | Qin Chao | Yile Chen | Xiucheng Li | Shuai Liu | Gao Cong
Findings of the Association for Computational Linguistics: EMNLP 2024

Location-based services play an critical role in improving the quality of our daily lives. Despite the proliferation of numerous specialized AI models within spatio-temporal context of location-based services, these models struggle to autonomously tackle problems regarding complex urban planing and management. To bridge this gap, we introduce UrbanLLM, a fine-tuned large language model (LLM) designed to tackle diverse problems in urban scenarios. UrbanLLM functions as a problem- solver by decomposing urban-related queries into manageable sub-tasks, identifying suitable spatio-temporal AI models for each sub-task, and generating comprehensive responses to the given queries. Our experimental results indicate that UrbanLLM significantly outperforms other established LLMs, such as Llama and the GPT series, in handling problems concerning complex urban activity planning and management. UrbanLLM exhibits considerable potential in enhancing the effectiveness of solving problems in urban scenarios, reducing the workload and reliance for human experts.

pdf bib
H-LegalKI: A Hierarchical Legal Knowledge Integration Framework for Legal Community Question Answering
Yue Jiang | Ziyu Guan | Jie Zhao | Wei Zhao | Jiaqi Yang
Findings of the Association for Computational Linguistics: EMNLP 2024

Legal question answering (LQA) aims to bridge the gap between the limited availability of legal professionals and the high demand for legal assistance. Traditional LQA approaches typically either select the optimal answers from an answer set or extract answers from law texts. However, they often struggle to provide relevant answers to complex, real-world questions due to the rigidity of predetermined answers. Although recent advancements in legal large language models have shown some potential in enhancing answer relevance, they fail to address the multiple user-specific circumstances, i.e., factual details in questions. To address these issues, we (1) construct the first publicly available legal community question-answering (LegalCQA) dataset; and (2) propose a Hierarchical Legal Knowledge Integration (H-LegalKI) framework. LegalCQA is collected from two widely used legal forums for developing user-centered LQA models. For H-LegalKI, we design a legal knowledge retriever that gathers comprehensive legal knowledge based on both entire questions and individual sentences. And an answer generation model is designed to understand question- and sentence-level factual details and integrate corresponding legal knowledge in a hierarchical way. Additionally, we design a de-redundancy module to remove redundant legal knowledge. Experiments on LegalCQA demonstrate the superiority of our framework over competitive baselines.

pdf bib
SC2: Towards Enhancing Content Preservation and Style Consistency in Long Text Style Transfer
Jie Zhao | Ziyu Guan | Cai Xu | Wei Zhao | Yue Jiang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Text style transfer (TST) aims to vary the style polarity of text while preserving the semantic content. Although recent advancements have demonstrated remarkable progress in short TST, it remains a relatively straightforward task with limited practical applications. The more comprehensive long TST task presents two challenges: (1) existing methods encounter difficulties in accurately evaluating content attributes in multiple words, leading to content degradation; (2) the conventional vanilla style classifier loss encounters obstacles in maintaining consistent style across multiple generated sentences.In this paper, we propose a novel method SC2, where a multilayer Joint Style-Content Weighed (JSCW) module and a Style Consistency loss are designed to address the two issues. The JSCW simultaneously assesses the amounts of style and content attributes within a token, aiming to acquire a lossless content representation and thereby enhancing content preservation. The multiple JSCW layers further progressively refine content representations. We design a style consistency loss to ensure the generated multiple sentences consistently reflect the target style polarity. Moreover, we incorporate a denoising non-autoregressive decoder to accelerate the training. We conduct plentiful experiments and the results show significant improvements of SC2 over competitive baselines. Our code: https://github.com/jiezhao6/SC2.

2023

pdf bib
Enhancing Cross-lingual Prompting with Dual Prompt Augmentation
Meng Zhou | Xin Li | Yue Jiang | Lidong Bing
Findings of the Association for Computational Linguistics: ACL 2023

Prompting shows promising results in few-shot scenarios. However, its strength for multilingual/cross-lingual problems has not been fully exploited. hao and Schütze (2021) made initial explorations in this direction by presenting that cross-lingual prompting outperforms cross-lingual finetuning. In this paper, we conduct an empirical exploration on the effect of each component in cross-lingual prompting and derive Universal Prompting, which helps alleviate the discrepancies between source-language training and target-language inference. Based on this, we propose DPA, a dual prompt augmentation framework, aiming at relieving the data scarcity issue in few-shot cross-lingual prompting. Notably, for XNLI, our method achieves 46.54% with only 16 English training examples per class, significantly better than 34.99% of fine-tuning. Our code is available at https://github.com/DAMO-NLP-SG/DPA.

2022

pdf bib
Using Mixed Incentives to Document Xi’an Guanzhong
Juhong Zhan | Yue Jiang | Christopher Cieri | Mark Liberman | Jiahong Yuan | Yiya Chen | Odette Scharenborg
Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022

This paper describes our use of mixed incentives and the citizen science portal LanguageARC to prepare, collect and quality control a large corpus of object namings for the purpose of providing speech data to document the under-represented Guanzhong dialect of Chinese spoken in the Shaanxi province in the environs of Xi’an.