Jun Liu


2024

pdf bib
QGEval: Benchmarking Multi-dimensional Evaluation for Question Generation
Weiping Fu | Bifan Wei | Jianxiang Hu | Zhongmin Cai | Jun Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Automatically generated questions often suffer from problems such as unclear expression or factual inaccuracies, requiring a reliable and comprehensive evaluation of their quality. Human evaluation is widely used in the field of question generation (QG) and serves as the gold standard for automatic metrics. However, there is a lack of unified human evaluation criteria, which hampers consistent and reliable evaluations of both QG models and automatic metrics. To address this, we propose **QGEval**, a multi-dimensional **Eval**uation benchmark for **Q**uestion **G**eneration, which evaluates both generated questions and existing automatic metrics across 7 dimensions: fluency, clarity, conciseness, relevance, consistency, answerability, and answer consistency. We demonstrate the appropriateness of these dimensions by examining their correlations and distinctions. Through consistent evaluations of QG models and automatic metrics with QGEval, we find that 1) most QG models perform unsatisfactorily in terms of answerability and answer consistency, and 2) existing metrics fail to align well with human judgments when evaluating generated questions across the 7 dimensions. We expect this work to foster the development of both QG technologies and their evaluation.

pdf bib
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
Fangzhi Xu | Zhiyong Wu | Qiushi Sun | Siyu Ren | Fei Yuan | Shuai Yuan | Qika Lin | Yu Qiao | Jun Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they do have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language(e.g., chemical molecular formula). Injecting a collection of symbolic data directly into the training of LLMs can be problematic, as it disregards the synergies among different symbolic families and overlooks the need for a balanced mixture of natural and symbolic data. In this work, we tackle these challenges from both a data and framework perspective and introduce Symbol-LLM series models. First, we curated a data collection consisting of 34 tasks and incorporating 20 distinct symbolic families, intending to capture the interrelations and foster synergies between symbols. Then, a two-stage tuning framework succeeds in injecting symbolic knowledge without loss of the generality ability. Extensive experiments on both symbol- and NL-centric tasks demonstrate the balanced and superior performances of Symbol-LLM series models.

pdf bib
When Phrases Meet Probabilities: Enabling Open Relation Extraction with Cooperating Large Language Models
Jiaxin Wang | Lingling Zhang | Wee Sun Lee | Yujie Zhong | Liwei Kang | Jun Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Current clustering-based open relation extraction (OpenRE) methods usually apply clustering algorithms on top of pre-trained language models. However, this practice has three drawbacks. First, embeddings from language models are high-dimensional and anisotropic, so using simple metrics to calculate distances between these embeddings may not accurately reflect the relational similarity. Second, there exists a gap between the pre-trained language models and downstream clustering for their different objective forms. Third, clustering with embeddings deviates from the primary aim of relation extraction, as it does not directly obtain relations. In this work, we propose a new idea for OpenRE in the era of LLMs, that is, extracting relational phrases and directly exploiting the knowledge in LLMs to assess the semantic similarity between phrases without relying on any additional metrics. Based on this idea, we developed a framework, oreLLM, that makes two LLMs work collaboratively to achieve clustering and address the above issues. Experimental results on different datasets show that oreLLM outperforms current baselines by 1.4%∼ 3.13% in terms of clustering accuracy.

pdf bib
PathReasoner: Modeling Reasoning Path with Equivalent Extension for Logical Question Answering
Fangzhi Xu | Qika Lin | Tianzhe Zhao | JiaweiHan JiaweiHan | Jun Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Logical reasoning task has attracted great interest since it was proposed. Faced with such a task, current competitive models, even large language models (e.g., ChatGPT and PaLM 2), still perform badly. Previous promising LMs struggle in logical consistency modeling and logical structure perception. To this end, we model the logical reasoning task by transforming each logical sample into reasoning paths and propose an architecture PathReasoner. It addresses the task from the views of both data and model. To expand the diversity of the logical samples, we propose an atom extension strategy supported by equivalent logical formulas, to form new reasoning paths. From the model perspective, we design a stack of transformer-style blocks. In particular, we propose a path-attention module to joint model in-atom and cross-atom relations with the high-order diffusion strategy. Experiments show that PathReasoner achieves competitive performances on two logical reasoning benchmarks and great generalization abilities.

pdf bib
A Semantic Mention Graph Augmented Model for Document-Level Event Argument Extraction
Jian Zhang | Changlin Yang | Haiping Zhu | Qika Lin | Fangzhi Xu | Jun Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Document-level Event Argument Extraction (DEAE) aims to identify arguments and their specific roles from an unstructured document. The advanced approaches on DEAE utilize prompt-based methods to guide pre-trained language models (PLMs) in extracting arguments from input documents. They mainly concentrate on establishing relations between triggers and entity mentions within documents, leaving two unresolved problems: a) independent modeling of entity mentions; b) document-prompt isolation. To this end, we propose a semantic mention Graph Augmented Model (GAM) to address these two problems in this paper. Firstly, GAM constructs a semantic mention graph that captures relations within and between documents and prompts, encompassing co-existence, co-reference and co-type relations. Furthermore, we introduce an ensemble graph transformer module to address mentions and their three semantic relations effectively. Later, the graph-augmented encoder-decoder module incorporates the relation-specific graph into the input embedding of PLMs and optimizes the encoder section with topology information, enhancing the relations comprehensively. Extensive experiments on the RAMS and WikiEvents datasets demonstrate the effectiveness of our approach, surpassing baseline methods and achieving a new state-of-the-art performance.

2023

pdf bib
TECHS: Temporal Logical Graph Networks for Explainable Extrapolation Reasoning
Qika Lin | Jun Liu | Rui Mao | Fangzhi Xu | Erik Cambria
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Extrapolation reasoning on temporal knowledge graphs (TKGs) aims to forecast future facts based on past counterparts. There are two main challenges: (1) incorporating the complex information, including structural dependencies, temporal dynamics, and hidden logical rules; (2) implementing differentiable logical rule learning and reasoning for explainability. To this end, we propose an explainable extrapolation reasoning framework TEemporal logiCal grapH networkS (TECHS), which mainly contains a temporal graph encoder and a logical decoder. The former employs a graph convolutional network with temporal encoding and heterogeneous attention to embed topological structures and temporal dynamics. The latter integrates propositional reasoning and first-order reasoning by introducing a reasoning graph that iteratively expands to find the answer. A forward message-passing mechanism is also proposed to update node representations, and their propositional and first-order attention scores. Experimental results demonstrate that it outperforms state-of-the-art baselines.

pdf bib
Synthesize, Prompt and Transfer: Zero-shot Conversational Question Generation with Pre-trained Language Model
Hongwei Zeng | Bifan Wei | Jun Liu | Weiping Fu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Conversational question generation aims to generate questions that depend on both context and conversation history. Conventional works utilizing deep learning have shown promising results, but heavily rely on the availability of large-scale annotated conversations. In this paper, we introduce a more realistic and less explored setting, Zero-shot Conversational Question Generation (ZeroCQG), which requires no human-labeled conversations for training. To solve ZeroCQG, we propose a multi-stage knowledge transfer framework, Synthesize, Prompt, and trAnsfer with pRe-Trained lAnguage model (SPARTA) to effectively leverage knowledge from single-turn question generation instances. To validate the zero-shot performance of SPARTA, we conduct extensive experiments on three conversational datasets: CoQA, QuAC, and DoQA by transferring knowledge from three single-turn datasets: MS MARCO, NewsQA, and SQuAD. The experimental results demonstrate the superior performance of our method. Specifically, SPARTA has achieved 14.81 BLEU-4 (88.2% absolute improvement compared to T5) in CoQA with knowledge transferred from SQuAD.

pdf bib
Enhancing Multilingual Document-Grounded Dialogue Using Cascaded Prompt-Based Post-Training Models
Jun Liu | Shuang Cheng | Zineng Zhou | Yang Gu | Jian Ye | Haiyong Luo
Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

The Dialdoc23 shared task presents a Multilingual Document-Grounded Dialogue Systems (MDGDS) challenge, where system responses are generated in multiple languages using user’s queries, historical dialogue records and relevant passages. A major challenge for this task is the limited training data available in low-resource languages such as French and Vietnamese. In this paper, we propose Cascaded Prompt-based Post-training Models, dividing the task into three subtasks: Retrieval, Reranking and Generation. We conduct post-training on high-resource language such as English and Chinese to enhance performance of low-resource languages by using the similarities of languages. Additionally, we utilize the prompt method to activate model’s ability on diverse languages within the dialogue domain and explore which prompt is a good prompt. Our comprehensive experiments demonstrate the effectiveness of our proposed methods, which achieved the first place on the leaderboard with a total score of 215.40 in token-level F1, SacreBleu, and Rouge-L metrics.

2022

pdf bib
Inductive Relation Prediction with Logical Reasoning Using Contrastive Representations
Yudai Pan | Jun Liu | Lingling Zhang | Tianzhe Zhao | Qika Lin | Xin Hu | Qianying Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Relation prediction in knowledge graphs (KGs) aims at predicting missing relations in incomplete triples, whereas the dominant embedding paradigm has a restriction on handling unseen entities during testing. In the real-world scenario, the inductive setting is more common because entities in the training process are finite. Previous methods capture an inductive ability by implicit logic in KGs. However, it would be challenging to preciously acquire entity-independent relational semantics of compositional logic rules and to deal with the deficient supervision of logic caused by the scarcity of relational semantics. To this end, we propose a novel graph convolutional network (GCN)-based model LogCo with logical reasoning by contrastive representations. LogCo firstly extracts enclosing subgraphs and relational paths between two entities to supply the entity-independence. Then a contrastive strategy for relational path instances and the subgraph is proposed for the issue of deficient supervision. The contrastive representations are learned for a joint training regime. Finally, prediction results and logic rules for reasoning are attained. Comprehensive experiments on twelve inductive datasets show that LogCo achieves outstanding performance comparing with state-of-the-art inductive relation prediction baselines.

pdf bib
MatchPrompt: Prompt-based Open Relation Extraction with Semantic Consistency Guided Clustering
Jiaxin Wang | Lingling Zhang | Jun Liu | Xi Liang | Yujie Zhong | Yaqiang Wu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Relation clustering is a general approach for open relation extraction (OpenRE). Current methods have two major problems. One is that their good performance relies on large amounts of labeled and pre-defined relational instances for pre-training, which are costly to acquire in reality. The other is that they only focus on learning a high-dimensional metric space to measure the similarity of novel relations and ignore the specific relational representations of clusters. In this work, we propose a new prompt-based framework named MatchPrompt, which can realize OpenRE with efficient knowledge transfer from only a few pre-defined relational instances as well as mine the specific meanings for cluster interpretability. To our best knowledge, we are the first to introduce a prompt-based framework for unlabeled clustering. Experimental results on different datasets show that MatchPrompt achieves the new SOTA results for OpenRE.

2021

pdf bib
Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models
Tianxing He | Jun Liu | Kyunghyun Cho | Myle Ott | Bing Liu | James Glass | Fuchun Peng
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this work, we study how the finetuning stage in the pretrain-finetune framework changes the behavior of a pretrained neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. Our major finding is that after standard finetuning, the model forgets some of the important language generation skills acquired during large-scale pretraining. We demonstrate the forgetting phenomenon through a set of detailed behavior analysis from the perspectives of knowledge transfer, context sensitivity, and function space projection. As a preliminary attempt to alleviate the forgetting problem, we propose an intuitive finetuning strategy named “mix-review”. We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.

2018

pdf bib
Automatic Error Correction on Japanese Functional Expressions Using Character-based Neural Machine Translation
Jun Liu | Fei Cheng | Yiran Wang | Hiroyuki Shindo | Yuji Matsumoto
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

pdf bib
Sentence Suggestion of Japanese Functional Expressions for Chinese-speaking Learners
Jun Liu | Hiroyuki Shindo | Yuji Matsumoto
Proceedings of ACL 2018, System Demonstrations

We present a computer-assisted learning system, Jastudy, which is particularly designed for Chinese-speaking learners of Japanese as a second language (JSL) to learn Japanese functional expressions with suggestion of appropriate example sentences. The system automatically recognizes Japanese functional expressions using a free Japanese morphological analyzer MeCab, which is retrained on a new Conditional Random Fields (CRF) model. In order to select appropriate example sentences, we apply a pairwise-based machine learning tool, Support Vector Machine for Ranking (SVMrank) to estimate the complexity of the example sentences using Japanese–Chinese homographs as an important feature. In addition, we cluster the example sentences that contain Japanese functional expressions with two or more meanings and usages, based on part-of-speech, conjugation forms of verbs and semantic attributes, using the K-means clustering algorithm in Scikit-Learn. Experimental results demonstrate the effectiveness of our approach.

2017

pdf bib
Sentence Complexity Estimation for Chinese-speaking Learners of Japanese
Jun Liu | Yuji Matsumoto
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2016

pdf bib
Simplification of Example Sentences for Learners of Japanese Functional Expressions
Jun Liu | Yuji Matsumoto
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

Learning functional expressions is one of the difficulties for language learners, since functional expressions tend to have multiple meanings and complicated usages in various situations. In this paper, we report an experiment of simplifying example sentences of Japanese functional expressions especially for Chinese-speaking learners. For this purpose, we developed “Japanese Functional Expressions List” and “Simple Japanese Replacement List”. To evaluate the method, we conduct a small-scale experiment with Chinese-speaking learners on the effectiveness of the simplified example sentences. The experimental results indicate that simplified sentences are helpful in learning Japanese functional expressions.

2010

pdf bib
CMDMC: A Diachronic Digital Museum of Chinese Mandarin
Min Hou | Yu Zou | Yonglin Teng | Wei He | Yan Wang | Jun Liu | Jiyuan Wu
CIPS-SIGHAN Joint Conference on Chinese Language Processing