Yunfang Wu

2025

pdf bib abs
Beyond Demonstrations: Dynamic Vector Construction from Latent Representations
Wang Cai | Hsiu-Yuan Huang | Zhixiang Wang | Yunfang Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

In-Context derived Vector (ICV) methods extract task-relevant representations from large language models (LLMs) and reinject them during inference, achieving comparable performance to few-shot In-Context Learning (ICL) without repeated demonstration processing. However, existing ICV methods remain sensitive to ICL-specific factors, often use coarse or semantically fragmented representations as the source of the vector, and rely on heuristic-based injection positions, limiting their applicability.To address these issues, we propose Dynamic Vector (DyVec), which incorporates an Exhaustive Query Rotation (EQR) strategy to extract robust semantically aggregated latent representations by mitigating variance introduced by ICL. It then applies Dynamic Latent Segmentation and Injection to adaptively partition representations based on task complexity and leverages REINFORCE-based optimization to learn optimal injection positions for each segment.Experiments results show that DyVec outperforms few-shot ICL, LoRA, and prior ICV baselines. Further analysis highlights the effectiveness of dynamically segmenting and injecting semantically aggregated latent representations. DyVec provides a lightweight and data-efficient solution for inference-time task adaptation.

pdf bib abs
Composable Cross-prompt Essay Scoring by Merging Models
Sanwoo Lee | Kun Liang | Yunfang Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent advances in cross-prompt automated essay scoring typically train models jointly on all available source domains, often requiring simultaneous access to unlabeled target domain samples. However, using all sources can lead to suboptimal transfer and high computational cost. Moreover, repeatedly accessing the source essays for continual adaptation raises privacy concerns. We propose a source-free adaptation approach that selectively merges the parameters of individually trained source models without further access to the source datasets. In particular, we mix the task vectors—the parameter updates from fine-tuning—via a weighted sum to efficiently simulate selective joint-training. We use Bayesian optimization to determine the mixing weights using our proposed Prior-encoded Information Maximization (PIM), an unsupervised objective which promotes score discriminability by leveraging useful priors pre-computed from the sources. Experimental results with LLMs on in-dataset and cross-dataset adaptation show that our method (1) consistently outperforms joint-training on all sources, (2) maintains superior robustness compared to other merging methods, (3) excels under severe distribution shifts where recent leading cross-prompt methods struggle, all while retaining computational efficiency.

Generating effective query suggestions in conversational search requires aligning model outputs with user click preferences. However, directly optimizing for these preferences is difficult because click signals are sparse and inherently noisy. To address this, we propose Generative Query Suggestion (GQS), a generative framework that leverages click modeling to denoise implicit feedback and enables reliable preference optimization for improving real-world user engagement.GQS consists of three key components: (1) a Multi-Source CTR Modeling module that captures diverse contextual signals to estimate fine-grained click-through rates, thereby constructing more reliable user click-preference pairs; (2) a Diversity-Aware Preference Alignment strategy using CTR-weighted Direct Preference Optimization (DPO), which balances relevance and semantic diversity; and (3) a CTR-Calibrated Iterative Optimization process that jointly refines both the CTR model and the query suggestion model across training rounds, enabling effective data reuse.Experiments on two real-world tasks demonstrate that GQS outperforms strong baselines in CTR, relevance, and diversity.

pdf bib abs
3DM: Distill, Dynamic Drop, and Merge for Debiasing Multi-modal Large Language Models
Zhaoxi Zhang | Sanwoo Lee | Zhixiang Wang | Yunfang Wu
Findings of the Association for Computational Linguistics: ACL 2025

The rapid advancement of Multi-modal Language Models (MLLMs) has significantly enhanced performance in multimodal tasks, yet these models often exhibit inherent biases that compromise their reliability and fairness. Traditional debiasing methods face a trade-off between the need for extensive labeled datasets and high computational costs. Model merging, which efficiently combines multiple models into a single one, offers a promising alternative but its usage is limited to MLLMs with the same architecture. We propose 3DM, a novel framework integrating Distill, Dynamic Drop, and Merge to address these challenges. 3DM employs knowledge distillation to harmonize models with divergent architectures and introduces a dynamic dropping strategy that assigns parameter-specific drop rates based on their contributions to bias and overall performance. This approach preserves critical weights while mitigating biases, as validated on the MMSD2.0 sarcasm detection dataset. Our key contributions include architecture-agnostic merging, dynamic dropping, and the introduction of the Bias Ratio (BR) metric for systematic bias assessment. Empirical results demonstrate that 3DM outperforms existing methods in balancing debiasing and enhancing the overall performance, offering a practical and scalable solution for deploying fair and efficient MLLMs in real-world applications.

pdf bib abs
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
Chenming Tang | Zhixiang Wang | Hao Sun | Yunfang Wu
Findings of the Association for Computational Linguistics: EMNLP 2025

With the help of in-context learning (ICL), large language models (LLMs) have achieved impressive performance across various tasks. However, the function of descriptive instructions during ICL remains under-explored. In this work, we propose an ensemble prompt framework to describe the selection criteria of multiple in-context examples, and preliminary experiments on machine translation (MT) across six translation directions confirm that this framework boosts ICL performance. But to our surprise, LLMs might not care what the descriptions actually say, and the performance gain is primarily caused by the ensemble format, since it could lead to improvement even with random descriptive nouns. We further apply this new ensemble framework on a range of commonsense, math, logical reasoning and hallucination tasks with three LLMs and achieve promising results, suggesting again that designing a proper prompt format would be much more effective and efficient than paying effort into specific descriptions.

pdf bib abs
Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing
Zichen Wu | Hsiu-Yuan Huang | Yunfang Wu
Findings of the Association for Computational Linguistics: EMNLP 2025

Multimodal Large Language Models (MLLMs) have shown substantial capabilities in integrating visual and textual information, yet frequently rely on spurious correlations, undermining their robustness and generalization in complex multimodal reasoning tasks. This paper addresses the critical challenge of superficial correlation bias in MLLMs through a novel causal mediation-based debiasing framework. Specially, we distinguishing core semantics from spurious textual and visual contexts via counterfactual examples to activate training-stage debiasing and employ a Mixture-of-Experts (MoE) architecture with dynamic routing to selectively engages modality-specific debiasing experts. Empirical evaluation on multimodal sarcasm detection and sentiment analysis tasks demonstrates that our framework significantly surpasses unimodal debiasing strategies and existing state-of-the-art models.

pdf bib abs
Dynamic Fisher-weighted Model Merging via Bayesian Optimization
Sanwoo Lee | Jiahao Liu | Qifan Wang | Jingang Wang | Xunliang Cai | Yunfang Wu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The fine-tuning of pre-trained language models has resulted in the widespread availability of task-specific models. Model merging offers an efficient way to create multi-task models by combining these fine-tuned models at the parameter level, without the need for training data or joint training on multiple datasets. Existing merging approaches typically involve scaling the parameters model-wise or integrating parameter importance parameter-wise. Both approaches exhibit their own weaknesses, leading to a notable performance gap compared to multi-task fine-tuning. In this paper, we unify these seemingly distinct strategies into a more general merging framework, and introduce Dynamic Fisher-weighted Merging (DF-Merge). Specifically, candidate models are associated with a set of coefficients that linearly scale their fine-tuned parameters. Bayesian optimization is applied to dynamically adjust these coefficients, aiming to maximize overall performance on validation sets. Each iteration of this process integrates parameter importance based on the Fisher information conditioned by the coefficients. Experimental results show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks. Our analysis shows that the effectiveness of DF-Merge arises from the unified view of merging and that near-optimal performance is achievable in a few iterations, even with minimal validation data.

2024

pdf bib abs
Going Beyond Passages: Readability Assessment for Book-level Long Texts
Wenbiao Li | Rui Sun | Tianyi Zhang | Yunfang Wu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“Readability assessment for book-level long text is widely needed in real educational applica-tions. However, most of the current researches focus on passage-level readability assessmentand little work has been done to process ultra-long texts. In order to process the long sequenceof book texts better and to enhance pretrained models with difficulty knowledge, we propose anovel model DSDR, difficulty-aware segment pre-training and difficulty multi-view representa-tion. Specifically, we split all books into multiple fixed-length segments and employ unsuper-vised clustering to obtain difficulty-aware segments, which are used to re-train the pretrainedmodel to learn difficulty knowledge. Accordingly, a long text is represented by averaging mul-tiple vectors of segments with varying difficulty levels. We construct a new dataset of GradedChildren’s Books to evaluate model performance. Our proposed model achieves promising re-sults, outperforming both the traditional SVM classifier and several popular pretrained models.In addition, our work establishes a new prototype for book-level readability assessment, whichprovides an important benchmark for related research in future work.”

pdf bib abs
SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation
Chenming Tang | Zhixiang Wang | Yunfang Wu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In-context learning (ICL) greatly improves the performance of large language models (LLMs) on various down-stream tasks, where the improvement highly depends on the quality of demonstrations. In this work, we introduce syntactic knowledge to select better in-context examples for machine translation (MT). We propose a new strategy, namely Syntax-augmented COverage-based In-context example selection (SCOI), leveraging the deep syntactic structure beyond conventional word matching. Specifically, we measure the set-level syntactic coverage by computing the coverage of polynomial terms with the help of a simplified tree-to-polynomial algorithm, and lexical coverage using word overlap. Furthermore, we devise an alternate selection approach to combine both coverage measures, taking advantage of syntactic and lexical information. We conduct experiments with two multi-lingual LLMs on six translation directions. Empirical results show that our proposed SCOI obtains the highest average COMET score among all learning-free methods, indicating that combining syntactic and lexical coverage successfully helps to select better in-context examples for MT. Our code is available at https://github.com/JamyDon/SCOI.

pdf bib abs
Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding
Fanyi Qu | Hao Sun | Yunfang Wu
Findings of the Association for Computational Linguistics: ACL 2024

Within the context of reading comprehension, the task of Distractor Generation (DG) aims to generate several incorrect options to confuse readers. In recent years, the emergence of Large Language Models (LLMs) provides a potential for unsupervised DG without expensive human-annotated distractor labels. In this paper, we leverage LLMs as a cost-effective annotator to enhance the DG capability of smaller student models. To perform knowledge distilling, we propose a dual task training framework that integrates pseudo distractors from LLMs and answer information as the objective target with a two-stage training process. Moreover, we devise a counterfactual contrastive decoding mechanism for increasing the distracting capability of the DG model. Experiments show that our unsupervised generation method with Bart-base greatly surpasses GPT-3.5-turbo zero-shot performance with only 200× fewer model parameters. Our proposed unsupervised DG method offers a cost-effective framework for practical reading comprehension applications, without the need of laborious distractor annotation and costly large-size models.

pdf bib abs
Unleashing Large Language Models’ Proficiency in Zero-shot Essay Scoring
Sanwoo Lee | Yida Cai | Desong Meng | Ziyang Wang | Yunfang Wu
Findings of the Association for Computational Linguistics: EMNLP 2024

Advances in automated essay scoring (AES) have traditionally relied on labeled essays, requiring tremendous cost and expertise for their acquisition. Recently, large language models (LLMs) have achieved great success in various tasks, but their potential is less explored in AES. In this paper, we show that our zero-shot prompting framework, Multi Trait Specialization (MTS), elicits LLMs’ ample potential for essay scoring. In particular, we automatically decompose writing proficiency into distinct traits and generate scoring criteria for each trait. Then, an LLM is prompted to extract trait scores from several conversational rounds, each round scoring one of the traits based on the scoring criteria. Finally, we derive the overall score via trait averaging and min-max scaling. Experimental results on two benchmark datasets demonstrate that MTS consistently outperforms straightforward prompting (Vanilla) in average QWK across all LLMs and datasets, with maximum gains of 0.437 on TOEFL11 and 0.355 on ASAP. Additionally, with the help of MTS, the small-sized Llama2-13b-chat substantially outperforms ChatGPT, facilitating an effective deployment in real applications.

pdf bib abs
Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
Zichen Wu | Hsiu-Yuan Huang | Fanyi Qu | Yunfang Wu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Deep multimodal semantic understanding that goes beyond the mere superficial content relation mining has received increasing attention in the realm of artificial intelligence. The challenges of collecting and annotating high-quality multi-modal data have underscored the significance of few-shot learning. In this paper, we focus on two critical tasks under this context: few-shot multi-modal sarcasm detection (MSD) and multi-modal sentiment analysis (MSA). To address them, we propose Mixture-of-Prompt-Experts with Block-Aware Prompt Fusion (MoPE-BAF), a novel multi-modal soft prompt framework based on the unified vision-language model (VLM). Specifically, we design three experts of soft prompts: a text prompt and an image prompt that extract modality-specific features to enrich the single-modal representation, and a unified prompt to assist multi-modal interaction. Additionally, we reorganize Transformer layers into several blocks and introduce cross-modal prompt attention between adjacent blocks, which smoothens the transition from single-modal representation to multi-modal fusion. On both MSD and MSA datasets in few-shot setting, our proposed model not only surpasses the 8.2B model InstructBLIP with merely 2% parameters (150M), but also significantly outperforms other widely-used prompt methods on VLMs or task-specific methods.

pdf bib abs
Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment
Ming Zhang | Ke Chang | Yunfang Wu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multi-modal semantic understanding requires integrating information from different modalities to extract users’ real intention behind words. Most previous work applies a dual-encoder structure to separately encode image and text, but fails to learn cross-modal feature alignment, making it hard to achieve cross-modal deep information interaction. This paper proposes a novel CLIP-guided contrastive-learning-based architecture to perform multi-modal feature alignment, which projects the features derived from different modalities into a unified deep space. On multi-modal sarcasm detection (MMSD) and multi-modal sentiment analysis (MMSA) tasks, the experimental results show that our proposed model significantly outperforms several baselines, and our feature alignment strategy brings obvious performance gain over models with different aggregating methods and models even enriched with knowledge. More importantly, our model is simple to implement without using task-specific external knowledge, and thus can easily migrate to other multi-modal tasks. Our source codes are available at https://github.com/ChangKe123/CLFA.

pdf bib abs
FPT: Feature Prompt Tuning for Few-shot Readability Assessment
Ziyang Wang | Sanwoo Lee | Hsiu-Yuan Huang | Yunfang Wu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Prompt-based methods have achieved promising results in most few-shot text classification tasks. However, for readability assessment tasks, traditional prompt methods lack crucial linguistic knowledge, which has already been proven to be essential.Moreover, previous studies on utilizing linguistic features have shown non-robust performance in few-shot settings and may even impair model performance.To address these issues, we propose a novel prompt-based tuning framework that incorporates rich linguistic knowledge, called Feature Prompt Tuning (FPT). Specifically, we extract linguistic features from the text and embed them into trainable soft prompts. Further, we devise a new loss function to calibrate the similarity ranking order between categories. Experimental results demonstrate that our proposed method FTPnot only exhibits a significant performance improvement over the prior best prompt-based tuning approaches, but also surpasses the previous leading methods that incorporate linguistic features. Also, our proposed model significantly outperforms the large language model gpt-3.5-turbo-16k in most cases. Our proposed method establishes a new architecture for prompt tuning that sheds light on how linguistic features can be easily adapted to linguistic-related tasks.

pdf bib abs
Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction
Chenming Tang | Fanyi Qu | Yunfang Wu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

In the era of large language models (LLMs), in-context learning (ICL) stands out as an effective prompting strategy that explores LLMs’ potency across various tasks. However, applying LLMs to grammatical error correction (GEC) is still a challenging task. In this paper, we propose a novel ungrammatical-syntax-based in-context example selection strategy for GEC. Specifically, we measure similarity of sentences based on their syntactic structures with diverse algorithms, and identify optimal ICL examples sharing the most similar ill-formed syntax to the test input. Additionally, we carry out a two-stage process to further improve the quality of selection results. On benchmark English GEC datasets, empirical results show that our proposed ungrammatical-syntax-based strategies outperform commonly-used word-matching or semantics-based methods with multiple LLMs. This indicates that for a syntax-oriented task like GEC, paying more attention to syntactic information can effectively boost LLMs’ performance. Our code is available at https://github.com/JamyDon/SynICL4GEC.

2023

pdf bib abs
Are Pre-trained Language Models Useful for Model Ensemble in Chinese Grammatical Error Correction?
Chenming Tang | Xiuyu Wu | Yunfang Wu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Model ensemble has been in widespread use for Grammatical Error Correction (GEC), boosting model performance. We hypothesize that model ensemble based on the perplexity (PPL) computed by pre-trained language models (PLMs) should benefit the GEC system. To this end, we explore several ensemble strategies based on strong PLMs with four sophisticated single models. However, the performance does not improve but even gets worse after the PLM-based ensemble. This surprising result sets us doing a detailed analysis on the data and coming up with some insights on GEC. The human references of correct sentences is far from sufficient in the test data, and the gap between a correct sentence and an idiomatic one is worth our attention. Moreover, the PLM-based ensemble strategies provide an effective way to extend and improve GEC benchmark data. Our source code is available at https://github.com/JamyDon/PLM-based-CGEC-Model-Ensemble.

2022

pdf bib abs
Position Offset Label Prediction for Grammatical Error Correction
Xiuyu Wu | Jingsong Yu | Xu Sun | Yunfang Wu
Proceedings of the 29th International Conference on Computational Linguistics

We introduce a novel position offset label prediction subtask to the encoder-decoder architecture for grammatical error correction (GEC) task. To keep the meaning of the input sentence unchanged, only a few words should be inserted or deleted during correction, and most of tokens in the erroneous sentence appear in the paired correct sentence with limited position movement. Inspired by this observation, we design an auxiliary task to predict position offset label (POL) of tokens, which is naturally capable of integrating different correction editing operations into a unified framework. Based on the predicted POL, we further propose a new copy mechanism (P-copy) to replace the vanilla copy module. Experimental results on Chinese, English and Japanese datasets demonstrate that our proposed POL-Pc framework obviously improves the performance of baseline models. Moreover, our model yields consistent performance gain over various data augmentation methods. Especially, after incorporating synthetic data, our model achieves a 38.95 F-0.5 score on Chinese GEC dataset, which outperforms the previous state-of-the-art by a wide margin of 1.98 points.

pdf bib abs
Focus-Driven Contrastive Learning for Medical Question Summarization
Ming Zhang | Shuai Dou | Ziyang Wang | Yunfang Wu
Proceedings of the 29th International Conference on Computational Linguistics

Automatic medical question summarization can significantly help the system to understand consumer health questions and retrieve correct answers. The Seq2Seq model based on maximum likelihood estimation (MLE) has been applied in this task, which faces two general problems: the model can not capture well question focus and and the traditional MLE strategy lacks the ability to understand sentence-level semantics. To alleviate these problems, we propose a novel question focus-driven contrastive learning framework (QFCL). Specially, we propose an easy and effective approach to generate hard negative samples based on the question focus, and exploit contrastive learning at both encoder and decoder to obtain better sentence level representations. On three medical benchmark datasets, our proposed model achieves new state-of-the-art results, and obtains a performance gain of 5.33, 12.85 and 3.81 points over the baseline BART model on three datasets respectively. Further human judgement and detailed analysis prove that our QFCL model learns better sentence representations with the ability to distinguish different sentence meanings, and generates high-quality summaries by capturing question focus.

pdf bib abs
Enhancing Pre-trained Models with Text Structure Knowledge for Question Generation
Zichen Wu | Xin Jia | Fanyi Qu | Yunfang Wu
Proceedings of the 29th International Conference on Computational Linguistics

Today the pre-trained language models achieve great success for question generation (QG) task and significantly outperform traditional sequence-to-sequence approaches. However, the pre-trained models treat the input passage as a flat sequence and are thus not aware of the text structure of input passage. For QG task, we model text structure as answer position and syntactic dependency, and propose answer localness modeling and syntactic mask attention to address these limitations. Specially, we present localness modeling with a Gaussian bias to enable the model to focus on answer-surrounded context, and propose a mask attention mechanism to make the syntactic structure of input passage accessible in question generation process. Experiments on SQuAD dataset show that our proposed two modules improve performance over the strong pre-trained model ProphetNet, and combing them together achieves very competitive results with the state-of-the-art pre-trained model.

pdf bib abs
A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss
Wenbiao Li | Wang Ziyang | Yunfang Wu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Readability assessment is a basic research task in the field of education. Traditional methods mainly employ machine learning classifiers with hundreds of linguistic features. Although the deep learning model has become the prominent approach for almost all NLP tasks, it is less explored for readability assessment. In this paper, we propose a BERT-based model with feature projection and length-balanced loss (BERT-FP-LBL) to determine the difficulty level of a given text. First, we introduce topic features guided by difficulty knowledge to complement the traditional linguistic features. From the linguistic features, we extract really useful orthogonal features to supplement BERT representations by means of projection filtering. Furthermore, we design a length-balanced loss to handle the greatly varying length distribution of the readability data. We conduct experiments on three English benchmark datasets and one Chinese dataset, and the experimental results show that our proposed model achieves significant improvements over baseline models. Interestingly, our proposed model achieves comparable results with human experts in consistency test.

pdf bib abs
From Spelling to Grammar: A New Framework for Chinese Grammatical Error Correction
Xiuyu Wu | Yunfang Wu
Findings of the Association for Computational Linguistics: EMNLP 2022

Chinese Grammatical Error Correction (CGEC) aims to generate a correct sentence from an erroneous sequence, where different kinds of errors are mixed. This paper divides the CGEC task into two steps, namely spelling error correction and grammatical error correction. We firstly propose a novel zero-shot approach for spelling error correction, which is simple but effective, obtaining a high precision to avoid error accumulation of the pipeline structure. To handle grammatical error correction, we design part-of-speech (POS) features and semantic class features to enhance the neural network model, and propose an auxiliary task to predict the POS sequence of the target sentence. Our proposed framework achieves a 42.11 F-0.5 score on CGEC dataset without using any synthetic data or data augmentation methods, which outperforms the previous state-of-the-art by a wide margin of 1.30 points. Moreover, our model produces meaningful POS representations that capture different POS words and convey reasonable POS transition rules.

pdf bib abs
An Error-Guided Correction Model for Chinese Spelling Error Correction
Rui Sun | Xiuyu Wu | Yunfang Wu
Findings of the Association for Computational Linguistics: EMNLP 2022

Although existing neural network approaches have achieved great progress on Chinese spelling correction, there is still room to improve. The model is required to avoid over-correction and to distinguish a correct token from its phonological and visual similar ones. In this paper, we propose an error-guided correction model to address these issues. By borrowing the powerful ability of the pre-trained BERT model, we propose a novel zero-shot error detection method to do a preliminary detection, which guides our model to attend more on the probably wrong tokens in encoding and to avoid modifying the correct tokens in generating. Furthermore, we introduce a new loss function to integrate the error confusion set, which enables our model to distinguish similar tokens. Moreover, our model supports highly parallel decoding to meet real applications. Experiments are conducted on widely used benchmarks. Our model achieves superior performance against state-of-the-art approaches by a remarkable margin, on both the quality and computation speed.

2021

pdf bib abs
Asking Questions Like Educational Experts: Automatically Generating Question-Answer Pairs on Real-World Examination Data
Fanyi Qu | Xin Jia | Yunfang Wu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Generating high quality question-answer pairs is a hard but meaningful task. Although previous works have achieved great results on answer-aware question generation, it is difficult to apply them into practical application in the education field. This paper for the first time addresses the question-answer pair generation task on the real-world examination data, and proposes a new unified framework on RACE. To capture the important information of the input passage we first automatically generate (rather than extracting) keyphrases, thus this task is reduced to keyphrase-question-answer triplet joint generation. Accordingly, we propose a multi-agent communication model to generate and optimize the question and keyphrases iteratively, and then apply the generated question and keyphrases to guide the generation of answers. To establish a solid benchmark, we build our model on the strong generative pre-training model. Experimental results show that our model makes great breakthroughs in the question-answer pair generation task. Moreover, we make a comprehensive analysis on our model, suggesting new directions for this challenging task.

2020

pdf bib abs
Exploiting WordNet Synset and Hypernym Representations for Answer Selection
Weikang Li | Yunfang Wu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Answer selection (AS) is an important subtask of document-based question answering (DQA). In this task, the candidate answers come from the same document, and each answer sentence is semantically related to the given question, which makes it more challenging to select the true answer. WordNet provides powerful knowledge about concepts and their semantic relations so we employ WordNet to enrich the abilities of paraphrasing and reasoning of the network-based question answering model. Specifically, we exploit the synset and hypernym concepts to enrich the word representation and incorporate the similarity scores of two concepts that share the synset or hypernym relations into the attention mechanism. The proposed WordNet-enhanced hierarchical model (WEHM) consists of four modules, including WordNet-enhanced word representation, sentence encoding, WordNet-enhanced attention mechanism, and hierarchical document encoding. Extensive experiments on the public WikiQA and SelQA datasets demonstrate that our proposed model significantly improves the baseline system and outperforms all existing state-of-the-art methods by a large margin.

pdf bib abs
How to Ask Good Questions? Try to Leverage Paraphrases
Xin Jia | Wenjie Zhou | Xu Sun | Yunfang Wu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Given a sentence and its relevant answer, how to ask good questions is a challenging task, which has many real applications. Inspired by human’s paraphrasing capability to ask questions of the same meaning but with diverse expressions, we propose to incorporate paraphrase knowledge into question generation(QG) to generate human-like questions. Specifically, we present a two-hand hybrid model leveraging a self-built paraphrase resource, which is automatically conducted by a simple back-translation method. On the one hand, we conduct multi-task learning with sentence-level paraphrase generation (PG) as an auxiliary task to supplement paraphrase knowledge to the task-share encoder. On the other hand, we adopt a new loss function for diversity training to introduce more question patterns to QG. Extensive experimental results show that our proposed model obtains obvious performance gain over several strong baselines, and further human evaluation validates that our model can ask questions of high quality by leveraging paraphrase knowledge.

pdf bib abs
A Question Type Driven and Copy Loss Enhanced Frameworkfor Answer-Agnostic Neural Question Generation
Xiuyu Wu | Nan Jiang | Yunfang Wu
Proceedings of the Fourth Workshop on Neural Generation and Translation

The answer-agnostic question generation is a significant and challenging task, which aims to automatically generate questions for a given sentence but without an answer. In this paper, we propose two new strategies to deal with this task: question type prediction and copy loss mechanism. The question type module is to predict the types of questions that should be asked, which allows our model to generate multiple types of questions for the same source sentence. The new copy loss enhances the original copy mechanism to make sure that every important word in the source sentence has been copied when generating questions. Our integrated model outperforms the state-of-the-art approach in answer-agnostic question generation, achieving a BLEU-4 score of 13.9 on SQuAD. Human evaluation further validates the high quality of our generated questions. We will make our code public available for further research.

2019

pdf bib abs
Multi-Task Learning with Language Modeling for Question Generation
Wenjie Zhou | Minghua Zhang | Yunfang Wu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper explores the task of answer-aware questions generation. Based on the attention-based pointer generator model, we propose to incorporate an auxiliary task of language modeling to help question generation in a hierarchical multi-task learning structure. Our joint-learning model enables the encoder to learn a better representation of the input sequence, which will guide the decoder to generate more coherent and fluent questions. On both SQuAD and MARCO datasets, our multi-task learning model boosts the performance, achieving state-of-the-art results. Moreover, human evaluation further proves the high quality of our generated questions.

pdf bib abs
Question-type Driven Question Generation
Wenjie Zhou | Minghua Zhang | Yunfang Wu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Question generation is a challenging task which aims to ask a question based on an answer and relevant context. The existing works suffer from the mismatching between question type and answer, i.e. generating a question with type how while the answer is a personal name. We propose to automatically predict the question type based on the input answer and context. Then, the question type is fused into a seq2seq model to guide the question generation, so as to deal with the mismatching problem. We achieve significant improvement on the accuracy of question type prediction and finally obtain state-of-the-art results for question generation on both SQuAD and MARCO datasets.

Automatic article commenting is helpful in encouraging user engagement on online news platforms. However, the news documents are usually too long for models under traditional encoder-decoder frameworks, which often results in general and irrelevant comments. In this paper, we propose to generate comments with a graph-to-sequence model that models the input news as a topic interaction graph. By organizing the article into graph structure, our model can better understand the internal structure of the article and the connection between topics, which makes it better able to generate coherent and informative comments. We collect and release a large scale news-comment corpus from a popular Chinese online news platform Tencent Kuaibao. Extensive experiment results show that our model can generate much more coherent and informative comments compared with several strong baseline models.

2018

pdf bib abs
Learning Universal Sentence Representations with Mean-Max Attention Autoencoder
Minghua Zhang | Yunfang Wu | Weikang Li | Wei Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In order to learn universal sentence representations, previous methods focus on complex recurrent neural networks or supervised learning. In this paper, we propose a mean-max attention autoencoder (mean-max AAE) within the encoder-decoder framework. Our autoencoder rely entirely on the MultiHead self-attention mechanism to reconstruct the input sequence. In the encoding we propose a mean-max strategy that applies both mean and max pooling operations over the hidden vectors to capture diverse information of the input. To enable the information to steer the reconstruction process dynamically, the decoder performs attention over the mean-max representation. By training our model on a large collection of unlabelled data, we obtain high-quality representations of sentences. Experimental results on a broad range of 10 transfer tasks demonstrate that our model outperforms the state-of-the-art unsupervised single methods, including the classical skip-thoughts and the advanced skip-thoughts+LN model. Furthermore, compared with the traditional recurrent neural network, our mean-max AAE greatly reduce the training time.

pdf bib
Research on Entity Relation Extraction for Military Field
Chen Liang | Hongying Zan | Yajun Liu | Yunfang Wu
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2016

pdf bib abs
Multi-level Gated Recurrent Neural Network for dialog act classification
Wei Li | Yunfang Wu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper we focus on the problem of dialog act (DA) labelling. This problem has recently attracted a lot of attention as it is an important sub-part of an automatic question answering system, which is currently in great demand. Traditional methods tend to see this problem as a sequence labelling task and deals with it by applying classifiers with rich features. Most of the current neural network models still omit the sequential information in the conversation. Henceforth, we apply a novel multi-level gated recurrent neural network (GRNN) with non-textual information to predict the DA tag. Our model not only utilizes textual information, but also makes use of non-textual and contextual information. In comparison, our model has shown significant improvement over previous works on Switchboard Dialog Act (SWDA) task by over 6%.

pdf bib
ICL00 at SemEval-2016 Task 3: Translation-Based Method for CQA System
Yunfang Wu | Minghua Zhang
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Yunfang Wu

2025

2024

2023

2022

2021

2020

2019

2018

2016

2012

2011

2010

2007

2005

Co-authors

Venues