Qian Chen


pdf bib
DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect
Jinglin Liu | Zhenhui Ye | Qian Chen | Siqi Zheng | Wen Wang | Zhang Qinglin | Zhou Zhao
Findings of the Association for Computational Linguistics: ACL 2023

Recently, binaural audio synthesis (BAS) has emerged as a promising research field for its applications in augmented and virtual realities. Binaural audio helps ususers orient themselves and establish immersion by providing the brain with interaural time differences reflecting spatial information. However, existing BAS methods are limited in terms of phase estimation, which is crucial for spatial hearing. In this paper, we propose the DopplerBAS method to explicitly address the Doppler effect of the moving sound source. Specifically, we calculate the radial relative velocity of the moving speaker in spherical coordinates, which further guides the synthesis of binaural audio. This simple method introduces no additional hyper-parameters and does not modify the loss functions, and is plug-and-play: it scales well to different types of backbones. DopperBAS distinctly improves the representative WarpNet and BinauralGrad backbones in the phase error metric and reaches a new state of the art (SOTA): 0.780 (versus the current SOTA 0.807). Experiments and ablation studies demonstrate the effectiveness of our method.

pdf bib
Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization
Luyao Cheng | Siqi Zheng | Zhang Qinglin | Hui Wang | Yafeng Chen | Qian Chen
Findings of the Association for Computational Linguistics: ACL 2023

Speaker diarization is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which result in performance degradation when encountering adverse acoustic environment. In this paper, we propose methods to extract speaker-related information from semantic content in multi-party meetings, which, as we will show, can further benefit speaker diarization. We introduce two sub-tasks, Dialogue Detection and Speaker-Turn Detection, in which we effectively extract speaker information from conversational semantics. We also propose a simple yet effective algorithm to jointly model acoustic and semantic information and obtain speaker-identified texts. Experiments on both AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.

pdf bib
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
Weixiang Yan | Yuchen Tian | Yunzhe Li | Qian Chen | Wen Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent code translation techniques exploit neural machine translation models to translate source code from one programming language to another to satisfy production compatibility or to improve efficiency of codebase maintenance. Most existing code translation datasets only focus on a single pair of popular programming languages. To advance research on code translation and meet diverse requirements of real-world applications, we construct **CodeTransOcean**, a large-scale comprehensive benchmark that supports the largest variety of programming languages for code translation. CodeTransOcean consists of three novel multilingual datasets, namely, **MultilingualTrans** supporting translations between multiple popular programming languages, **NicheTrans** for translating between niche programming languages and popular ones, and **LLMTrans** for evaluating executability of translated code by large language models (LLMs). CodeTransOcean also includes a novel cross-framework dataset, **DLTrans**, for translating deep learning code across different frameworks. We develop multilingual modeling approaches for code translation and demonstrate their great potential in improving the translation quality of both low-resource and high-resource language pairs and boosting the training efficiency. We also propose a novel evaluation metric **Debugging Success Rate@K** for program-level code translation. Last but not least, we evaluate LLM ChatGPT on our datasets and investigate its potential for fuzzy execution predictions. We build baselines for CodeTransOcean and analyze challenges of code translation for guiding future research. The CodeTransOcean datasets and code are publicly available at https://github.com/WeixiangYAN/CodeTransOcean.

pdf bib
Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling
Hai Yu | Chong Deng | Qinglin Zhang | Jiaqing Liu | Qian Chen | Wen Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Topic segmentation is critical for obtaining structured documents and improving down- stream tasks such as information retrieval. Due to its ability of automatically exploring clues of topic shift from abundant labeled data, recent supervised neural models have greatly promoted the development of long document topic segmentation, but leaving the deeper relationship between coherence and topic segmentation underexplored. Therefore, this paper enhances the ability of supervised models to capture coherence from both logical structure and semantic similarity perspectives to further improve the topic segmentation performance, proposing Topic-aware Sentence Structure Prediction (TSSP) and Contrastive Semantic Similarity Learning (CSSL). Specifically, the TSSP task is proposed to force the model to comprehend structural information by learning the original relations between adjacent sentences in a disarrayed document, which is constructed by jointly disrupting the original document at topic and sentence levels. Moreover, we utilize inter- and intra-topic information to construct contrastive samples and design the CSSL objective to ensure that the sentences representations in the same topic have higher similarity, while those in different topics are less similar. Extensive experiments show that the Longformer with our approach significantly outperforms old state-of-the-art (SOTA) methods. Our approach improve F1 of old SOTA by 3.42 (73.74 77.16) and reduces Pk by 1.11 points (15.0 13.89) on WIKI-727K and achieves an average relative reduction of 4.3% on Pk on WikiSection. The average relative Pk drop of 8.38% on two out-of-domain datasets also demonstrates the robustness of our approach.

pdf bib
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings
Qian Chen | Wen Wang | Qinglin Zhang | Siqi Zheng | Chong Deng | Hai Yu | Jiaqing Liu | Yukun Ma | Chong Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Prior studies diagnose the anisotropy problem in sentence representations from pre-trained language models, e.g., BERT, without fine-tuning. Our analysis reveals that the sentence embeddings from BERT suffer from a bias towards uninformative words, limiting the performance in semantic textual similarity (STS) tasks. To address this bias, we propose a simple and efficient unsupervised approach, Diagonal Attention Pooling (Ditto), which weights words with model-based importance estimations and computes the weighted average of word representations from pre-trained models as sentence embeddings. Ditto can be easily applied to any pre-trained language model as a postprocessing operation. Compared to prior sentence embedding approaches, Ditto does not add parameters nor requires any learning. Empirical evaluations demonstrate that our proposed Ditto can alleviate the anisotropy problem and improve various pre-trained models on the STS benchmarks.

pdf bib
DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder
Jiaao Zhan | Qian Chen | Boxing Chen | Wen Wang | Yu Bai | Yang Gao
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

Non-autoregressive machine translation (NAT) models have lower translation quality than autoregressive translation (AT) models because NAT decoders do not depend on previous target tokens in the decoder input. We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models from two perspectives: decoder self-attention and decoder input. First, we propose an autoregressive forward-backward pre-training phase before NAT training, which enables the NAT decoder to gradually learn bidirectional target dependencies for the final NAT training. Second, we transform the decoder input from the source language representation space to the target language representation space through a novel attentive transformation process, which enables the decoder to better capture target dependencies. DePA can be applied to any fully NAT models. Extensive experiments show that DePA consistently improves highly competitive and state-of-the-art fully NAT models on widely used WMT and IWSLT benchmarks by up to 1.88 BLEU gain, while maintaining the inference latency comparable to other fully NAT models.


pdf bib
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
Linhan Zhang | Qian Chen | Wen Wang | Chong Deng | ShiLiang Zhang | Bing Li | Wei Wang | Xin Cao
Findings of the Association for Computational Linguistics: ACL 2022

Keyphrase extraction (KPE) automatically extracts phrases in a document that provide a concise summary of the core content, which benefits downstream information retrieval and NLP tasks. Previous state-of-the-art methods select candidate keyphrases based on the similarity between learned representations of the candidates and the document. They suffer performance degradation on long documents due to discrepancy between sequence lengths which causes mismatch between representations of keyphrase candidates and the document. In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document. We further develop a KPE-oriented BERT (KPEBERT) model by proposing a novel self-supervised contrastive learning method, which is more compatible to MDERank than vanilla BERT. Comprehensive evaluations on six KPE benchmarks demonstrate that the proposed MDERank outperforms state-of-the-art unsupervised KPE approach by average 1.80 F1@15 improvement. MDERank further benefits from KPEBERT and overall achieves average 3.53 F1@15 improvement over SIFRank.


pdf bib
基于迭代信息传递和滑动窗口注意力的问题生成模型研究(Question Generation Model Based on Iterative Message Passing and Sliding Windows Hierarchical Attention)
Qian Chen (陈千) | Xiaoying Gao (高晓影) | Suge Wang (王素格) | Xin Guo (郭鑫)
Proceedings of the 20th Chinese National Conference on Computational Linguistics



pdf bib
T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack
Boxin Wang | Hengzhi Pei | Boyuan Pan | Qian Chen | Shuohang Wang | Bo Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Adversarial attacks against natural language processing systems, which perform seemingly innocuous modifications to inputs, can induce arbitrary mistakes to the target models. Though raised great concerns, such adversarial attacks can be leveraged to estimate the robustness of NLP models. Compared with the adversarial example generation in continuous data domain (e.g., image), generating adversarial text that preserves the original meaning is challenging since the text space is discrete and non-differentiable. To handle these challenges, we propose a target-controllable adversarial attack framework T3, which is applicable to a range of NLP tasks. In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation. A novel tree-based decoder is then applied to regularize the syntactic correctness of the generated text and manipulate it on either sentence (T3(Sent)) or word (T3(Word)) level. We consider two most representative NLP tasks: sentiment analysis and question answering (QA). Extensive experimental results and human studies show that T3 generated adversarial texts can successfully manipulate the NLP models to output the targeted incorrect answer without misleading the human. Moreover, we show that the generated adversarial texts have high transferability which enables the black-box attacks in practice. Our work sheds light on an effective and general way to examine the robustness of NLP models. Our code is publicly available at https://github.com/AI-secure/T3/.


pdf bib
Neural Natural Language Inference Models Enhanced with External Knowledge
Qian Chen | Xiaodan Zhu | Zhen-Hua Ling | Diana Inkpen | Si Wei
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modeling natural language inference is a very challenging task. With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have shown to achieve the state-of-the-art performance. Although there exist relatively large annotated data, can machines learn all knowledge needed to perform natural language inference (NLI) from these data? If not, how can neural-network-based NLI models benefit from external knowledge and how to build NLI models to leverage it? In this paper, we enrich the state-of-the-art neural natural language inference models with external knowledge. We demonstrate that the proposed models improve neural NLI models to achieve the state-of-the-art performance on the SNLI and MultiNLI datasets.

pdf bib
Enhancing Sentence Embedding with Generalized Pooling
Qian Chen | Zhen-Hua Ling | Xiaodan Zhu
Proceedings of the 27th International Conference on Computational Linguistics

Pooling is an essential component of a wide variety of sentence representation and embedding models. This paper explores generalized pooling methods to enhance sentence embedding. We propose vector-based multi-head attention that includes the widely used max pooling, mean pooling, and scalar self-attention as special cases. The model benefits from properly designed penalization terms to reduce redundancy in multi-head attention. We evaluate the proposed model on three different tasks: natural language inference (NLI), author profiling, and sentiment classification. The experiments show that the proposed model achieves significant improvement over strong sentence-encoding-based methods, resulting in state-of-the-art performances on four datasets. The proposed approach can be easily implemented for more problems than we discuss in this paper.


pdf bib
Enhanced LSTM for Natural Language Inference
Qian Chen | Xiaodan Zhu | Zhen-Hua Ling | Si Wei | Hui Jiang | Diana Inkpen
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Reasoning and inference are central to human and artificial intelligence. Modeling inference in human language is very challenging. With the availability of large annotated data (Bowman et al., 2015), it has recently become feasible to train neural network based inference models, which have shown to be very effective. In this paper, we present a new state-of-the-art result, achieving the accuracy of 88.6% on the Stanford Natural Language Inference Dataset. Unlike the previous top models that use very complicated network architectures, we first demonstrate that carefully designing sequential inference models based on chain LSTMs can outperform all previous models. Based on this, we further show that by explicitly considering recursive architectures in both local inference modeling and inference composition, we achieve additional improvement. Particularly, incorporating syntactic parsing information contributes to our best result—it further improves the performance even when added to the already very strong model.

pdf bib
Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference
Qian Chen | Xiaodan Zhu | Zhen-Hua Ling | Si Wei | Hui Jiang | Diana Inkpen
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP

The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task. This paper describes our system (alpha) that is ranked among the top in the Shared Task, on both the in-domain test set (obtaining a 74.9% accuracy) and on the cross-domain test set (also attaining a 74.9% accuracy), demonstrating that the model generalizes well to the cross-domain data. Our model is equipped with intra-sentence gated-attention composition which helps achieve a better performance. In addition to submitting our model to the Shared Task, we have also tested it on the Stanford Natural Language Inference (SNLI) dataset. We obtain an accuracy of 85.5%, which is the best reported result on SNLI when cross-sentence attention is not allowed, the same condition enforced in RepEval 2017.


pdf bib
Revisiting Word Embedding for Contrasting Meaning
Zhigang Chen | Wei Lin | Qian Chen | Xiaoping Chen | Si Wei | Hui Jiang | Xiaodan Zhu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)