Enhong Chen


pdf bib
MRT: Multi-modal Short- and Long-range Temporal Convolutional Network for Time-sync Comment Video Behavior Prediction
Weihao Zhao | Weidong He | Hao Wang | Haoyang Bi | Han Wu | Chen Zhu | Tong Xu | Enhong Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

As a fresh way to improve the user viewing experience, videos of time-sync comments have attracted a lot of interest. Many efforts have been made to explore the effectiveness of time-sync comments for various applications. However, due to the complexity of interactions among users, videos, and comments, it still remains challenging to understand users’ behavior on time-sync comments. Along this line, we study the problem of time-sync comment behavior prediction with considerations of both historical behaviors and multi-modal information of visual frames and textual comments. Specifically, we propose a novel Multi-modal short- and long-Range Temporal Convolutional Network model, namely MRT. Firstly, we design two amplified Temporal Convolutional Networks with different sizes of receptive fields, to capture both short- and long-range surrounding contexts for each frame and time-sync comments. Then, we design a bottle-neck fusion module to obtain the multi-modal enhanced representation. Furthermore, we take the user preferences into consideration to generate the personalized multi-model semantic representation at each timestamp. Finally, we utilize the binary cross-entropy loss to optimize MRT on the basis of users’ historical records. Through comparing with representative baselines, we demonstrate the effectiveness of MRT and qualitatively verify the necessity and utility of short- and long-range contextual and multi-modal information through extensive experiments.

pdf bib
Multi-perspective Improvement of Knowledge Graph Completion with Large Language Models
Derong Xu | Ziheng Zhang | Zhenxi Lin | Xian Wu | Zhihong Zhu | Tong Xu | Xiangyu Zhao | Yefeng Zheng | Enhong Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Knowledge graph completion (KGC) is a widely used method to tackle incompleteness in knowledge graphs (KGs) by making predictions for missing links. Description-based KGC leverages pre-trained language models to learn entity and relation representations with their names or descriptions, which shows promising results. However, the performance of description-based KGC is still limited by the quality of text and the incomplete structure, as it lacks sufficient entity descriptions and relies solely on relation names, leading to sub-optimal results. To address this issue, we propose MPIKGC, a general framework to compensate for the deficiency of contextualized knowledge and improve KGC by querying large language models (LLMs) from various perspectives, which involves leveraging the reasoning, explanation, and summarization capabilities of LLMs to expand entity descriptions, understand relations, and extract structures, respectively. We conducted extensive evaluation of the effectiveness and improvement of our framework based on four description-based KGC models, for both link prediction and triplet classification tasks. All codes and generated data will be publicly available after review.


pdf bib
Enhancing Hierarchical Text Classification through Knowledge Graph Integration
Ye Liu | Kai Zhang | Zhenya Huang | Kehang Wang | Yanghai Zhang | Qi Liu | Enhong Chen
Findings of the Association for Computational Linguistics: ACL 2023

Hierarchical Text Classification (HTC) is an essential and challenging subtask of multi-label text classification with a taxonomic hierarchy. Recent advances in deep learning and pre-trained language models have led to significant breakthroughs in the HTC problem. However, despite their effectiveness, these methods are often restricted by a lack of domain knowledge, which leads them to make mistakes in a variety of situations. Generally, when manually classifying a specific document to the taxonomic hierarchy, experts make inference based on their prior knowledge and experience. For machines to achieve this capability, we propose a novel Knowledge-enabled Hierarchical Text Classification model (K-HTC), which incorporates knowledge graphs into HTC. Specifically, K-HTC innovatively integrates knowledge into both the text representation and hierarchical label learning process, addressing the knowledge limitations of traditional methods. Additionally, a novel knowledge-aware contrastive learning strategy is proposed to further exploit the information inherent in the data. Extensive experiments on two publicly available HTC datasets show the efficacy of our proposed method, and indicate the necessity of incorporating knowledge graphs in HTC tasks.

pdf bib
RHGN: Relation-gated Heterogeneous Graph Network for Entity Alignment in Knowledge Graphs
Xukai Liu | Kai Zhang | Ye Liu | Enhong Chen | Zhenya Huang | Linan Yue | Jiaxian Yan
Findings of the Association for Computational Linguistics: ACL 2023

Entity Alignment, which aims to identify equivalent entities from various Knowledge Graphs (KGs), is a fundamental and crucial task in knowledge graph fusion. Existing methods typically use triple or neighbor information to represent entities, and then align those entities using similarity matching. Most of them, however, fail to account for the heterogeneity among KGs and the distinction between KG entities and relations. To better solve these problems, we propose a Relation-gated Heterogeneous Graph Network (RHGN) for entity alignment. Specifically, RHGN contains a relation-gated convolutional layer to distinguish relations and entities in the KG. In addition, RHGN adopts a cross-graph embedding exchange module and a soft relation alignment module to address the neighbor heterogeneity and relation heterogeneity between different KGs, respectively. Extensive experiments on four benchmark datasets demonstrate that RHGN is superior to existing state-of-the-art entity alignment methods.

pdf bib
The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks
Yichao Du | Guo Zhengsheng | Jinchuan Tian | Zhirui Zhang | Xing Wang | Jianwei Yu | Zhaopeng Tu | Tong Xu | Enhong Chen
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper presents the extscMineTrans English-to-Chinese speech translation systems developed for two challenge tracks of IWSLT 2023, i.e., Offline Speech Translation (S2T) and Speech-to-Speech Translation (S2ST). For the S2T track, extscMineTrans employs a practical cascaded system to explore the limits of translation performance in both constrained and unconstrained settings, where the whole system consists of automatic speech recognition (ASR), punctuation recognition (PC), and machine translation (MT) modules. We also investigate the effectiveness of multiple ASR architectures and explore two MT strategies: supervised in-domain fine-tuning and prompt-guided translation using a large language model. For the S2ST track, we explore a speech-to-unit (S2U) framework to build an end-to-end S2ST system. This system encodes the target speech as discrete units via our trained HuBERT. Then it leverages the standard sequence-to-sequence model to directly learn the mapping between source speech and discrete units without any auxiliary recognition tasks (i.e., ASR and MT tasks). Various efforts are made to improve the extscMineTrans’s performance, such as acoustic model pre-training on large-scale data, data filtering, data augmentation, speech segmentation, knowledge distillation, consistency training, model ensembles, etc.


pdf bib
Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis
Kai Zhang | Kun Zhang | Mengdi Zhang | Hongke Zhao | Qi Liu | Wei Wu | Enhong Chen
Findings of the Association for Computational Linguistics: ACL 2022

Aspect-based sentiment analysis (ABSA) predicts sentiment polarity towards a specific aspect in the given sentence. While pre-trained language models such as BERT have achieved great success, incorporating dynamic semantic changes into ABSA remains challenging. To this end, in this paper, we propose to address this problem by Dynamic Re-weighting BERT (DR-BERT), a novel method designed to learn dynamic aspect-oriented semantics for ABSA. Specifically, we first take the Stack-BERT layers as a primary encoder to grasp the overall semantic of the sentence and then fine-tune it by incorporating a lightweight Dynamic Re-weighting Adapter (DRA). Note that the DRA can pay close attention to a small region of the sentences at each step and re-weigh the vitally important words for better aspect-aware sentiment understanding. Finally, experimental results on three benchmark datasets demonstrate the effectiveness and the rationality of our proposed model and provide good interpretable insights for future semantic modeling.

pdf bib
Non-Parametric Domain Adaptation for End-to-End Speech Translation
Yichao Du | Weizhi Wang | Zhirui Zhang | Boxing Chen | Tong Xu | Jun Xie | Enhong Chen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The end-to-end speech translation (E2E-ST) has received increasing attention due to the potential of its less error propagation, lower latency and fewer parameters. However, the effectiveness of neural-based approaches to this task is severely limited by the available training corpus, especially for domain adaptation where in-domain triplet data is scarce or nonexistent. In this paper, we propose a novel non-parametric method that leverages in-domain text translation corpus to achieve domain adaptation for E2E-ST systems. To this end, we first incorporate an additional encoder into the pre-trained E2E-ST model to realize text translation modeling, based on which the decoder’s output representations for text and speech translation tasks are unified by reducing the correspondent representation mismatch in available triplet training data. During domain adaptation, a k-nearest-neighbor (kNN) classifier is introduced to produce the final translation distribution using the external datastore built by the domain-specific text translation corpus, while the universal output representation is adopted to perform a similarity search. Experiments on the Europarl-ST benchmark demonstrate that when in-domain text translation data is involved only, our proposed approach significantly improves baseline by 12.82 BLEU on average in all translation directions, even outperforming the strong in-domain fine-tuning strategy.

pdf bib
VIRT: Improving Representation-based Text Matching via Virtual Interaction
Dan Li | Yang Yang | Hongyin Tang | Jiahao Liu | Qifan Wang | Jingang Wang | Tong Xu | Wei Wu | Enhong Chen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Text matching is a fundamental research problem in natural language understanding. Interaction-based approaches treat the text pair as a single sequence and encode it through cross encoders, while representation-based models encode the text pair independently with siamese or dual encoders. Interaction-based models require dense computations and thus are impractical in real-world applications. Representation-based models have become the mainstream paradigm for efficient text matching. However, these models suffer from severe performance degradation due to the lack of interactions between the pair of texts. To remedy this, we propose a Virtual InteRacTion mechanism (VIRT) for improving representation-based text matching while maintaining its efficiency. In particular, we introduce an interactive knowledge distillation module that is only applied during training. It enables deep interaction between texts by effectively transferring knowledge from the interaction-based model. A light interaction strategy is designed to fully leverage the learned interactive knowledge. Experimental results on six text matching benchmarks demonstrate the superior performance of our method over several state-of-the-art representation-based models. We further show that VIRT can be integrated into existing methods as plugins to lift their performances.


pdf bib
Cross Attention Augmented Transducer Networks for Simultaneous Translation
Dan Liu | Mengge Du | Xiaoxi Li | Ya Li | Enhong Chen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

This paper proposes a novel architecture, Cross Attention Augmented Transducer (CAAT), for simultaneous translation. The framework aims to jointly optimize the policy and translation models. To effectively consider all possible READ-WRITE simultaneous translation action paths, we adapt the online automatic speech recognition (ASR) model, RNN-T, but remove the strong monotonic constraint, which is critical for the translation task to consider reordering. To make CAAT work, we introduce a novel latency loss whose expectation can be optimized by a forward-backward algorithm. We implement CAAT with Transformer while the general CAAT architecture can also be implemented with other attention-based encoder-decoder frameworks. Experiments on both speech-to-text (S2T) and text-to-text (T2T) simultaneous translation tasks show that CAAT achieves significantly better latency-quality trade-offs compared to the state-of-the-art simultaneous translation approaches.


pdf bib
Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation
Junliang Guo | Linli Xu | Enhong Chen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The masked language model has received remarkable attention due to its effectiveness on various natural language processing tasks. However, few works have adopted this technique in the sequence-to-sequence models. In this work, we introduce a jointly masked sequence-to-sequence model and explore its application on non-autoregressive neural machine translation~(NAT). Specifically, we first empirically study the functionalities of the encoder and the decoder in NAT models, and find that the encoder takes a more important role than the decoder regarding the translation quality. Therefore, we propose to train the encoder more rigorously by masking the encoder input while training. As for the decoder, we propose to train it based on the consecutive masking of the decoder input with an n-gram loss function to alleviate the problem of translating duplicate words. The two types of masks are applied to the model jointly at the training stage. We conduct experiments on five benchmark machine translation tasks, and our model can achieve 27.69/32.24 BLEU scores on WMT14 English-German/German-English tasks with 5+ times speed up compared with an autoregressive model.


pdf bib
Budgeted Policy Learning for Task-Oriented Dialogue Systems
Zhirui Zhang | Xiujun Li | Jianfeng Gao | Enhong Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper presents a new approach that extends Deep Dyna-Q (DDQ) by incorporating a Budget-Conscious Scheduling (BCS) to best utilize a fixed, small amount of user interactions (budget) for learning task-oriented dialogue agents. BCS consists of (1) a Poisson-based global scheduler to allocate budget over different stages of training; (2) a controller to decide at each training step whether the agent is trained using real or simulated experiences; (3) a user goal sampling module to generate the experiences that are most effective for policy learning. Experiments on a movie-ticket booking task with simulated and real users show that our approach leads to significant improvements in success rate over the state-of-the-art baselines given the fixed budget.


pdf bib
Bidirectional Generative Adversarial Networks for Neural Machine Translation
Zhirui Zhang | Shujie Liu | Mu Li | Ming Zhou | Enhong Chen
Proceedings of the 22nd Conference on Computational Natural Language Learning

Generative Adversarial Network (GAN) has been proposed to tackle the exposure bias problem of Neural Machine Translation (NMT). However, the discriminator typically results in the instability of the GAN training due to the inadequate training problem: the search space is so huge that sampled translations are not sufficient for discriminator training. To address this issue and stabilize the GAN training, in this paper, we propose a novel Bidirectional Generative Adversarial Network for Neural Machine Translation (BGAN-NMT), which aims to introduce a generator model to act as the discriminator, whereby the discriminator naturally considers the entire translation space so that the inadequate training problem can be alleviated. To satisfy this property, generator and discriminator are both designed to model the joint probability of sentence pairs, with the difference that, the generator decomposes the joint probability with a source language model and a source-to-target translation model, while the discriminator is formulated as a target language model and a target-to-source translation model. To further leverage the symmetry of them, an auxiliary GAN is introduced and adopts generator and discriminator models of original one as its own discriminator and generator respectively. Two GANs are alternately trained to update the parameters. Experiment results on German-English and Chinese-English translation tasks demonstrate that our method not only stabilizes GAN training but also achieves significant improvements over baseline systems.


pdf bib
Stack-based Multi-layer Attention for Transition-based Dependency Parsing
Zhirui Zhang | Shujie Liu | Mu Li | Ming Zhou | Enhong Chen
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Although sequence-to-sequence (seq2seq) network has achieved significant success in many NLP tasks such as machine translation and text summarization, simply applying this approach to transition-based dependency parsing cannot yield a comparable performance gain as in other state-of-the-art methods, such as stack-LSTM and head selection. In this paper, we propose a stack-based multi-layer attention model for seq2seq learning to better leverage structural linguistics information. In our method, two binary vectors are used to track the decoding stack in transition-based parsing, and multi-layer attention is introduced to capture multiple word dependencies in partial trees. We conduct experiments on PTB and CTB datasets, and the results show that our proposed model achieves state-of-the-art accuracy and significant improvement in labeled precision with respect to the baseline seq2seq model.


pdf bib
Chinese Poetry Generation with Planning based Neural Network
Zhe Wang | Wei He | Hua Wu | Haiyang Wu | Wei Li | Haifeng Wang | Enhong Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generating method which first plans the sub-topics of the poem according to the user’s writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method can ensure that the generated poem is coherent and semantically consistent with the user’s intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms the state-of-the-art poetry generating methods and the poem quality is somehow comparable to human poets.


pdf bib
A Probabilistic Model for Learning Multi-Prototype Word Embeddings
Fei Tian | Hanjun Dai | Jiang Bian | Bin Gao | Rui Zhang | Enhong Chen | Tie-Yan Liu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers


pdf bib
A Dataset for Research on Short-Text Conversations
Hao Wang | Zhengdong Lu | Hang Li | Enhong Chen
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


pdf bib
Hedge Classification with Syntactic Dependency Features Based on an Ensemble Classifier
Yi Zheng | Qifeng Dai | Qiming Luo | Enhong Chen
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task


pdf bib
An Iterative Approach for Joint Dependency Parsing and Semantic Role Labeling
Qifeng Dai | Enhong Chen | Liu Shi
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task


pdf bib
Probabilistic Model for Syntactic and Semantic Dependency Parsing
Enhong Chen | Liu Shi | Dawei Hu
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning