Xiaodong Shi


2024

pdf bib
Signer Diversity-driven Data Augmentation for Signer-Independent Sign Language Translation
Honghao Fu | Liang Zhang | Biao Fu | Rui Zhao | Jinsong Su | Xiaodong Shi | Yidong Chen
Findings of the Association for Computational Linguistics: NAACL 2024

The primary objective of sign language translation (SLT) is to transform sign language videos into natural sentences.A crucial challenge in this field is developing signer-independent SLT systems which requires models to generalize effectively to signers not encountered during training.This challenge is exacerbated by the limited diversity of signers in existing SLT datasets, which often results in suboptimal generalization capabilities of current models.Achieving robustness to unseen signers is essential for signer-independent SLT.However, most existing method relies on signer identity labels, which is often impractical and costly in real-world applications.To address this issue, we propose the Signer Diversity-driven Data Augmentation (SDDA) method that can achieve good generalization without relying on signer identity labels. SDDA comprises two data augmentation schemes. The first is data augmentation based on adversarial training, which aims to utilize the gradients of the model to generate adversarial examples. The second is data augmentation based on diffusion model, which focuses on using the advanced diffusion-based text guided image editing method to modify the appearances of the signer in images. The combination of the two strategies significantly enriches the diversity of signers in the training process.Moreover, we introduce a consistency loss and a discrimination loss to enhance the learning of signer-independent features.Our experimental results demonstrate our model significantly enhances the performance of SLT in the signer-independent setting, achieving state-of-the-art results without relying on signer identity labels.

pdf bib
wav2vec-S: Adapting Pre-trained Speech Models for Streaming
Biao Fu | Kai Fan | Minpeng Liao | Yidong Chen | Xiaodong Shi | Zhongqiang Huang
Findings of the Association for Computational Linguistics: ACL 2024

Pre-trained speech models, such as wav2vec 2.0, have significantly advanced speech-related tasks, including speech recognition and translation. However, their applicability in streaming scenarios is limited because these models are trained on complete utterances, leading to a mismatch with incremental streaming inputs. This paper identifies three critical design aspects within the architecture of wav2vec 2.0 and proposes a novel model, wav2vec-S, which incorporates simple modifications to ensure consistent speech representations during both training and inference phases for streaming speech inputs. Furthermore, we demonstrate that wav2vec-S models can be efficiently adapted from pre-trained wav2vec 2.0 models through continued pre-training and effectively finetuned to meet various latency requirements in downstream applications. Experiments on speech recognition and translation tasks show that wav2vec-S outperforms strong baseline models and achieves a superior balance between quality and latency.

pdf bib
MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization
Zhiyu Yang | Zihan Zhou | Shuo Wang | Xin Cong | Xu Han | Yukun Yan | Zhenghao Liu | Zhixing Tan | Pengyuan Liu | Dong Yu | Zhiyuan Liu | Xiaodong Shi | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2024

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains rather unexplored. In this study, we introduce MatPlotAgent, an efficient model-agnostic LLM agent framework designed to automate scientific data visualization tasks. Leveraging the capabilities of both code LLMs and multi-modal LLMs, MatPlotAgent consists of three core modules: query understanding, code generation with iterative debugging, and a visual feedback mechanism for error correction. To address the lack of benchmarks in this field, we present MatPlotBench, a high-quality benchmark consisting of 100 human-verified test cases. Additionally, we introduce a scoring approach that utilizes GPT-4V for automatic evaluation. Experimental results demonstrate that MatPlotAgent can improve the performance of various LLMs, including both commercial and open-source models. Furthermore, the proposed evaluation method shows a strong correlation with human-annotated scores.

pdf bib
Adaptive Simultaneous Sign Language Translation with Confident Translation Length Estimation
Tong Sun | Biao Fu | Cong Hu | Liang Zhang | Ruiquan Zhang | Xiaodong Shi | Jinsong Su | Yidong Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Traditional non-simultaneous Sign Language Translation (SLT) methods, while effective for pre-recorded videos, face challenges in real-time scenarios due to inherent inference delays. The emerging field of simultaneous SLT aims to address this issue by progressively translating incrementally received sign video. However, the sole existing work in simultaneous SLT adopts a fixed gloss-based policy, which suffer from limitations in boundary prediction and contextual comprehension. In this paper, we delve deeper into this area and propose an adaptive policy for simultaneous SLT. Our approach introduces the concept of “confident translation length”, denoting maximum accurate translation achievable from current input. An estimator measures this length for streaming sign video, enabling the model to make informed decisions on whether to wait for more input or proceed with translation. To train the estimator, we construct a training data of confident translation length based on the longest common prefix between translations of partial and complete inputs. Furthermore, we incorporate adaptive training, utilizing pseudo prefix pairs, to refine the offline translation model for optimal performance in simultaneous scenarios. Experimental results on PHOENIX2014T and CSL-Daily demonstrate the superiority of our adaptive policy over existing methods, particularly excelling in situations requiring extremely low latency.

2023

pdf bib
Consistent Prototype Learning for Few-Shot Continual Relation Extraction
Xiudi Chen | Hui Wu | Xiaodong Shi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Few-shot continual relation extraction aims to continually train a model on incrementally few-shot data to learn new relations while avoiding forgetting old ones. However, current memory-based methods are prone to overfitting memory samples, resulting in insufficient activation of old relations and limited ability to handle the confusion of similar classes. In this paper, we design a new N-way-K-shot Continual Relation Extraction (NK-CRE) task and propose a novel few-shot continual relation extraction method with Consistent Prototype Learning (ConPL) to address the aforementioned issues. Our proposed ConPL is mainly composed of three modules: 1) a prototype-based classification module that provides primary relation predictions under few-shot continual learning; 2) a memory-enhanced module designed to select vital samples and refined prototypical representations as a novel multi-information episodic memory; 3) a consistent learning module to reduce catastrophic forgetting by enforcing distribution consistency. To effectively mitigate catastrophic forgetting, ConPL ensures that the samples and prototypes in the episodic memory remain consistent in terms of classification and distribution. Additionally, ConPL uses prompt learning to extract better representations and adopts a focal loss to alleviate the confusion of similar classes. Experimental results on two commonly-used datasets show that our model consistently outperforms other competitive baselines.

pdf bib
Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization
Lei Lin | Shuangtao Li | Yafang Zheng | Biao Fu | Shan Liu | Yidong Chen | Xiaodong Shi
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent studies have shown that sequence-to-sequence (seq2seq) models struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. There is mounting evidence that one of the reasons hindering CG is the representation of the encoder uppermost layer is entangled, i.e., the syntactic and semantic representations of sequences are entangled. However, we consider that the previously identified representation entanglement problem is not comprehensive enough. Additionally, we hypothesize that the source keys and values representations passing into different decoder layers are also entangled. Starting from this intuition, we propose CompoSition (Compose Syntactic and Semantic Representations), an extension to seq2seq models which learns to compose representations of different encoder layers dynamically for different tasks, since recent studies reveal that the bottom layers of the Transformer encoder contain more syntactic information and the top ones contain more semantic information. Specifically, we introduce a composed layer between the encoder and decoder to compose different encoder layers’ representations to generate specific keys and values passing into different decoder layers. CompoSition achieves competitive results on two comprehensive and realistic benchmarks, which empirically demonstrates the effectiveness of our proposal. Codes are available at https://github.com/thinkaboutzero/COMPOSITION.

pdf bib
Improving Chinese Pop Song and Hokkien Gezi Opera Singing Voice Synthesis by Enhancing Local Modeling
Peng Bai | Yue Zhou | Meizhen Zheng | Wujin Sun | Xiaodong Shi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Singing Voice Synthesis (SVS) strives to synthesize pleasing vocals based on music scores and lyrics. The current acoustic models based on Transformer usually process the entire sequence globally and use a simple L1 loss. However, this approach overlooks the significance of local modeling within the sequence and the local optimization of the hard-to-synthesize parts in the predicted mel-spectrogram. Consequently, the synthesized audio exhibits local incongruities (e.g., local pronunciation jitter or local noise). To address this problem, we propose two methods to enhance local modeling in the acoustic model. First, we devise a nearest neighbor local attention, where each phoneme token focuses only on the adjacent phoneme tokens located before and after it. Second, we propose a phoneme-level local adaptive weights loss function that enables the model to focus more on the hard-to-synthesize parts of the mel-spectrogram. We have verified the universality of our methods on public Chinese pop song and Hokkien Gezi Opera datasets. Extensive experiments have demonstrated the effectiveness of our methods, resulting in significant improvements in both objective and subjective evaluations when compared to the strong baselines. Our code and demonstration samples are available at https://github.com/baipeng1/SVSELM.

pdf bib
Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference
Biao Fu | Minpeng Liao | Kai Fan | Zhongqiang Huang | Boxing Chen | Yidong Chen | Xiaodong Shi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints. However, there is a mismatch problem in using a model trained with complete utterances for streaming inference with partial input. We demonstrate that speech representations extracted at the end of a streaming input are significantly different from those extracted from a complete utterance. To address this issue, we propose a new approach called Future-Aware Streaming Translation (FAST) that adapts an offline ST model for streaming input. FAST includes a Future-Aware Inference (FAI) strategy that incorporates future context through a trainable masked embedding, and a Future-Aware Distillation (FAD) framework that transfers future context from an approximation of full speech to streaming input. Our experiments on the MuST-C EnDe, EnEs, and EnFr benchmarks show that FAST achieves better trade-offs between translation quality and latency than strong baselines. Extensive analyses suggest that our methods effectively alleviate the aforementioned mismatch problem between offline training and online inference.

2022

pdf bib
Adversarial Soft Prompt Tuning for Cross-Domain Sentiment Analysis
Hui Wu | Xiaodong Shi
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-domain sentiment analysis has achieved promising results with the help of pre-trained language models. As GPT-3 appears, prompt tuning has been widely explored to enable better semantic modeling in many natural language processing tasks. However, directly using a fixed predefined template for cross-domain research cannot model different distributions of the [MASK] token in different domains, thus making underuse of the prompt tuning technique. In this paper, we propose a novel Adversarial Soft Prompt Tuning method (AdSPT) to better model cross-domain sentiment analysis. On the one hand, AdSPT adopts separate soft prompts instead of hard templates to learn different vectors for different domains, thus alleviating the domain discrepancy of the [MASK] token in the masked language modeling task. On the other hand, AdSPT uses a novel domain adversarial training strategy to learn domain-invariant representations between each source domain and the target domain. Experiments on a publicly available sentiment analysis dataset show that our model achieves the new state-of-the-art results for both single-source domain adaptation and multi-source domain adaptation.

pdf bib
Towards Better Document-level Relation Extraction via Iterative Inference
Liang Zhang | Jinsong Su | Yidong Chen | Zhongjian Miao | Min Zijun | Qingguo Hu | Xiaodong Shi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Document-level relation extraction (RE) aims to extract the relations between entities from the input document that usually containing many difficultly-predicted entity pairs whose relations can only be predicted through relational inference. Existing methods usually directly predict the relations of all entity pairs of input document in a one-pass manner, ignoring the fact that predictions of some entity pairs heavily depend on the predicted results of other pairs. To deal with this issue, in this paper, we propose a novel document-level RE model with iterative inference. Our model is mainly composed of two modules: 1) a base module expected to provide preliminary relation predictions on entity pairs; 2) an inference module introduced to refine these preliminary predictions by iteratively dealing with difficultly-predicted entity pairs depending on other pairs in an easy-to-hard manner. Unlike previous methods which only consider feature information of entity pairs, our inference module is equipped with two Extended Cross Attention units, allowing it to exploit both feature information and previous predictions of entity pairs during relational inference. Furthermore, we adopt a two-stage strategy to train our model. At the first stage, we only train our base module. During the second stage, we train the whole model, where contrastive learning is introduced to enhance the training of inference module. Experimental results on three commonly-used datasets show that our model consistently outperforms other competitive baselines.

2021

pdf bib
一种基于IDLSTM+CRF的中文主地域抽取方法(A Chinese Main Location Extraction Method based on IDLSTM+CRF)
Yiqi Tong (童逸琦) | Peigen Ye (叶培根) | Biao Fu (付彪) | Yidong Chen (陈毅东) | Xiaodong Shi (史晓东)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

新闻文本通常会涉及多个地域,主地域则描述了文本舆情内容的地域属性,是进行舆情分析的关键属性。目前深度学习领域针对主地域自动抽取的研究还比较少。基于此,本文构建了一个基于IDLSTM+CRF的主地域抽取系统。该系统通过地名识别、主地域抽取、主地域补全三大模块实现对主地域标签的自动抽取和补全。在公开数据集上的实验结果表明,我们的方法在地名识别任务上要优于BiLSTM+CRF等模型。而对于主地域抽取任务,目前还没有标准的中文主地域评测集合。针对该问题,我们标注并开源了1226条验证集和1500条测试集。最终,我们的主地域抽取系统在两个集合上分别取得了91.7%和84.8%的抽取准确率,并成功运用于线上生产环境。

pdf bib
A Multi-Task Approach for Improving Biomedical Named Entity Recognition by Incorporating Multi-Granularity information
Yiqi Tong | Yidong Chen | Xiaodong Shi
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Synchronous Dual Network with Cross-Type Attention for Joint Entity and Relation Extraction
Hui Wu | Xiaodong Shi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Joint entity and relation extraction is challenging due to the complex interaction of interaction between named entity recognition and relation extraction. Although most existing works tend to jointly train these two tasks through a shared network, they fail to fully utilize the interdependence between entity types and relation types. In this paper, we design a novel synchronous dual network (SDN) with cross-type attention via separately and interactively considering the entity types and relation types. On the one hand, SDN adopts two isomorphic bi-directional type-attention LSTM to encode the entity type enhanced representations and the relation type enhanced representations, respectively. On the other hand, SDN explicitly models the interdependence between entity types and relation types via cross-type attention mechanism. In addition, we also propose a new multi-task learning strategy via modeling the interaction of two types of information. Experiments on NYT and WebNLG datasets verify the effectiveness of the proposed model, achieving state-of-the-art performance.

pdf bib
XMU’s Simultaneous Translation System at NAACL 2021
Shuangtao Li | Jinming Hu | Boli Wang | Xiaodong Shi | Yidong Chen
Proceedings of the Second Workshop on Automatic Simultaneous Translation

This paper describes our two systems submitted to the simultaneous translation evaluation at the 2nd automatic simultaneous translation workshop.

2020

pdf bib
A Document-Level Neural Machine Translation Model with Dynamic Caching Guided by Theme-Rheme Information
Yiqi Tong | Jiangbin Zheng | Hongkang Zhu | Yidong Chen | Xiaodong Shi
Proceedings of the 28th International Conference on Computational Linguistics

Research on document-level Neural Machine Translation (NMT) models has attracted increasing attention in recent years. Although the proposed works have proved that the inter-sentence information is helpful for improving the performance of the NMT models, what information should be regarded as context remains ambiguous. To solve this problem, we proposed a novel cache-based document-level NMT model which conducts dynamic caching guided by theme-rheme information. The experiments on NIST evaluation sets demonstrate that our proposed model achieves substantial improvements over the state-of-the-art baseline NMT models. As far as we know, we are the first to introduce theme-rheme theory into the field of machine translation.

2018

pdf bib
XMU Neural Machine Translation Systems for WAT2018 Myanmar-English Translation Task
Boli Wang | Jinming Hu | Yidong Chen | Xiaodong Shi
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

2017

pdf bib
XMU Neural Machine Translation Online Service
Boli Wang | Zhixing Tan | Jinming Hu | Yidong Chen | Xiaodong Shi
Proceedings of the IJCNLP 2017, System Demonstrations

We demonstrate a neural machine translation web service. Our NMT service provides web-based translation interfaces for a variety of language pairs. We describe the architecture of NMT runtime pipeline and the training details of NMT models. We also show several applications of our online translation interfaces.

pdf bib
XMU Neural Machine Translation Systems for WMT 17
Zhixing Tan | Boli Wang | Jinming Hu | Yidong Chen | Xiaodong Shi
Proceedings of the Second Conference on Machine Translation

pdf bib
XMU Neural Machine Translation Systems for WAT 2017
Boli Wang | Zhixing Tan | Jinming Hu | Yidong Chen | Xiaodong Shi
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

This paper describes the Neural Machine Translation systems of Xiamen University for the shared translation tasks of WAT 2017. Our systems are based on the Encoder-Decoder framework with attention. We participated in three subtasks. We experimented subword segmentation, synthetic training data and model ensembling. Experiments show that all these methods can give substantial improvements.

pdf bib
Improving Implicit Discourse Relation Recognition with Discourse-specific Word Embeddings
Changxing Wu | Xiaodong Shi | Yidong Chen | Jinsong Su | Boli Wang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We introduce a simple and effective method to learn discourse-specific word embeddings (DSWE) for implicit discourse relation recognition. Specifically, DSWE is learned by performing connective classification on massive explicit discourse data, and capable of capturing discourse relationships between words. On the PDTB data set, using DSWE as features achieves significant improvements over baselines.

2016

pdf bib
Bilingually-constrained Synthetic Data for Implicit Discourse Relation Recognition
Changxing Wu | Xiaodong Shi | Yidong Chen | Yanzhou Huang | Jinsong Su
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
On-going Cooperative Research towards Developing Economy-Oriented Chinese-French SMT Systems with a New SMT Framework
Yidong Chen | Lingxiao Wang | Christian Boitet | Xiaodong Shi
Proceedings of TALN 2014 (Volume 2: Short Papers)

2012

pdf bib
Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
Jinsong Su | Hua Wu | Haifeng Wang | Yidong Chen | Xiaodong Shi | Huailin Dong | Qun Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
Improving the Hierarchical Phrase-Based Translation Model
Xiaodong Shi | Xiang Zhu | Yidong Chen
Proceedings of Machine Translation Summit XIII: Papers

2010

pdf bib
Chinese Personal Name Disambiguation: Technical Report of Natural Language Processing Lab of Xiamen University
Xiang Zhu | Xiaodong Shi | Ningfeng Liu | YingMei Guo | Yidong Chen
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Chinese Word Sense Induction based on Hierarchical Clustering Algorithm
Ke Cai | Xiaodong Shi | Yidong Chen | Zhehuang Huang | Yan Gao
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2007

pdf bib
Nbest Dependency Parsing with linguistically rich models
Xiaodong Shi
Proceedings of the Tenth International Conference on Parsing Technologies

pdf bib
The XMU SMT system for IWSLT 2007
Yidong Chen | Xiaodong Shi | Changle Zhou
Proceedings of the Fourth International Workshop on Spoken Language Translation

In this paper, an overview of the XMU statistical machine translation (SMT) system for the 2007 IWSLT Speech Translation Evaluation is given. Our system is a phrase-based system with a reordering model based on chunking and reordering of source language. In this year’s evaluation, we participated in the open data track for Clean Transcripts for the Chinese-English translation direction. The system ranked the 12th among the 15 participating systems.

2006

pdf bib
The XMU phrase-based statistical machine translation system for IWSLT 2006
Yidong Chen | Xiaodong Shi | Changle Zhou
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign