Xin Huang


Ensemble Method via Ranking Model for Conversational Modeling with Subjective Knowledge
Xin Huang | Kye Min Tan | Richeng Duan | Bowei Zou
Proceedings of The Eleventh Dialog System Technology Challenge

This paper describes our submission to the fifth track of the 11th Dialog System Technology Challenge (DSTC-11), which focuses on “Task-oriented Conversational Modeling with Subjective Knowledge”. We focus on response generation and use a ranking strategy to ensemble three individual models: BART, Long-T5, and a fine-tuned large language model based on LLaMA. The strategy is supplemented by other techniques, such as low-rank adaptation, to use these large models efficiently while still achieving optimal performance. The experiments show that the ensemble method outperforms the individual models and the baseline method. Our model ranked 1st in ROUGE_1, 2nd in ROUGE_L, and 4th in human evaluation among a total of 14 participating teams.
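The abstract does not say how candidates are ranked, so the following is only a minimal sketch of the general idea of ensembling by ranking: each model proposes a response, a ranker scores every candidate, and the top-scoring one is returned. The function names, the word-overlap scorer, and the example strings are all hypothetical stand-ins, not the authors' ranking model or data.

```python
from typing import Callable, Dict, Tuple


def pick_response(candidates: Dict[str, str],
                  score: Callable[[str], float]) -> Tuple[str, str]:
    """Return (model_name, response) for the candidate the ranker scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    best = max(candidates, key=lambda name: score(candidates[name]))
    return best, candidates[best]


def overlap_score(knowledge: str) -> Callable[[str], float]:
    """Toy stand-in ranker (assumption): fraction of response words
    that also occur in the knowledge text."""
    known = set(knowledge.lower().split())

    def score(response: str) -> float:
        words = response.lower().split()
        return sum(w in known for w in words) / len(words) if words else 0.0

    return score


# Made-up candidate responses, keyed by illustrative model names.
candidates = {
    "bart": "the hotel has free wifi",
    "long_t5": "i am not sure",
    "llama_lora": "guests say the wifi is fast and free",
}
ranker = overlap_score("reviews mention that the wifi is fast and free for guests")
winner, reply = pick_response(candidates, ranker)
print(winner, "->", reply)
```

Passing the scorer in as a function keeps the selection step independent of how the ranker is built, so a learned ranking model could replace the toy overlap score without changing `pick_response`.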


Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang | Ashish Khetan | Rene Bidart | Zohar Karnin
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Transformer-based language models such as BERT (CITATION) have achieved state-of-the-art performance on various NLP tasks, but are computationally prohibitive. A recent line of work uses various heuristics to successively shorten the sequence length while transforming tokens through encoders, in tasks such as classification and ranking that require a single token embedding for prediction. We present a novel solution to this problem, called Pyramid-BERT, where we replace previously used heuristics with a core-set based token selection method justified by theoretical results. The core-set based token selection technique allows us to avoid expensive pre-training and gives space-efficient fine-tuning, which makes it suitable for handling longer sequence lengths. We provide extensive experiments establishing the advantages of Pyramid-BERT over several baselines and existing works on the GLUE benchmarks and Long Range Arena (CITATION) datasets.
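As an illustration of what core-set based token selection can look like, here is a minimal sketch using greedy k-center (farthest-point) selection, a common way to build a core-set. Whether this matches the paper's exact procedure is an assumption; the function name, the choice to always keep index 0 (treated here as the [CLS] token), and the toy 2-D embeddings are hypothetical.

```python
import math


def greedy_k_center(embeddings, k, first=0):
    """Pick k token indices so every token lies close to some picked token.

    Starts from `first` (assumed to be the [CLS] position) and repeatedly
    adds the token that is farthest from the current selection.
    """
    if not 1 <= k <= len(embeddings):
        raise ValueError("k must be between 1 and the number of tokens")
    selected = [first]
    # Distance from each token to its nearest selected token so far.
    min_dist = [math.dist(e, embeddings[first]) for e in embeddings]
    while len(selected) < k:
        # Farthest-point step: take the token worst covered by the selection.
        nxt = max(range(len(embeddings)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[nxt]))
    return sorted(selected)  # keep the original token order


# Toy embeddings: two near-duplicate pairs and one isolated token.
tokens = [
    [0.0, 0.0],  # index 0, treated as [CLS]
    [0.1, 0.0],
    [5.0, 5.0],
    [5.1, 5.0],
    [0.0, 4.0],
]
kept = greedy_k_center(tokens, k=3)
print(kept)
```

On this toy input the selection keeps one token from each cluster and drops the near-duplicates, which is the intuition behind shortening the sequence without discarding distinct information.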


Unseen Entity Handling in Complex Question Answering over Knowledge Base via Language Generation
Xin Huang | Jung-Jae Kim | Bowei Zou
Findings of the Association for Computational Linguistics: EMNLP 2021

Complex question answering over a knowledge base remains a challenging task because it involves reasoning over multiple pieces of information, including intermediate entities/relations and other constraints. Previous methods simplify the SPARQL query of a question into forms such as a list or a graph, dropping constraints such as “filter” and “order_by”, and present models specialized for generating those simplified forms from a given question. We instead introduce a novel approach that directly generates an executable SPARQL query without simplification, addressing the issue of generating unseen entities. We adapt large-scale pre-trained encoder-decoder models and show that our method significantly outperforms the previous methods while also offering higher interpretability and computational efficiency.

Entity-level Cross-modal Learning Improves Multi-modal Machine Translation
Xin Huang | Jiajun Zhang | Chengqing Zong
Findings of the Association for Computational Linguistics: EMNLP 2021

Multi-modal machine translation (MMT) aims at improving translation performance by incorporating visual information. Most studies leverage the visual information by integrating global image features as auxiliary input or by attending to relevant local regions of the image during decoding. However, this kind of usage of visual information makes it difficult to figure out how the visual modality helps and why it works. Inspired by the findings of (CITATION) that entities are most informative in the image, we propose an explicit entity-level cross-modal learning approach that aims to augment the entity representation. Specifically, the approach is framed as a reconstruction task that reconstructs the original textual input from multi-modal input in which entities are replaced with visual features. Then, a multi-task framework is employed to combine the translation task and the reconstruction task to make full use of cross-modal entity representation learning. Extensive experiments demonstrate that our approach can achieve comparable or even better performance than state-of-the-art models. Furthermore, our in-depth analysis shows how visual information improves translation.


Hyperbolic Capsule Networks for Multi-Label Classification
Boli Chen | Xin Huang | Lin Xiao | Liping Jing
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Although deep neural networks are effective at extracting high-level features, classification methods usually encode an input into a vector representation via simple feature aggregation operations (e.g., pooling). Such operations limit the performance. For instance, a multi-label document may contain several concepts. In this case, one vector cannot sufficiently capture its salient and discriminative content. Thus, we propose Hyperbolic Capsule Networks (HyperCaps) for Multi-Label Classification (MLC), which have two merits. First, hyperbolic capsules are designed to capture fine-grained document information for each label, which makes it possible to characterize complicated structures among labels and documents. Second, Hyperbolic Dynamic Routing (HDR) is introduced to aggregate hyperbolic capsules in a label-aware manner, so that the label-level discriminative information can be preserved along the depth of neural networks. To efficiently handle large-scale MLC datasets, we additionally present a new routing method to adaptively adjust the capsule number during routing. Extensive experiments are conducted on four benchmark datasets. Compared with the state-of-the-art methods, HyperCaps significantly improves the performance of MLC, especially on tail labels.


Label-Specific Document Representation for Multi-Label Text Classification
Lin Xiao | Xin Huang | Boli Chen | Liping Jing
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Multi-label text classification (MLTC) aims to tag a given document with its most relevant labels. In this paper, we propose a Label-Specific Attention Network (LSAN) to learn a label-specific document representation. LSAN takes advantage of label semantic information to determine the semantic connection between labels and the document for constructing a label-specific document representation. Meanwhile, the self-attention mechanism is adopted to identify the label-specific document representation from document content information. In order to seamlessly integrate the above two parts, an adaptive fusion strategy is proposed, which can effectively output the comprehensive label-specific document representation to build a multi-label text classifier. Extensive experimental results demonstrate that LSAN consistently outperforms the state-of-the-art methods on four different datasets, especially on the prediction of low-frequency labels. The code and hyper-parameter settings are released to facilitate other researchers.
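To make the idea of a label-specific document representation concrete, here is a minimal sketch in which each label embedding attends over the document's token vectors and yields one document vector per label. The softmax dot-product scoring, the function names, and the toy vectors are assumptions for illustration; this is not the released LSAN code and omits the self-attention branch and the adaptive fusion step.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_specific_representation(tokens, labels):
    """Return one document vector per label.

    Each label embedding scores every token by dot product, and the
    softmax-normalized scores weight the token vectors.
    """
    dim = len(tokens[0])
    reps = []
    for label in labels:
        scores = [sum(l * t for l, t in zip(label, tok)) for tok in tokens]
        weights = softmax(scores)
        reps.append([sum(w * tok[d] for w, tok in zip(weights, tokens))
                     for d in range(dim)])
    return reps


# Toy document of two token vectors and two label embeddings (made up).
doc_tokens = [[1.0, 0.0], [0.0, 1.0]]
label_embs = [[10.0, 0.0], [0.0, 10.0]]
reps = label_specific_representation(doc_tokens, label_embs)
print(reps)
```

Each label ends up with a different view of the same document, weighted toward the tokens most aligned with that label, which is what allows a separate prediction per label instead of one shared document vector.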