2022
DialogUSR: Complex Dialogue Utterance Splitting and Reformulation for Multiple Intent Detection
Haoran Meng | Zheng Xin | Tianyu Liu | Zizhen Wang | He Feng | Binghuai Lin | Xuemin Zhao | Yunbo Cao | Zhifang Sui
Findings of the Association for Computational Linguistics: EMNLP 2022
While interacting with chatbots, users may express multiple intents in a single dialogue utterance. Instead of training a dedicated multi-intent detection model, we propose DialogUSR, a dialogue utterance splitting and reformulation task that first splits a multi-intent user query into several single-intent sub-queries and then recovers all the coreferred and omitted information in the sub-queries. DialogUSR can serve as a plug-in, domain-agnostic module that enables multi-intent detection for deployed chatbots with minimal effort. We collect a high-quality, naturally occurring dataset that covers 23 domains using a multi-step crowd-sourcing procedure. To benchmark the proposed dataset, we propose multiple action-based generative models that involve end-to-end and two-stage training, and conduct in-depth analyses of the pros and cons of the proposed baselines.
2018
Discriminating between Similar Languages on Imbalanced Conversational Texts
Junqing He | Xian Huang | Xuemin Zhao | Yan Zhang | Yonghong Yan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
HCCL at SemEval-2018 Task 8: An End-to-End System for Sequence Labeling from Cybersecurity Reports
Mingming Fu | Xuemin Zhao | Yonghong Yan
Proceedings of the 12th International Workshop on Semantic Evaluation
This paper describes the HCCL team's systems that participated in SemEval 2018 Task 8: SecureNLP (Semantic Extraction from cybersecurity reports using NLP). To solve the problem, our team applied a neural network architecture that automatically benefits from both word-level and character-level representations by using a combination of bi-directional LSTM, CNN, and CRF (Ma and Hovy, 2016). Our system is truly end-to-end, requiring no feature engineering or data preprocessing, and we ranked 4th in Subtask 1, 7th in Subtask 2, and 3rd in Subtask 2-relaxed.
2017
HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity
Junqing He | Long Wu | Xuemin Zhao | Yonghong Yan
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
In this paper, we introduce an approach that combines word embeddings and machine translation for multilingual semantic word similarity, Task 2 of SemEval-2017. Thanks to the unsupervised transliteration model, our cross-lingual word embeddings encounter fewer out-of-vocabulary (OOV) words. Our results are produced using only monolingual Wikipedia corpora and a limited amount of sentence-aligned data. Although relatively few resources are used, our system ranked 3rd in the monolingual subtask and would place 6th in the cross-lingual subtask.