Bao Chen
2023
Automatic Evaluate Dialogue Appropriateness by Using Dialogue Act
Bao Chen | Yuanjie Wang | Zeming Liu | Yuhang Guo
Findings of the Association for Computational Linguistics: EMNLP 2023
Evaluation of dialogue systems requires assessing various aspects, among which appropriateness is significant as a core element of communicative language competence. However, current evaluations rely heavily on human judgments, which are time-consuming, labor-intensive, prone to bias, and lack objectivity. In this paper, we introduce Dialogue Act Appropriateness (DAA), a novel method that uses the underlying patterns of dialogue act transitions to evaluate the appropriateness of chatbot responses. We learn transition patterns from human-human dialogue corpora and evaluate chatbot appropriateness by measuring how similar the chatbots' transition patterns are to those observed in human-human dialogues. To validate DAA, we annotate a test dataset by manually evaluating the appropriateness of dialogues from multiple chatbot systems. The experimental results demonstrate a strong correlation between our evaluation metric and human ratings, establishing the reliability of DAA as a measure of dialogue appropriateness.
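The abstract does not say how transition patterns are represented or compared. The following is a minimal sketch of the general idea under several assumptions:

- each dialogue is already tagged with one dialogue act per turn;
- transitions are counted as (previous act, next act) pairs;
- cosine similarity between the human and chatbot transition-frequency vectors is used as a stand-in similarity measure.

The function names and act labels are hypothetical and are not taken from the DAA paper.

```python
import math
from collections import Counter


def transition_counts(dialogues):
    """Count (previous act, next act) pairs over consecutive turns.

    Each dialogue is a list of dialogue-act labels, one per turn (assumed input format).
    """
    counts = Counter()
    for acts in dialogues:
        counts.update(zip(acts, acts[1:]))
    return counts


def transition_similarity(human_dialogues, bot_dialogues):
    """Cosine similarity between human and chatbot transition-frequency vectors.

    Stand-in similarity measure for illustration only; the paper's measure may differ.
    """
    human = transition_counts(human_dialogues)
    bot = transition_counts(bot_dialogues)
    keys = set(human) | set(bot)
    # Counter returns 0 for missing keys, so unseen transitions contribute nothing.
    dot = sum(human[k] * bot[k] for k in keys)
    norm_h = math.sqrt(sum(v * v for v in human.values()))
    norm_b = math.sqrt(sum(v * v for v in bot.values()))
    if norm_h == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_h * norm_b)


# Hypothetical act-labeled dialogues.
human = [
    ["greet", "greet", "question", "answer"],
    ["question", "answer", "thank", "bye"],
]
print(transition_similarity(human, [["question", "answer", "bye"]]))
```

A corpus-level score like this one compares overall transition habits. Scoring individual dialogues against human ratings would instead call for a per-dialogue measure, such as the average log-probability of each transition under a smoothed human transition model.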
2022
BIT-Xiaomi’s System for AutoSimTrans 2022
Mengge Liu | Xiang Li | Bao Chen | Yanzhi Tian | Tianwei Lan | Silin Li | Yuhang Guo | Jian Luan | Bin Wang
Proceedings of the Third Workshop on Automatic Simultaneous Translation
This system paper describes the BIT-Xiaomi simultaneous translation system for the AutoSimTrans 2022 simultaneous translation challenge. We participated in three tracks: the Zh-En text-to-text track, the Zh-En audio-to-text track, and the En-Es text-to-text track. In our system, wait-k is employed to train prefix-to-prefix translation models. We integrate streaming chunking to detect boundaries as the source stream is read in. We further improve our system with data selection, data augmentation, and R-drop training methods. Results show that our wait-k implementation outperforms the organizers’ baseline by up to 8 BLEU, and our proposed streaming chunking method further improves results by about 2 BLEU in the low-latency regime.
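Under the standard wait-k policy, the decoder first reads k source tokens. It then alternates between writing one target token and reading one more source token. Target token t (1-indexed) is therefore produced after g(t) = min(k + t - 1, |x|) source tokens, and prefix-to-prefix training conditions each target position on the same source prefix. The sketch below computes that schedule and the corresponding READ/WRITE action sequence. The function names are hypothetical. This is not the BIT-Xiaomi implementation, and it leaves out the streaming chunking step, whose details are not given here.

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Source tokens read before writing each target token: g(t) = min(k + t - 1, src_len)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


def read_write_actions(src_len, tgt_len, k):
    """Expand the wait-k schedule into an explicit READ/WRITE action sequence."""
    actions, read = [], 0
    for need in wait_k_schedule(src_len, tgt_len, k):
        # Read source tokens until the schedule allows the next write.
        while read < need:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


print(wait_k_schedule(5, 6, 3))     # [3, 4, 5, 5, 5, 5]
print(read_write_actions(4, 3, 2))  # ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE']
```

Once the whole source has been read, g(t) saturates at |x|, and the remaining target tokens are written with no further reads.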
Co-authors
- Yuhang Guo 2
- Yuanjie Wang 1
- Zeming Liu 1
- Mengge Liu 1
- Xiang Li 1