Liming Wang


2022

pdf bib
Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition
Liming Wang | Siyuan Feng | Mark Hasegawa-Johnson | Chang Yoo
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Phonemes are defined by their relationship to words: changing a phoneme changes the word. Learning a phoneme inventory with little supervision has been a longstanding challenge with important applications to under-resourced speech technology. In this paper, we bridge the gap between the linguistic and statistical definition of phonemes and propose a novel neural discrete representation learning model for self-supervised learning of phoneme inventory with raw speech and word labels. Under mild assumptions, we prove that the phoneme inventory learned by our approach converges to the true one with an exponentially low error rate. Moreover, in experiments on TIMIT and Mboshi benchmarks, our approach consistently learns a better phoneme-level representation and achieves a lower error rate in a zero-resource phoneme recognition task than previous state-of-the-art self-supervised representation learning algorithms.

pdf bib
Mitigating the Inconsistency Between Word Saliency and Model Confidence with Pathological Contrastive Training
Pengwei Zhan | Yang Wu | Shaolei Zhou | Yunjian Zhang | Liming Wang
Findings of the Association for Computational Linguistics: ACL 2022

Neural networks are widely used in various NLP tasks for their remarkable performance. However, the complexity makes them difficult to interpret, i.e., they are not guaranteed right for the right reason. Besides the complexity, we reveal that the model pathology - the inconsistency between word saliency and model confidence, further hurts the interpretability. We show that the pathological inconsistency is caused by the representation collapse issue, which means that the representation of the sentences with tokens in different saliency reduced is somehow collapsed, and thus the important words cannot be distinguished from unimportant words in terms of model confidence changing. In this paper, to mitigate the pathology and obtain more interpretable models, we propose Pathological Contrastive Training (PCT) framework, which adopts contrastive learning and saliency-based samples augmentation to calibrate the sentences representation. Combined with qualitative analysis, we also conduct extensive quantitative experiments and measure the interpretability with eight reasonable metrics. Experiments show that our method can mitigate the model pathology and generate more interpretable models while keeping the model performance. Ablation study also shows the effectiveness.

2021

pdf bib
Multimodal Fusion with Co-Attention Networks for Fake News Detection
Yang Wu | Pengwei Zhan | Yunjian Zhang | Liming Wang | Zhen Xu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Coreference by Appearance: Visually Grounded Event Coreference Resolution
Liming Wang | Shengyu Feng | Xudong Lin | Manling Li | Heng Ji | Shih-Fu Chang
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

Event coreference resolution is critical to understand events in the growing number of online news with multiple modalities including text, video, speech, etc. However, the events and entities depicting in different modalities may not be perfectly aligned and can be difficult to annotate, which makes the task especially challenging with little supervision available. To address the above issues, we propose a supervised model based on attention mechanism and an unsupervised model based on statistical machine translation, capable of learning the relative importance of modalities for event coreference resolution. Experiments on a video multimedia event dataset show that our multimodal models outperform text-only systems in event coreference resolution tasks. A careful analysis reveals that the performance gain of the multimodal model especially under unsupervised settings comes from better learning of visually salient events.

2018

pdf bib
XNMT: The eXtensible Neural Machine Translation Toolkit
Graham Neubig | Matthias Sperber | Xinyi Wang | Matthieu Felix | Austin Matthews | Sarguna Padmanabhan | Ye Qi | Devendra Sachan | Philip Arthur | Pierre Godard | John Hewitt | Rachid Riad | Liming Wang
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)