Xingyuan Li

2025

Toward Automatic Discovery of a Canine Phonetic Alphabet
Theron S. Wang | Xingyuan Li | Hridayesh Lekhak | Tuan Minh Dang | Mengyue Wu | Kenny Q. Zhu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dogs communicate intelligently, but little is known about the phonetic properties of their vocal communication. For the first time, this paper presents an iterative algorithm inspired by human phonetic discovery, which is based on minimal pairs that determine phonemes by distinguishing different words in human language. The algorithm produces a complete alphabet of distinct canine phoneme-like units. In addition, it produces a number of repeated canine acoustic units, composed exclusively of the phoneme-like units in the alphabet, which may correspond to specific environments and activities of a dog. The framework outlined in this paper is expected to function not only on canines but also on other animal species.

2024

Phonetic and Lexical Discovery of Canine Vocalization
Theron S. Wang | Xingyuan Li | Chunhao Zhang | Mengyue Wu | Kenny Q. Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024

This paper attempts to discover communication patterns automatically within dog vocalizations using a data-driven approach, which breaks the barrier of previous approaches that rely on human prior knowledge on limited data. We present a self-supervised approach with HuBERT, enabling the accurate classification of phones, and an adaptive grammar induction method that identifies phone sequence patterns that suggest a preliminary vocabulary within dog vocalizations. Our results show that a subset of this vocabulary has substantial causal relations with certain canine activities, suggesting signs of stable semantics associated with these “words”.

2020

Our system participates in two shared tasks, CL-SciSumm 2020 and LongSumm 2020. In the CL-SciSumm shared task, building on our previous work, we apply additional machine learning methods to position features and content features for facet classification in Task1B, and we introduce a GCN in Task2 to perform extractive summarization. In the LongSumm shared task, we integrate both extractive and abstractive summarization approaches. Three methods were tested: T5 fine-tuning, DPPs sampling, and GRU-GCN/GAT.

Co-authors

Hridayesh Lekhak 1

Lei Li 1

Wei Liu 1

Yinan Liu 1

Siya Qi 1

Yang Xie 1

Chunhao Zhang 1
