Experimenting with Discourse Segmentation of Taiwan Southern Min Spontaneous Speech

Laurent Prévot, Sheng-Fu Wang


Abstract
Discourse segmentation has received increased attention in recent years; however, the majority of studies have focused on written genres and high-resource languages. This paper investigates discourse segmentation of a Taiwan Southern Min spontaneous speech corpus. We compare two approaches to fine-tuning a large language model (LLM): a supervised approach, made possible by a high-quality annotated dataset, and a weakly-supervised approach requiring only a small amount of manual labeling. The corpus used here is transcribed both in Chinese characters and in romanized form, which allows us to compare the impact of the written form on the discourse segmentation task. Additionally, the dataset includes manual labeling of prosodic breaks, allowing us to explore the role prosody can play in contemporary LLM-based discourse segmentation systems. In our study, the supervised approach outperforms weak supervision; the character-based version obtains better scores than the romanized version; and prosodic information proves to be a useful source of information for improving discourse segmentation performance.
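As an illustration only, and not the authors' pipeline, the sketch below shows one common way to cast discourse segmentation as token classification for LLM fine-tuning. Each token receives label 1 if it begins a discourse unit and 0 otherwise. Manually labeled prosodic breaks are injected into the text input as a marker token. The names `units_to_labels`, `insert_prosodic_markers`, `boundary_prf` and the `<pb>` marker are hypothetical. Giving marker positions the label -100 assumes the PyTorch convention, where `CrossEntropyLoss` ignores that index by default.

```python
from typing import List, Sequence, Tuple

PB_MARKER = "<pb>"  # hypothetical special token marking a manual prosodic break
IGNORE = -100       # assumption: PyTorch CrossEntropyLoss default ignore_index


def units_to_labels(units: Sequence[Sequence[str]]) -> Tuple[List[str], List[int]]:
    """Flatten discourse units into tokens; label 1 marks a unit-initial token."""
    tokens: List[str] = []
    labels: List[int] = []
    for unit in units:
        for i, tok in enumerate(unit):
            tokens.append(tok)
            labels.append(1 if i == 0 else 0)
    return tokens, labels


def insert_prosodic_markers(
    tokens: Sequence[str], labels: Sequence[int], breaks_after: Sequence[bool]
) -> Tuple[List[str], List[int]]:
    """Insert PB_MARKER after each token followed by a prosodic break.

    Marker positions get the IGNORE label so they do not contribute to the loss.
    """
    if not (len(tokens) == len(labels) == len(breaks_after)):
        raise ValueError("tokens, labels and break flags must be aligned")
    out_tokens: List[str] = []
    out_labels: List[int] = []
    for tok, lab, brk in zip(tokens, labels, breaks_after):
        out_tokens.append(tok)
        out_labels.append(lab)
        if brk:
            out_tokens.append(PB_MARKER)
            out_labels.append(IGNORE)
    return out_tokens, out_labels


def boundary_prf(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 computed over predicted unit-initial positions."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    n_pred, n_gold = sum(pred), sum(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Placeholder tokens; a real input would be character or romanized transcripts.
    tokens, gold = units_to_labels([["w1", "w2"], ["w3", "w4", "w5"]])
    print(tokens, gold)
    print(insert_prosodic_markers(tokens, gold, [False, True, False, False, True]))
    print(boundary_prf(gold, [1, 0, 0, 1, 0]))
```

In this toy run the gold labels are `[1, 0, 1, 0, 0]`. The prediction `[1, 0, 0, 1, 0]` recovers one of the two boundaries and proposes one spurious boundary, so precision, recall and F1 are all 0.5. Inserting markers as plain tokens is only one of several conceivable ways to expose prosody to a text-only model.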
Anthology ID:
2024.codi-1.5
Volume:
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Michael Strube, Chloe Braud, Christian Hardmeier, Junyi Jessy Li, Sharid Loaiciga, Amir Zeldes, Chuyuan Li
Venues:
CODI | WS
Publisher:
Association for Computational Linguistics
Pages:
50–63
URL:
https://aclanthology.org/2024.codi-1.5
Cite (ACL):
Laurent Prévot and Sheng-Fu Wang. 2024. Experimenting with Discourse Segmentation of Taiwan Southern Min Spontaneous Speech. In Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024), pages 50–63, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Experimenting with Discourse Segmentation of Taiwan Southern Min Spontaneous Speech (Prévot & Wang, CODI-WS 2024)
PDF:
https://aclanthology.org/2024.codi-1.5.pdf
Supplementary material:
 2024.codi-1.5.SupplementaryMaterial.tex