Steven Coats

2025

Regional Distribution of the /el/-/æl/ Merger in Australian English
Steven Coats | Chloé Diskin-Holdaway | Debbie Loakes
Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects

Prelateral merger of /e/ and /æ/ is a salient acoustic feature of speech from Melbourne and the state of Victoria in Australia, but little is known about its presence in other parts of the country. In this study, automated methods of data collection, forced alignment, and formant extraction are used to analyze the regional distribution of the vowel merger within all of Australia, in 4.3 million vowel tokens from naturalistic speech in 252 locations. The extent of the merger is quantified using the difference in Bhattacharyya’s distance scores based on phonetic context, and the regional distribution is assessed using spatial autocorrelation. The principal findings are that the merger is most prominent in Victoria and least prominent in Sydney and New South Wales. We also find preliminary indications that it may be present in other parts of the country.

2024

CoANZSE Audio: Creation of an Online Corpus for Linguistic and Phonetic Analysis of Australian and New Zealand Englishes
Steven Coats
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

CoANZSE Audio is a searchable online version of the Corpus of Australian and New Zealand Spoken English, a 195-million-word collection of geo-located YouTube transcripts of local government channels. In addition to the part-of-speech-tagged and lemmatized transcript data, CoANZSE Audio provides access to almost all of the underlying audio, as well as to forced alignments of the audio with transcript content, in Praat’s TextGrid format. This paper describes the methods used to create the corpus from open-source tools and the architecture of the CoANZSE Audio website. Two possible linguistic analyses based on CoANZSE Audio data are described: use of double modals, a rare syntactic feature, and raising of the mid front vowel /ɛ/ in New Zealand English. CoANZSE Audio can be considered to be among the first large, free, fully searchable online corpora containing data suitable for acoustic phonetic analyses in addition to lexical, grammatical, and discourse properties of Australian and New Zealand Englishes.

2023

Methods for Phonetic Scraping of Youtube Videos
Adrien Meli | Steven Coats | Nicolas Ballier
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

2022

The Corpus of Australian and New Zealand Spoken English: A new resource of naturalistic speech transcripts
Steven Coats
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association
