Crowdsourced Participants’ Accuracy at Identifying the Social Class of Speakers from South East England
Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022
Five participants, each in a different location (the USA, Canada, South Africa, Scotland and South East England), identified the self-determined social class of 227 speakers in a corpus (born 1986–2001; from South East England) based on 10-second passage readings. This pilot study demonstrates the potential of crowdsourcing, specifically through LanguageARC, for collecting sociolinguistic data, especially when a geographic spread of participants is desirable but not easily achieved using traditional fieldwork methods. Results show that, firstly, accuracy at identifying social class is relatively low compared with accuracy for other social factors, including in studies that used the same speech stimuli (e.g., ethnicity: Cole 2020). Secondly, participants identified speakers’ social class significantly better than chance for a three-class distinction (working, middle, upper) but not for a six-class distinction. Thirdly, despite some differences in performance, the participant located in South East England did not perform significantly better than the other participants, suggesting that this participant’s presumed greater familiarity with sociolinguistic variation in the region may not have been advantageous. Finally, there is a distinction to be made between participants’ ability to pinpoint a speaker’s exact social class membership and their ability to identify the speaker’s relative class position. This paper discusses the role of social identification tasks in illuminating how speech is categorised and interpreted.
Identifications of Speaker Ethnicity in South-East England: Multicultural London English as a Divisible Perceptual Variety
Proceedings of the LREC 2020 Workshop on "Citizen Linguistics in Language Resource Development"
This study uses crowdsourcing through LanguageARC to collect data on levels of accuracy in the identification of speakers’ ethnicities. Ten participants (5 from the US; 5 from South-East England) classified lexically identical speech stimuli from a corpus of 227 speakers aged 18–33 years from South-East England into the main “ethnic” groups in Britain: White British, Black British and Asian British. Firstly, the data reveal no significant geographic proximity effect on performance between US and British participants. Secondly, the results bear on recent work suggesting that, despite the varying heritages of young, ethnic minority speakers in London, they speak an innovative and emerging variety: Multicultural London English (MLE) (e.g. Cheshire et al., 2011). Counter to this, participants perceived linguistic differences between speakers of all three ethnicities (80.7% accuracy). Accuracy was highest (96%) for Black British speakers from London, whose speech seems to form a distinct perceptual category. Participants also performed substantially better than chance at identifying Black British and Asian British speakers who are not from London (80% and 60% respectively). This suggests that MLE is not a single, homogeneous variety but that, instead, there are perceptual linguistic differences by ethnicity which transcend the borders of London.