Benjamin Adams


2023

pdf bib
cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models
Sidney Wong | Matthew Durward | Benjamin Adams | Jonathan Dunn
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

This paper describes our multiclass classification system developed as part of the LT-EDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-language pretrained language model, XLM-RoBERTa, with spatially and temporally relevant social media language data. We found the inclusion of this spatio-temporal data improved the classification performance for all language and task conditions when compared with the baseline. We also retrained a subset of models with simulated script-mixed social media language data with varied performance. The results from the current study suggests that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.

2020

pdf bib
Measuring Linguistic Diversity During COVID-19
Jonathan Dunn | Tom Coupe | Benjamin Adams
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of linguistic diversity using restrictions on international travel resulting from the COVID-19 pandemic. Previous work has mapped the distribution of languages using geo-referenced social media and web data. The goal, however, has been to describe these corpora themselves rather than to make inferences about underlying populations. This paper shows that a difference-in-differences method based on the Herfindahl-Hirschman Index can identify the bias in digital corpora that is introduced by non-local populations. These methods tell us where significant changes have taken place and whether this leads to increased or decreased diversity. This is an important step in aligning digital corpora like social media with the real-world populations that have produced them.