Robert Ridley
2023
Addressing Linguistic Bias through a Contrastive Analysis of Academic Writing in the NLP Domain
Robert Ridley
|
Zhen Wu
|
Jianbing Zhang
|
Shujian Huang
|
Xinyu Dai
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
It has been well documented that a reviewer’s opinion of the nativeness of expression in an academic paper affects the likelihood of it being accepted for publication. Previous works have also shone a light on the stress and anxiety authors who are non-native English speakers experience when attempting to publish in international venues. We explore how this might be a concern in the field of Natural Language Processing (NLP) through conducting a comprehensive statistical analysis of NLP paper abstracts, identifying how authors of different linguistic backgrounds differ in the lexical, morphological, syntactic and cohesive aspects of their writing. Through our analysis, we identify that there are a number of characteristics that are highly variable across the different corpora examined in this paper. This indicates potential for the presence of linguistic bias. Therefore, we outline a set of recommendations to publishers of academic journals and conferences regarding their guidelines and resources for prospective authors in order to help enhance inclusivity and fairness.
2020
Integrating BERT and Score-based Feature Gates for Chinese Grammatical Error Diagnosis
Yongchang Cao
|
Liang He
|
Robert Ridley
|
Xinyu Dai
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications
This paper describes our proposed model for the Chinese Grammatical Error Diagnosis (CGED) task in NLPTEA2020. The goal of CGED is to use natural language processing techniques to automatically diagnose Chinese grammatical errors in sentences. To this end, we design and implement a CGED model named BERT with Score-feature Gates Error Diagnoser (BSGED), which is based on the BERT model, Bidirectional Long Short-Term Memory (BiLSTM) and conditional random field (CRF). In order to address the problem of losing partial-order relationships when embedding continuous feature items as with previous works, we propose a gating mechanism for integrating continuous feature items, which effectively retains the partial-order relationships between feature items. We perform LSTM processing on the encoding result of the BERT model, and further extract the sequence features. In the final test-set evaluation, we obtained the highest F1 score at the detection level and are among the top 3 F1 scores at the identification level.
Search
Fix data
Co-authors
- Xinyu Dai 2
- Yongchang Cao 1
- Liang He 1
- Shujian Huang (书剑 黄) 1
- Zhen Wu 1
- show all...