<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="1600">
    <title>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</title>
    <editor>Dirk Hovy</editor>
    <editor>Shannon Spruit</editor>
    <editor>Margaret Mitchell</editor>
    <editor>Emily M. Bender</editor>
    <editor>Michael Strube</editor>
    <editor>Hanna Wallach</editor>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <url>http://www.aclweb.org/anthology/W17-16</url>
    <bibtype>book</bibtype>
    <bibkey>EthNLP:2017</bibkey>
  </paper>

  <paper id="1601">
    <title>Gender as a Variable in Natural-Language Processing: Ethical Considerations</title>
    <author><first>Brian</first><last>Larson</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>1&#8211;11</pages>
    <url>http://www.aclweb.org/anthology/W17-1601</url>
    <abstract>Researchers and practitioners in natural-language processing (NLP) and related
	fields should attend to ethical principles in study design, ascription of
	categories/variables to study participants, and reporting of findings or
	results. This paper discusses theoretical and ethical frameworks for using
	gender as a variable in NLP studies and proposes four guidelines for
	researchers and practitioners. The principles outlined here should guide
	practitioners, researchers, and peer reviewers, and they may be applicable to
	other social categories, such as race, applied to human beings connected to NLP
	research.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>larson:2017:EthNLP</bibkey>
  </paper>

  <paper id="1602">
    <title>These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution</title>
    <author><first>Corina</first><last>Koolen</last></author>
    <author><first>Andreas</first><last>van Cranenburgh</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>12&#8211;22</pages>
    <url>http://www.aclweb.org/anthology/W17-1602</url>
    <abstract>Stylometric and text categorization results show that author gender can be
	discerned in texts with relatively high accuracy. However, it is difficult to
	explain what gives rise to these results and there are many possible
	confounding factors, such as the domain, genre, and target audience of a text.
	More fundamentally, such classification efforts risk invoking stereotyping and
	essentialism. We explore this issue in two datasets of Dutch literary novels,
	using common descriptive (LIWC, topic modeling) and predictive (machine
	learning) methods. Our results show the importance of controlling for variables
	in the corpus, and we argue for taking care not to overgeneralize from the
	results.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>koolen-vancranenburgh:2017:EthNLP</bibkey>
  </paper>

  <paper id="1603">
    <title>A Quantitative Study of Data in the NLP Community</title>
    <author><first>Margot</first><last>Mieskes</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>23&#8211;29</pages>
    <url>http://www.aclweb.org/anthology/W17-1603</url>
    <abstract>We present results of a quantitative analysis of publications in the NLP domain
	on the collection, publication, and availability of research data. We find that a
	wide range of publications rely on data crawled from the web, but few give
	details on how potentially sensitive data was treated. Additionally, we find
	that while links to data repositories are given, they often do not work even
	a short time after publication. We put together several suggestions on how to
	improve this situation, based on publications from the NLP domain as well as
	other research areas.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>mieskes:2017:EthNLP</bibkey>
  </paper>

  <paper id="1604">
    <title>Ethical by Design: Ethics Best Practices for Natural Language Processing</title>
    <author><first>Jochen L.</first><last>Leidner</last></author>
    <author><first>Vassilis</first><last>Plachouras</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>30&#8211;40</pages>
    <url>http://www.aclweb.org/anthology/W17-1604</url>
    <abstract>Natural language processing (NLP) systems analyze and/or generate human
	language, typically on users’ behalf. One natural and necessary question that
	needs to be addressed in this context, both in research projects and in
	production settings, is how ethical the work is, both regarding the process
	and its outcome.
	    Towards this end, we articulate a set of issues, propose a set of best
	practices, notably a process featuring an ethics review board, and sketch how
	they could be meaningfully applied. Our main argument is that ethical
	outcomes ought to be achieved by design, i.e. by following a process aligned
	with ethical values. We also offer some response options for those facing
	ethics issues.
	    While a number of previous works discuss ethical issues, in particular
	around big data and machine learning, to the authors’ knowledge this is the
	first account of NLP and ethics from the perspective of a principled
	process.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>leidner-plachouras:2017:EthNLP</bibkey>
  </paper>

  <paper id="1605">
    <title>Building Better Open-Source Tools to Support Fairness in Automated Scoring</title>
    <author><first>Nitin</first><last>Madnani</last></author>
    <author><first>Anastassia</first><last>Loukina</last></author>
    <author><first>Alina</first><last>von Davier</last></author>
    <author><first>Jill</first><last>Burstein</last></author>
    <author><first>Aoife</first><last>Cahill</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>41&#8211;52</pages>
    <url>http://www.aclweb.org/anthology/W17-1605</url>
    <abstract>Automated scoring of written and spoken responses is an NLP application that
	can significantly impact lives, especially when deployed as part of high-stakes
	tests such as the GRE&#174; and the TOEFL&#174;. Ethical considerations require that
	automated scoring algorithms treat all test-takers fairly. The educational
	measurement community has done significant research on fairness in assessments,
	and automated scoring systems must incorporate its recommendations. The best
	way to do that is by making automated, non-proprietary tools available to NLP
	researchers that directly incorporate these recommendations and generate the
	analyses needed to help identify and resolve biases in their scoring systems.
	In this paper, we attempt to provide such a solution.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>madnani-EtAl:2017:EthNLP</bibkey>
  </paper>

  <paper id="1606">
    <title>Gender and Dialect Bias in YouTube's Automatic Captions</title>
    <author><first>Rachael</first><last>Tatman</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>53&#8211;59</pages>
    <url>http://www.aclweb.org/anthology/W17-1606</url>
    <abstract>This project evaluates the accuracy of YouTube's automatically generated
	captions across two genders and five dialect groups. Speakers' dialect and
	gender were controlled for by using videos uploaded as part of the &#x201c;accent tag
	challenge&#x201d;, where speakers explicitly identify their language background. The
	results show robust differences in accuracy across both gender and dialect,
	with lower accuracy for 1) women and 2) speakers from Scotland. This finding
	builds on earlier research finding that a speaker's sociolinguistic identity may
	negatively affect their ability to use automatic speech recognition, and
	demonstrates the need for sociolinguistically stratified validation of systems.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>tatman:2017:EthNLP</bibkey>
  </paper>

  <paper id="1607">
    <title>Integrating the Management of Personal Data Protection and Open Science with Research Ethics</title>
    <author><first>Dave</first><last>Lewis</last></author>
    <author><first>Joss</first><last>Moorkens</last></author>
    <author><first>Kaniz</first><last>Fatema</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>60&#8211;65</pages>
    <url>http://www.aclweb.org/anthology/W17-1607</url>
    <abstract>We examine the impact of the EU General Data Protection Regulation and the
	push from research funders to provide open access research data on current
	practices in Language Technology Research. We analyse the challenges that
	arise and the opportunities to address many of them through the use of
	existing open data practices. We also discuss the impact of this on current
	practice in research ethics.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>lewis-moorkens-fatema:2017:EthNLP</bibkey>
  </paper>

  <paper id="1608">
    <title>Ethical Considerations in NLP Shared Tasks</title>
    <author><first>Carla</first><last>Parra Escart&#237;n</last></author>
    <author><first>Wessel</first><last>Reijers</last></author>
    <author><first>Teresa</first><last>Lynn</last></author>
    <author><first>Joss</first><last>Moorkens</last></author>
    <author><first>Andy</first><last>Way</last></author>
    <author><first>Chao-Hong</first><last>Liu</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>66&#8211;73</pages>
    <url>http://www.aclweb.org/anthology/W17-1608</url>
    <abstract>Shared tasks are increasingly common in our field, and new challenges are
	suggested at almost every conference and workshop. However, as this has become
	an established way of pushing research forward, it is important to discuss how
	we researchers organise and participate in shared tasks, and to make that
	information available to the community to allow further research improvements.
	In this paper, we present a number of ethical issues, along with other areas of
	concern, that are related to the competitive nature of shared tasks. As such
	issues could potentially impact research ethics in the Natural Language
	Processing community, we also propose the development of a framework for the
	organisation of and participation in shared tasks that can help mitigate
	these issues.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>parraescartin-EtAl:2017:EthNLP</bibkey>
  </paper>

  <paper id="1609">
    <title>Social Bias in Elicited Natural Language Inferences</title>
    <author><first>Rachel</first><last>Rudinger</last></author>
    <author><first>Chandler</first><last>May</last></author>
    <author><first>Benjamin</first><last>Van Durme</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>74&#8211;79</pages>
    <url>http://www.aclweb.org/anthology/W17-1609</url>
    <abstract>We analyze the Stanford Natural Language Inference (SNLI) corpus in an
	investigation of bias and stereotyping in NLP data. The SNLI human-elicitation
	protocol makes it prone to amplifying bias and stereotypical associations,
	which we demonstrate statistically (using pointwise mutual information) and
	with qualitative examples.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>rudinger-may-vandurme:2017:EthNLP</bibkey>
  </paper>

  <paper id="1610">
    <title>A Short Review of Ethical Challenges in Clinical Natural Language Processing</title>
    <author><first>Simon</first><last>Suster</last></author>
    <author><first>Stephan</first><last>Tulkens</last></author>
    <author><first>Walter</first><last>Daelemans</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>80&#8211;87</pages>
    <url>http://www.aclweb.org/anthology/W17-1610</url>
    <abstract>Clinical NLP has immense potential to contribute to the transformation of
	clinical practice through large-scale processing of clinical records. However,
	this potential has remained largely untapped due to slow progress, primarily
	caused by strict data access policies for researchers. In
	this paper, we discuss the concern for privacy and the measures it entails. We
	also suggest sources of less sensitive data. Finally, we draw attention to
	biases that can compromise the validity of empirical research and lead to
	socially harmful applications.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>suster-tulkens-daelemans:2017:EthNLP</bibkey>
  </paper>

  <paper id="1611">
    <title>Goal-Oriented Design for Ethical Machine Learning and NLP</title>
    <author><first>Tyler</first><last>Schnoebelen</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>88&#8211;93</pages>
    <url>http://www.aclweb.org/anthology/W17-1611</url>
    <abstract>The argument made in this paper is that to act ethically in machine learning
	and NLP requires focusing on goals. NLP projects are often classificatory
	systems that deal with human subjects, which means that the goals of people
	affected by the systems should be included. The paper takes as its core example
	a model that detects criminality, showing the problems of training data,
	categories, and outcomes. The paper is oriented to the kinds of critiques on
	power and the reproduction of inequality that are found in social theory, but
	it also includes concrete suggestions on how to put goal-oriented design into
	practice.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>schnoebelen:2017:EthNLP</bibkey>
  </paper>

  <paper id="1612">
    <title>Ethical Research Protocols for Social Media Health Research</title>
    <author><first>Adrian</first><last>Benton</last></author>
    <author><first>Glen</first><last>Coppersmith</last></author>
    <author><first>Mark</first><last>Dredze</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>94&#8211;102</pages>
    <url>http://www.aclweb.org/anthology/W17-1612</url>
    <abstract>Social media have transformed data-driven research in political science, the
	social sciences, health, and medicine. Since health research often touches on
	sensitive topics relating to the ethics of treatment and patient privacy,
	similar ethical considerations should be acknowledged when using social media
	data in health research. While much has been said regarding the ethical
	considerations of social media research, health research raises an additional
	set of concerns. We provide practical suggestions in the form of guidelines
	for researchers working with social media data in health research. These
	guidelines can inform an IRB proposal for researchers new to social media
	health research.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>benton-coppersmith-dredze:2017:EthNLP</bibkey>
  </paper>

  <paper id="1613">
    <title>Say the Right Thing Right: Ethics Issues in Natural Language Generation Systems</title>
    <author><first>Charese</first><last>Smiley</last></author>
    <author><first>Frank</first><last>Schilder</last></author>
    <author><first>Vassilis</first><last>Plachouras</last></author>
    <author><first>Jochen L.</first><last>Leidner</last></author>
    <booktitle>Proceedings of the First ACL Workshop on Ethics in Natural Language Processing</booktitle>
    <month>April</month>
    <year>2017</year>
    <address>Valencia, Spain</address>
    <publisher>Association for Computational Linguistics</publisher>
    <pages>103&#8211;108</pages>
    <url>http://www.aclweb.org/anthology/W17-1613</url>
    <abstract>We discuss the ethical implications of Natural Language Generation systems.
	We use one particular system as a case study to identify and classify issues,
	and we provide an ethics checklist, in the hope that future system designers
	may benefit from conducting their own ethics reviews based on our checklist.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>smiley-EtAl:2017:EthNLP</bibkey>
  </paper>

</volume>

