The effect of information controls on developers in China: An analysis of censorship in Chinese open source projects

Jeffrey Knockel, Masashi Crete-Nishihata, Lotus Ruan


Abstract
Censorship of Internet content in China is understood to operate through a system of intermediary liability whereby service providers are liable for the content on their platforms. Previous work studying censorship has found huge variability in the implementation of censorship across different products even within the same industry segment. In this work we explore the extent to which these censorship features are present in the open source projects of individual developers in China by collecting their blacklists and comparing their similarity. We collect files from a popular online code repository, extract lists of strings, and then classify whether each is a Chinese blacklist. Overall, we found over 1,000 Chinese blacklists comprising over 200,000 unique keywords, representing the largest dataset of Chinese blacklisted keywords to date. We found very little keyword overlap between lists, raising questions as to their origins, as the lists seem too large to have been individually curated, yet the lack of overlap suggests that they have no common source.
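The overlap comparison described in the abstract can be illustrated with a minimal sketch. It assumes each blacklist has already been extracted as a set of keyword strings, and it uses Jaccard similarity as the overlap measure. Jaccard is an assumption chosen for illustration, since the abstract does not name the metric the authors used. The names `jaccard`, `pairwise_overlap`, and the `project_*` entries are hypothetical placeholders, not data or code from the paper.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(lists: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of named keyword lists (names sorted for stable keys)."""
    return {
        (x, y): jaccard(lists[x], lists[y])
        for x, y in combinations(sorted(lists), 2)
    }


# Hypothetical placeholder lists; real lists would come from the extraction step.
blacklists = {
    "project_a": {"kw1", "kw2", "kw3"},
    "project_b": {"kw3", "kw4"},
    "project_c": {"kw5"},
}

scores = pairwise_overlap(blacklists)

# Count of distinct keywords across all lists.
unique_keywords = set().union(*blacklists.values())

for pair, score in scores.items():
    print(pair, round(score, 3))
print("unique keywords:", len(unique_keywords))
```

Under this measure, consistently low pairwise scores across many large lists would correspond to the "very little keyword overlap" the abstract reports.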
Anthology ID:
W18-4201
Volume:
Proceedings of the First Workshop on Natural Language Processing for Internet Freedom
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venue:
NLP4IF
Publisher:
Association for Computational Linguistics
Pages:
1–11
URL:
https://aclanthology.org/W18-4201
Cite (ACL):
Jeffrey Knockel, Masashi Crete-Nishihata, and Lotus Ruan. 2018. The effect of information controls on developers in China: An analysis of censorship in Chinese open source projects. In Proceedings of the First Workshop on Natural Language Processing for Internet Freedom, pages 1–11, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
The effect of information controls on developers in China: An analysis of censorship in Chinese open source projects (Knockel et al., NLP4IF 2018)
PDF:
https://aclanthology.org/W18-4201.pdf
Code:
citizenlab/chat-censorship