Big Community Data before World Wide Web Era

Tomoya Iwakura, Tetsuro Takahashi, Akihiro Ohtani, Kunio Matsui


Abstract
This paper introduces the NIFTY-Serve corpus, a large data archive collected from Japanese discussion forums that operated via a Bulletin Board System (BBS) between 1987 and 2006. The corpus can be used in Artificial Intelligence research such as Natural Language Processing and Community Analysis. The NIFTY-Serve corpus differs from data on the WWW in three ways: (1) it is essentially free of spam and duplication because of strict data collection procedures, (2) it contains historic user-generated data from before the WWW, and (3) it is a complete data set because the service has now shut down. We also introduce some examples of use of the corpus.
Anthology ID:
W16-5408
Volume:
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Koiti Hasida, Kam-Fai Wong, Nicoletta Calzolari, Key-Sun Choi
Venue:
ALR
Publisher:
The COLING 2016 Organizing Committee
Pages:
68–72
URL:
https://aclanthology.org/W16-5408
Cite (ACL):
Tomoya Iwakura, Tetsuro Takahashi, Akihiro Ohtani, and Kunio Matsui. 2016. Big Community Data before World Wide Web Era. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12), pages 68–72, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Big Community Data before World Wide Web Era (Iwakura et al., ALR 2016)
PDF:
https://aclanthology.org/W16-5408.pdf