Akihiro Ohtani
2016
Big Community Data before World Wide Web Era
Tomoya Iwakura | Tetsuro Takahashi | Akihiro Ohtani | Kunio Matsui
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
This paper introduces the NIFTY-Serve corpus, a large data archive collected from Japanese discussion forums that operated via a Bulletin Board System (BBS) between 1987 and 2006. The corpus can be used in Artificial Intelligence research areas such as Natural Language Processing and Community Analysis. The NIFTY-Serve corpus differs from data on the WWW in three ways: (1) it is essentially free of spam and duplication because of strict data collection procedures, (2) it contains historic user-generated data that predates the WWW, and (3) it is a complete data set because the service has now shut down. We also introduce some example uses of the corpus.