Building a Dataset for Possessions Identification in Text

Carmen Banea, Xi Chen, Rada Mihalcea


Abstract
Just as industrialization matured from mass production to customization and personalization, the Web has migrated from generic content to public disclosures of people’s most intimately held thoughts, opinions, and beliefs. This relatively new type of data can represent finer and more narrowly defined demographic slices. Whereas researchers have so far focused primarily on leveraging personalized content to identify latent information such as the gender, nationality, location, or age of the author, this study seeks to establish a structured way of extracting possessions, that is, items that people own or are entitled to, as a way to ultimately provide insights into people’s behaviors and characteristics. To promote more research in this area, we are releasing a set of 798 possessions extracted from blogs, where possessions are marked at different confidence levels, as well as a detailed set of guidelines to help in future annotation studies.
Anthology ID:
L16-1592
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3737–3740
URL:
https://aclanthology.org/L16-1592
Cite (ACL):
Carmen Banea, Xi Chen, and Rada Mihalcea. 2016. Building a Dataset for Possessions Identification in Text. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3737–3740, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Building a Dataset for Possessions Identification in Text (Banea et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1592.pdf