A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)

Franco Salvetti, John B. Lowe, James H. Martin


Abstract
We present an approach to creating corpora for use in detecting deception in text, including a discussion of the challenges peculiar to this task. Our approach is based on soliciting several types of reviews from writers and was implemented using Amazon Mechanical Turk. We describe the multi-dimensional corpus of reviews built using this approach, available free of charge from LDC as the Boulder Lies and Truth Corpus (BLT-C). Challenges for both corpus creation and deception detection include the fact that human performance on the task is typically at chance, that the signal is faint, that paid writers such as turkers are sometimes deceptive, and that deception is a complex human behavior: manifestations of deception depend on details of the domain, intrinsic properties of the deceiver (such as education, linguistic competence, and the nature of the intention), and specifics of the deceptive act (e.g., lying vs. fabricating). To overcome the inherent lack of ground truth, we have developed a set of semi-automatic techniques to ensure corpus validity. We present some preliminary results on the task of deception detection which suggest that the BLT-C is an improvement in the quality of resources available for this task.
Anthology ID:
L16-1558
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3510–3517
URL:
https://aclanthology.org/L16-1558
Cite (ACL):
Franco Salvetti, John B. Lowe, and James H. Martin. 2016. A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C). In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3510–3517, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C) (Salvetti et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1558.pdf