Issues and Perspectives from 10,000 Annotated Financial Social Media Data

Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen


Abstract
In this paper, we investigate the annotation of financial social media data from several angles. We present Fin-SoMe, a dataset of 10,000 financial tweets annotated by experts from both the front desk and the middle desk of a bank’s treasury. The annotations reveal that (1) writer-labeled market sentiment may be misleading; (2) the writer’s sentiment and the market sentiment expressed by an investor may differ; (3) most financial tweets offer analysis results without supporting evidence; and (4) almost no investors report the gain/loss results of their positions, even though such information would greatly facilitate a detailed evaluation of their performance. Based on these results, we discuss several open problems and suggest directions for future work on financial social media data. We also report an experiment on the key snippet extraction task that compares a general sentiment dictionary with a domain-specific dictionary. The results echo the findings from the experts’ annotations.
Anthology ID: 2020.lrec-1.749
Volume: Proceedings of the 12th Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 6106–6110
Language: English
URL: https://aclanthology.org/2020.lrec-1.749
Cite (ACL): Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2020. Issues and Perspectives from 10,000 Annotated Financial Social Media Data. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 6106–6110, Marseille, France. European Language Resources Association.
Cite (Informal): Issues and Perspectives from 10,000 Annotated Financial Social Media Data (Chen et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.749.pdf