2022
E-NER — An Annotated Named Entity Recognition Corpus of Legal Text
Ting Wai Terence Au | Vasileios Lampos | Ingemar Cox
Proceedings of the Natural Legal Language Processing Workshop 2022
Identifying named entities, such as a person, location or organization, in documents can highlight key information to readers. Training Named Entity Recognition (NER) models requires an annotated data set, and creating one can be a time-consuming, labour-intensive task. Nevertheless, there are publicly available NER data sets for general English. Recently there has been interest in developing NER for legal text. However, prior work and experimental results reported here indicate that there is a significant degradation in performance when NER methods trained on a general English data set are applied to legal text. We describe a publicly available legal NER data set, called E-NER, based on legal company filings available from the US Securities and Exchange Commission’s EDGAR data set. Training a number of different NER algorithms on the general English CoNLL-2003 corpus but testing on our test collection confirmed significant degradations in accuracy, as measured by the F1-score, of between 29.4% and 60.4%, compared to training and testing on the E-NER collection.
2018
Changes in Psycholinguistic Attributes of Social Media Users Before, During, and After Self-Reported Influenza Symptoms
Lucie Flekova | Vasileios Lampos | Ingemar Cox
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
Previous research has linked psychological and social variables to physical health. At the same time, psychological and social variables have been successfully predicted from the language used by individuals in social media. In this paper, we conduct an initial exploratory study linking these two areas. Using the social media platform of Twitter, we identify users self-reporting symptoms that are descriptive of influenza-like illness (ILI). We analyze the tweets of those users in the periods before, during, and after the reported symptoms, exploring emotional, cognitive, and structural components of language. We observe a post-ILI increase in social activity and cognitive processes, possibly supporting previous offline findings linking more active social activities and stronger cognitive coping skills to a better immune status.
2015
An analysis of the user occupational class through Twitter content
Daniel Preoţiuc-Pietro | Vasileios Lampos | Nikolaos Aletras
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2014
Extracting Socioeconomic Patterns from the News: Modelling Text and Outlet Importance Jointly
Vasileios Lampos | Daniel Preoţiuc-Pietro | Sina Samangooei | Douwe Gelling | Trevor Cohn
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science
Predicting and Characterising User Impact on Twitter
Vasileios Lampos | Nikolaos Aletras | Daniel Preoţiuc-Pietro | Trevor Cohn
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics
2013
A user-centric model of voting intention from Social Media
Vasileios Lampos | Daniel Preoţiuc-Pietro | Trevor Cohn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)