Avinesh PVS
2018
A Retrospective Analysis of the Fake News Challenge Stance-Detection Task
Andreas Hanselowski | Avinesh PVS | Benjamin Schiller | Felix Caspelherr | Debanjan Chaudhuri | Christian M. Meyer | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics
The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance classification task as a crucial first step towards detecting fake news. To date, there is no in-depth analysis paper that critically discusses FNC-1’s experimental setup, reproduces the results, and draws conclusions for next-generation stance classification methods. In this paper, we provide such an in-depth analysis for the three top-performing systems. We first find that FNC-1’s proposed evaluation metric favors the majority class, which can be easily classified, and thus overestimates the true discriminative power of the methods. Therefore, we propose a new F1-based metric that yields a changed system ranking. Next, we compare the features and architectures used, which leads to a novel feature-rich stacked LSTM model that performs on par with the best systems but is superior in predicting minority classes. To understand the methods’ ability to generalize, we derive a new dataset and perform both in-domain and cross-domain experiments. Our qualitative and quantitative study helps to interpret the original FNC-1 scores and to understand which features help improve performance and why. Our new dataset and all source code used during the reproduction study are publicly available for future research.
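As a rough illustration of why a metric that rewards the majority class can overestimate a system, the sketch below compares plain accuracy with a macro-averaged F1 on a toy, imbalanced label set. It rests on assumptions: the abstract only says the proposed metric is "F1-based", so macro-averaging is a guess at its form; accuracy stands in for a majority-friendly metric and is not the FNC-1 scoring scheme; the labels and counts are invented for the example.

```python
def per_class_f1(gold, pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(gold) | set(pred))
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def accuracy(gold, pred):
    """Fraction of instances labelled correctly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy data (hypothetical): 8 majority-class items and 2 minority-class items.
gold = ["unrelated"] * 8 + ["agree", "disagree"]
# A trivial system that always predicts the majority class.
majority_baseline = ["unrelated"] * len(gold)

print("accuracy:", accuracy(gold, majority_baseline))
print("macro F1:", round(macro_f1(gold, majority_baseline), 3))
```

On this toy data the always-majority baseline reaches an accuracy of 0.8 but a macro F1 of only about 0.30, because the two minority classes each contribute an F1 of zero.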
2014
ThinkMiners: Disorder Recognition using Conditional Random Fields and Distributional Semantics
Ankur Parikh | Avinesh PVS | Joy Mustafi | Lalit Agarwalla | Ashish Mungi
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)
2011
Transferring Syntactic Relations from English to Hindi Using Alignments on Local Word Groups
Aswarth Dara | Prashanth Mannem | Hemanth Sagar Bayyarapu | Avinesh PVS
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
Phrase Based Decoding using a Discriminative Model
Prasanth Kolachina | Sriram Venkatapathy | Srinivas Bangalore | Sudheer Kolachina | Avinesh PVS
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation
A Corpus Factory for Many Languages
Adam Kilgarriff | Siva Reddy | Jan Pomikálek | Avinesh PVS
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
For many languages there are no large, general-language corpora available. Until the web, all but the richest institutions could do little but shake their heads in dismay, as corpus-building was long, slow and expensive. But with the advent of the Web it can be highly automated and thereby fast and inexpensive. We have developed a corpus factory where we build large corpora. In this paper we describe the method we use, how it has worked, and how various problems were solved, for eight languages: Dutch, Hindi, Indonesian, Norwegian, Swedish, Telugu, Thai and Vietnamese. We use the BootCaT method: we take a set of 'seed words' for the language from Wikipedia. Then, several hundred times over, we
* randomly select three or four of the seed words
* send them as a query to Google or Yahoo or Bing, which returns a 'search hits' page
* gather the pages that Google or Yahoo point to and save the text.
This forms the corpus, which we then
* 'clean' (to remove navigation bars, advertisements etc.)
* remove duplicates from
* tokenise and (if tools are available) lemmatise and part-of-speech tag
* load into our corpus query tool, the Sketch Engine.
The corpora we have developed are available for use in the Sketch Engine corpus query tool.
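The query-and-collect loop described in this abstract might be sketched roughly as follows. This is not the authors' code or the BootCaT toolkit: `build_queries` and `collect_corpus` are hypothetical names, the `search` and `fetch_text` callables are placeholders the caller must supply (no real search-engine API is assumed), and exact-match hashing is only the simplest possible stand-in for the duplicate-removal step. Cleaning, tokenising, lemmatising and tagging are left out.

```python
import hashlib
import random


def build_queries(seed_words, n_queries, rng):
    """Build queries of three or four randomly chosen seed words each."""
    queries = []
    for _ in range(n_queries):
        k = rng.choice((3, 4))
        queries.append(" ".join(rng.sample(seed_words, k)))
    return queries


def collect_corpus(seed_words, search, fetch_text, n_queries=300, seed=0):
    """Run the queries, fetch each hit once, and drop exact duplicate texts.

    search(query) -> iterable of URLs        (caller-supplied, hypothetical)
    fetch_text(url) -> plain text of a page  (caller-supplied, hypothetical)
    """
    rng = random.Random(seed)
    seen_urls, seen_hashes, corpus = set(), set(), []
    for query in build_queries(seed_words, n_queries, rng):
        for url in search(query):
            if url in seen_urls:
                continue  # already downloaded for an earlier query
            seen_urls.add(url)
            text = fetch_text(url).strip()
            digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
            if text and digest not in seen_hashes:
                seen_hashes.add(digest)
                corpus.append(text)
    return corpus
```

Passing the search and fetch steps in as functions keeps the sketch independent of any particular search engine and lets the loop be exercised with in-memory stubs.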