Shibamouli Lahiri

2017

Identifying Usage Expression Sentences in Consumer Product Reviews
Shibamouli Lahiri | V. G. Vinod Vydiswaran | Rada Mihalcea
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper, we introduce the problem of identifying usage expression sentences in consumer product reviews. We create a human-annotated gold standard dataset of 565 reviews spanning five distinct product categories, comprising more than 3,000 annotated sentences. We further introduce a classification system that labels sentences according to whether or not they describe some “usage”. The system combines lexical, syntactic, and semantic features in a product-agnostic fashion to yield good classification performance. We show the effectiveness of our approach using importance ranking of features, error analysis, and cross-product classification experiments.

2016

Sentiment Analysis of Tweets in Three Indian Languages
Shanta Phani | Shibamouli Lahiri | Arindam Biswas
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)

In this paper, we describe the results of sentiment analysis on tweets in three Indian languages – Bengali, Hindi, and Tamil. We used the recently released SAIL dataset (Patra et al., 2015) and obtained state-of-the-art results in all three languages. Our features are simple, robust, scalable, and language-independent. Further, we show that these simple features provide better results than more complex, language-specific features in two separate classification tasks. We report detailed feature and error analyses, along with learning curves for Hindi and Bengali.

2015

Authorship Attribution in Bengali Language
Shanta Phani | Shibamouli Lahiri | Arindam Biswas
Proceedings of the 12th International Conference on Natural Language Processing

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Diana Inkpen | Smaranda Muresan | Shibamouli Lahiri | Karen Mazidi | Alisa Zhila

2014

Building a Dataset for Summarization and Keyword Extraction from Emails
Vanessa Loza | Shibamouli Lahiri | Rada Mihalcea | Po-Hsiang Lai
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces a new email dataset, consisting of both single emails and email threads, manually annotated with summaries and keywords. A total of 349 emails and threads have been annotated. The dataset is our first step toward developing automatic methods for summarization and keyword extraction from emails. We describe the email corpus, along with the annotation interface, annotator guidelines, and agreement studies.

Complexity of Word Collocation Networks: A Preliminary Structural Analysis
Shibamouli Lahiri
Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics