Anna Tigunova

2025

FABRIC: Fully-Automated Broad Intent Categorization in E-commerce
Anna Tigunova | Philipp Schmidt | Damla Ezgi Akcora
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Predicting the user’s shopping intent is a crucial task in e-commerce. In particular, determining the product category in which the user wants to shop is essential for delivering relevant search results and website navigation options. Existing query classification models are reported to have excellent predictive performance on single-intent queries (e.g. ‘running shoes’), but there is little research on predicting multiple intents for a broad query (e.g. ‘running gear’). Although the training data for broad query classification can be easily obtained, the evaluation of multi-label categorization remains challenging, as the set of true labels for multi-intent queries is subjective and ambiguous. In this work we propose an automatic method of creating the evaluation data for multi-label e-commerce query classification. We reduce the ambiguity of the annotations by blending the label assessment from three different sources: user click data, query-item relevance and LLM judgments.

2021

PRIDE: Predicting Relationships in Conversations
Anna Tigunova | Paramita Mirza | Andrew Yates | Gerhard Weikum
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Automatically extracting interpersonal relationships of conversation interlocutors can enrich personal knowledge bases to enhance personalized search, recommenders and chatbots. To infer speakers’ relationships from dialogues we propose PRIDE, a neural multi-label classifier, based on BERT and Transformer for creating a conversation representation. PRIDE utilizes dialogue structure and augments it with external knowledge about speaker features and conversation style. Unlike prior works, we address multi-label prediction of fine-grained relationships. We release large-scale datasets, based on screenplays of movies and TV shows, with directed relationships of conversation participants. Extensive experiments on both datasets show superior performance of PRIDE compared to the state-of-the-art baselines.

2020

CHARM: Inferring Personal Attributes from Conversations
Anna Tigunova | Andrew Yates | Paramita Mirza | Gerhard Weikum
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Personal knowledge about users’ professions, hobbies, favorite food, and travel preferences, among others, is a valuable asset for individualized AI, such as recommenders or chatbots. Conversations in social media, such as Reddit, are a rich source of data for inferring personal facts. Prior work developed supervised methods to extract this knowledge, but these approaches cannot generalize beyond attribute values with ample labeled training samples. This paper overcomes this limitation by devising CHARM: a zero-shot learning method that creatively leverages keyword extraction and document retrieval in order to predict attribute values that were never seen during training. Experiments with large datasets from Reddit show the viability of CHARM for open-ended attributes, such as professions and hobbies.

RedDust: a Large Reusable Dataset of Reddit User Traits
Anna Tigunova | Paramita Mirza | Andrew Yates | Gerhard Weikum
Proceedings of the Twelfth Language Resources and Evaluation Conference

Social media is a rich source of assertions about personal traits, such as “I am a doctor” or “my hobby is playing tennis”. Precisely identifying explicit assertions is difficult, though, because of the users’ highly varied vocabulary and language expressions. Identifying personal traits from implicit assertions like “I’ve been at work treating patients all day” is even more challenging. This paper presents RedDust, a large-scale annotated resource for user profiling for over 300k Reddit users across five attributes: profession, hobby, family status, age, and gender. We construct RedDust using a diverse set of high-precision patterns and demonstrate its use as a resource for developing learning models to deal with implicit assertions. RedDust consists of users’ personal traits, which are (attribute, value) pairs, along with users’ post ids, which may be used to retrieve the posts from a publicly available crawl or from the Reddit API. We discuss the construction of the resource and show interesting statistics and insights into the data. We also compare different classifiers, which can be learned from RedDust. To the best of our knowledge, RedDust is the first annotated language resource about Reddit users at large scale. We envision further use cases of RedDust for providing background knowledge about user traits, to enhance personalized search and recommendation as well as conversational agents.

Venues

EMNLP (3)
LREC (1)
