Milad Shokouhi


2024

pdf bib
“One-Size-Fits-All”? Examining Expectations around What Constitute “Fair” or “Good” NLG System Behaviors
Li Lucy | Su Lin Blodgett | Milad Shokouhi | Hanna Wallach | Alexandra Olteanu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Fairness-related assumptions about what constitute appropriate NLG system behaviors range from invariance, where systems are expected to behave identically for social groups, to adaptation, where behaviors should instead vary across them. To illuminate tensions around invariance and adaptation, we conduct five case studies, in which we perturb different types of identity-related language features (names, roles, locations, dialect, and style) in NLG system inputs. Through these cases studies, we examine people’s expectations of system behaviors, and surface potential caveats of these contrasting yet commonly held assumptions. We find that motivations for adaptation include social norms, cultural differences, feature-specific information, and accommodation; in contrast, motivations for invariance include perspectives that favor prescriptivism, view adaptation as unnecessary or too difficult for NLG systems to do appropriately, and are wary of false assumptions. Our findings highlight open challenges around what constitute “fair” or “good” NLG system behaviors.

2023

pdf bib
PREME: Preference-based Meeting Exploration through an Interactive Questionnaire
Negar Arabzadeh | Ali Ahmadvand | Julia Kiseleva | Yang Liu | Ahmed Hassan Awadallah | Ming Zhong | Milad Shokouhi
Findings of the Association for Computational Linguistics: EACL 2023

The recent increase in the volume of online meetings necessitates automated tools for organizing the material, especially when an attendee has missed the discussion and needs assistance in quickly exploring it. In this work, we propose a novel end-to-end framework for generating interactive questionnaires for preference-based meeting exploration. As a result, users are supplied with a list of suggested questions reflecting their preferences. Since the task is new, we introduce an automatic evaluation strategy by measuring how much the generated questions via questionnaire are answerable to ensure factual correctness and covers the source meeting for the depth of possible exploration.

pdf bib
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Mengnan Du | Subhabrata Mukherjee | Yu Cheng | Milad Shokouhi | Xia Hu | Ahmed Hassan Awadallah
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream tasks. However, very few of these studies have analyzed the impact of compression on the generalizability and robustness of compressed models for out-of-distribution (OOD) data. Towards this end, we study two popular model compression techniques including knowledge distillation and pruning and show that the compressed models are significantly less robust than their PLM counterparts on OOD test sets although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit on the shortcut samples and generalize poorly on the hard ones. We further leverage this observation to develop a regularization strategy for robust model compression based on sample uncertainty.

2022

pdf bib
UserIdentifier: Implicit User Representations for Simple and Effective Personalized Sentiment Analysis
Fatemehsadat Mireshghallah | Vaishnavi Shrivastava | Milad Shokouhi | Taylor Berg-Kirkpatrick | Robert Sim | Dimitrios Dimitriadis
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Global models are typically trained to be as generalizable as possible. Invariance to the specific user is considered desirable since models are shared across multitudes of users. However, these models are often unable to produce personalized responses for individual users, based on their data. Contrary to widely-used personalization techniques based on few-shot and meta-learning, we propose UserIdentifier, a novel scheme for training a single shared model for all users. Our approach produces personalized responses by prepending a fixed, user-specific non-trainable string (called “user identifier”) to each user’s input text. Unlike prior work, this method doesn’t need any additional model parameters, any extra rounds of personal few-shot learning or any change made to the vocabulary. We empirically study different types of user identifiers (numeric, alphanumeric, and also randomly generated) and demonstrate that, surprisingly, randomly generated user identifiers outperform the prefix-tuning based state-of-the-art approach by up to 13, on a suite of sentiment analysis datasets.

2021

pdf bib
A Dataset and Baselines for Multilingual Reply Suggestion
Mozhi Zhang | Wei Wang | Budhaditya Deb | Guoqing Zheng | Milad Shokouhi | Ahmed Hassan Awadallah
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Reply suggestion models help users process emails and chats faster. Previous work only studies English reply suggestion. Instead, we present MRS, a multilingual reply suggestion dataset with ten languages. MRS can be used to compare two families of models: 1) retrieval models that select the reply from a fixed set and 2) generation models that produce the reply from scratch. Therefore, MRS complements existing cross-lingual generalization benchmarks that focus on classification and sequence labeling tasks. We build a generation model and a retrieval model as baselines for MRS. The two models have different strengths in the monolingual setting, and they require different strategies to generalize across languages. MRS is publicly available at https://github.com/zhangmozhi/mrs.

pdf bib
MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning
Mengzhou Xia | Guoqing Zheng | Subhabrata Mukherjee | Milad Shokouhi | Graham Neubig | Ahmed Hassan Awadallah
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The combination of multilingual pre-trained representations and cross-lingual transfer learning is one of the most effective methods for building functional NLP systems for low-resource languages. However, for extremely low-resource languages without large-scale monolingual corpora for pre-training or sufficient annotated data for fine-tuning, transfer learning remains an understudied and challenging task. Moreover, recent work shows that multilingual representations are surprisingly disjoint across languages, bringing additional challenges for transfer onto extremely low-resource languages. In this paper, we propose MetaXL, a meta-learning based framework that learns to transform representations judiciously from auxiliary languages to a target one and brings their representation spaces closer for effective transfer. Extensive experiments on real-world low-resource languages – without access to large-scale monolingual corpora or large amounts of labeled data – for tasks like cross-lingual sentiment analysis and named entity recognition show the effectiveness of our approach. Code for MetaXL is publicly available at github.com/microsoft/MetaXL.

pdf bib
When does text prediction benefit from additional context? An exploration of contextual signals for chat and email messages
Stojan Trajanovski | Chad Atalla | Kunho Kim | Vipul Agarwal | Milad Shokouhi | Chris Quirk
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Email and chat communication tools are increasingly important for completing daily tasks. Accurate real-time phrase completion can save time and bolster productivity. Modern text prediction algorithms are based on large language models which typically rely on the prior words in a message to predict a completion. We examine how additional contextual signals (from previous messages, time, and subject) affect the performance of a commercial text prediction model. We compare contextual text prediction in chat and email messages from two of the largest commercial platforms Microsoft Teams and Outlook, finding that contextual signals contribute to performance differently between these scenarios. On emails, time context is most beneficial with small relative gains of 2% over baseline. Whereas, in chat scenarios, using a tailored set of previous messages as context yields relative improvements over the baseline between 9.3% and 18.6% across various critical service-oriented text prediction metrics.

pdf bib
Language Scaling for Universal Suggested Replies Model
Qianlan Ying | Payal Bajaj | Budhaditya Deb | Yu Yang | Wei Wang | Bojia Lin | Milad Shokouhi | Xia Song | Yang Yang | Daxin Jiang
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

We consider the problem of scaling automated suggested replies for a commercial email application to multiple languages. Faced with increased compute requirements and low language resources for language expansion, we build a single universal model for improving the quality and reducing run-time costs of our production system. However, restricted data movement across regional centers prevents joint training across languages. To this end, we propose a multi-lingual multi-task continual learning framework, with auxiliary tasks and language adapters to train universal language representation across regions. The experimental results show positive cross-lingual transfer across languages while reducing catastrophic forgetting across regions. Our online results on real user traffic show significant CTR and Char-saved gain as well as 65% training cost reduction compared with per-language models. As a consequence, we have scaled the feature in multiple languages including low-resource markets.

pdf bib
A Conditional Generative Matching Model for Multi-lingual Reply Suggestion
Budhaditya Deb | Guoqing Zheng | Milad Shokouhi | Ahmed Hassan Awadallah
Findings of the Association for Computational Linguistics: EMNLP 2021

We study the problem of multilingual automated reply suggestions (RS) model serving many languages simultaneously. Multilingual models are often challenged by model capacity and severe data distribution skew across languages. While prior works largely focus on monolingual models, we propose Conditional Generative Matching models (CGM), optimized within a Variational Autoencoder framework to address challenges arising from multilingual RS. CGM does so with expressive message conditional priors, mixture densities to enhance multilingual data representation, latent alignment for language discrimination, and effective variational optimization techniques for training multilingual RS. The enhancements result in performance that exceed competitive baselines in relevance (ROUGE score) by more than 10% on average, and 16%for low resource languages. CGM also shows remarkable improvements in diversity (80%) illustrating its expressiveness in representation of multi-lingual data.

2019

pdf bib
Diversifying Reply Suggestions Using a Matching-Conditional Variational Autoencoder
Budhaditya Deb | Peter Bailey | Milad Shokouhi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

We consider the problem of diversifying automated reply suggestions for a commercial instant-messaging (IM) system (Skype). Our conversation model is a standard matching based information retrieval architecture, which consists of two parallel encoders to project messages and replies into a common feature representation. During inference, we select replies from a fixed response set using nearest neighbors in the feature space. To diversify responses, we formulate the model as a generative latent variable model with Conditional Variational Auto-Encoder (M-CVAE). We propose a constrained-sampling approach to make the variational inference in M-CVAE efficient for our production system. In offline experiments, M-CVAE consistently increased diversity by ∼30−40% without significant impact on relevance. This translated to a ∼5% gain in click-rate in our online production system.