A Pranav
Dayta AI
Also published as: Pranav A
2025
Glitter: A Multi-Sentence, Multi-Reference Benchmark for Gender-Fair German Machine Translation
A Pranav | Janiça Hackenbuchner | Giuseppe Attanasio | Manuel Lardelli | Anne Lauscher
Findings of the Association for Computational Linguistics: EMNLP 2025
Machine translation (MT) research addressing gender inclusivity has gained attention for promoting non-exclusionary language representing all genders. However, existing resources are limited in size, most often consisting of single sentences, or single gender-fair formulation types, leaving questions about MT models’ ability to use context and diverse inclusive forms. We introduce Glitter, an English-German benchmark featuring extended passages with professional translations implementing three gender-fair alternatives: neutral rewording, typographical solutions (gender star), and neologistic forms (-ens forms). Our experiments reveal significant limitations in state-of-the-art language models, which default to masculine generics, struggle to interpret explicit gender cues in context, and rarely produce gender-fair translations. Through a systematic prompting analysis designed to elicit fair language, we demonstrate that these limitations stem from models’ fundamental misunderstanding of gender phenomena, as they fail to implement inclusive forms even when explicitly instructed. Glitter establishes a challenging benchmark, advancing research in gender-fair English-German MT. It highlights substantial room for improvement among leading models and can guide the development of future MT models capable of accurately representing gender diversity.
Proceedings of the Queer in AI Workshop
A Pranav | Alissa Valentine | Shaily Bhatt | Yanan Long | Arjun Subramonian | Amanda Bertsch | Anne Lauscher | Ankush Gupta
Proceedings of the Queer in AI Workshop
2024
Comparing Static and Contextual Distributional Semantic Models on Intrinsic Tasks: An Evaluation on Mandarin Chinese Datasets
Pranav A | Yan Cong | Emmanuele Chersoni | Yu-Yin Hsu | Alessandro Lenci
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
The field of Distributional Semantics has recently undergone important changes, with the contextual representations produced by Transformers taking the place of static word embeddings models. Noticeably, previous studies comparing the two types of vectors have only focused on the English language and a limited number of models. In our study, we present a comparative evaluation of static and contextualized distributional models for Mandarin Chinese, focusing on a range of intrinsic tasks. Our results reveal that static models remain stronger for some of the classical tasks that consider word meaning independent of context, while contextualized models excel in identifying semantic relations between word pairs and in the categorization of words into abstract semantic classes.
2023
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation
Chu-Ren Huang | Yasunari Harada | Jong-Bok Kim | Si Chen | Yu-Yin Hsu | Emmanuele Chersoni | Pranav A | Winnie Huiheng Zeng | Bo Peng | Yuxi Li | Junlin Li
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation
2020
2kenize: Tying Subword Sequences for Chinese Script Conversion
Pranav A | Isabelle Augenstein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Simplified Chinese to Traditional Chinese character conversion is a common preprocessing step in Chinese NLP. Despite this, current approaches perform insufficiently because they do not account for the fact that a single simplified Chinese character can correspond to multiple traditional characters. Here, we propose a model that can disambiguate between these mappings and convert between the two scripts. The model is based on subword segmentation, two language models, and a method for mapping between subword sequences. We further construct benchmark datasets for topic classification and script conversion. Our proposed method outperforms previous Chinese character conversion approaches by 6 points in accuracy. These results are further confirmed in a downstream application, where 2kenize is used to convert a pretraining dataset for topic classification. An error analysis reveals that our method’s particular strengths lie in handling code mixing and named entities.
2018
Alignment Analysis of Sequential Segmentation of Lexicons to Improve Automatic Cognate Detection
Pranav A
Proceedings of ACL 2018, Student Research Workshop
Ranking functions in information retrieval are often used by search engines to extract answers relevant to a query. This paper borrows this notion from information retrieval and applies it to the problem of cognate detection. The main contributions of this paper are: (1) positional tokenization, which incorporates a sequential notion of word structure; and (2) graphical error modelling, which captures morphological shifts. Existing research only distinguishes whether a pair of words are cognates or not; we additionally study whether a possible cognate can be predicted from a given input. Our study shows that language-modelling-based retrieval functions with positional tokenization and error modelling tend to give better results than competing baselines.