Royal Sequiera
2024
LegalLens 2024 Shared Task: Masala-chai Submission
Khalid Rajan
|
Royal Sequiera
Proceedings of the Natural Legal Language Processing Workshop 2024
In this paper, we present the masala-chai team’s participation in the LegalLens 2024 shared task and detail our approach to predicting legal entities and performing natural language inference (NLI) in the legal domain. We experimented with various transformer-based models, including BERT, RoBERTa, Llama 3.1, and GPT-4o. Our results show that state-of-the-art models like GPT-4o underperformed in NER and NLI tasks, even when using advanced techniques such as bootstrapping and prompt optimization. The best performance in NER (accuracy: 0.806, F1 macro: 0.701) was achieved with a fine-tuned RoBERTa model, while the highest NLI results (accuracy: 0.825, F1 macro: 0.833) came from a fine-tuned Llama 3.1 8B model. Notably, RoBERTa, despite having significantly fewer parameters than Llama 3.1 8B, delivered comparable results. We discuss key findings and insights from our experiments and provide our results and code for reproducibility and further analysis at https://github.com/rosequ/masala-chai
2017
Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique
Shruti Rijhwani
|
Royal Sequiera
|
Monojit Choudhury
|
Kalika Bali
|
Chandra Shekhar Maddila
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Word-level language detection is necessary for analyzing code-switched text, where multiple languages could be mixed within a sentence. Existing models are restricted to code-switching between two specific languages and fail in real-world scenarios as text input rarely has a priori information on the languages used. We present a novel unsupervised word-level language detection technique for code-switched text for an arbitrarily large number of languages, which does not require any manually annotated training data. Our experiments with tweets in seven languages show a 74% relative error reduction in word-level labeling with respect to competitive baselines. We then use this system to conduct a large-scale quantitative analysis of code-switching patterns on Twitter, both global as well as region-specific, with 58M tweets.
2015
POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments
Royal Sequiera
|
Monojit Choudhury
|
Kalika Bali
Proceedings of the 12th International Conference on Natural Language Processing