2023
ResearchTeam_HCN at SemEval-2023 Task 6: A knowledge enhanced transformers based legal NLP system
Dhanachandra Ningthoujam
|
Pinal Patel
|
Rajkamal Kareddula
|
Ramanand Vangipuram
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper presents our work on LegalEval (understanding legal text), one of the tasks in SemEval-2023. It comprises three sub-tasks, namely Rhetorical Roles (RR), Legal Named Entity Recognition (L-NER), and Court Judgement Prediction with Explanation (CJPE). We developed different deep-learning models for each sub-task. For RR, we developed a multi-task learning model with contextual sequential sentence classification as the main task and non-contextual single sentence prediction as the secondary task. Our model achieved an F1-score of 76.50% on the unseen test set, and we attained the 14th position on the leaderboard. For the L-NER problem, we designed a hybrid model consisting of a multi-stage knowledge transfer learning framework and a rule-based system. This model achieved an F1-score of 91.20% on the blind test set and attained the top position on the final leaderboard. Finally, for the CJPE task, we used a hierarchical approach and obtained around 66.67% F1-score on judgment prediction and 45.83% F1-score on the explainability of the CJPE task, and we attained 8th position on the leaderboard for this sub-task.
Rahul Patil at SemEval-2023 Task 1: V-WSD: Visual Word Sense Disambiguation
Rahul Patil
|
Pinal Patel
|
Charin Patel
|
Mangal Verma
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes our system for SemEval-2023 Task 1, Visual Word Sense Disambiguation (VWSD). We propose an ensemble of two neural network systems that ranks 10 images given a word and limited textual context. We used OpenAI CLIP-based models for English and multilingual text-to-text translation models for Farsi-to-English and Italian-to-English. Additionally, we propose a system that learns from multilingual bert-base embeddings for text and resnet101 embeddings for the image. Taking all three languages into account, this system achieved fourth rank.
2018
A Treebank for the Healthcare Domain
Nganthoibi Oinam
|
Diwakar Mishra
|
Pinal Patel
|
Narayan Choudhary
|
Hitesh Desai
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
This paper presents a treebank for the healthcare domain developed at ezDI. The treebank is created from a wide array of clinical health record documents across hospitals. The data has been de-identified and annotated for constituent syntactic structure. The treebank contains a total of 52053 sentences that have been sampled for subdomains as well as linguistic variations. The paper outlines the sampling process followed to ensure a better domain representation in the corpus, the annotation process and challenges, and corpus statistics. The Penn Treebank tagset and guidelines were largely followed, but there were many syntactic contexts that warranted adaptation of the guidelines. The treebank was used to re-train the Berkeley parser and the Stanford parser. These parsers were also trained with the GENIA treebank for comparative quality assessment. Our treebank yielded greater accuracy on both parsers. The Berkeley parser performed better on our treebank, with an average F1 measure of 91 across 5 folds. This was a significant jump from the out-of-the-box F1 score of 70 on the Berkeley parser’s default grammar.
Annotation of a Large Clinical Entity Corpus
Pinal Patel
|
Disha Davey
|
Vishal Panchal
|
Parth Pathak
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
An entity-annotated corpus of the clinical domain is one of the basic requirements for detecting clinical entities using machine learning (ML) approaches. Past research has shown the superiority of statistical/ML approaches over rule-based approaches. But to take full advantage of ML approaches, an accurately annotated corpus is essential. Though a few annotated corpora are available, they either use a small data set or cover a narrower domain (like cancer patient records or lab reports); an annotated large data set representing the entire clinical domain has not been created yet. In this paper, we describe in detail the annotation guidelines, the annotation process, and our approaches in creating a CER (clinical entity recognition) corpus of 5,160 clinical documents from forty different clinical specialities. The clinical entities range across various types such as diseases, procedures, medications, medical devices and so on. We have classified them into eleven categories for annotation. Our annotation also reflects the relations among groups of entities that together constitute larger concepts.
2015
ezDI: A Supervised NLP System for Clinical Narrative Analysis
Parth Pathak
|
Pinal Patel
|
Vishal Panchal
|
Sagar Soni
|
Kinjal Dani
|
Amrish Patel
|
Narayan Choudhary
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
2014
ezDI: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes
Parth Pathak
|
Pinal Patel
|
Vishal Panchal
|
Narayan Choudhary
|
Amrish Patel
|
Gautam Joshi
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)
Annotating a Large Representative Corpus of Clinical Notes for Parts of Speech
Narayan Choudhary
|
Parth Pathak
|
Pinal Patel
|
Vishal Panchal
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop