Swayatta Daw
2022
Multilinguals at SemEval-2022 Task 11: Complex NER in Semantically Ambiguous Settings for Low Resource Languages
Amit Pandey | Swayatta Daw | Narendra Unnam | Vikram Pudi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
We leverage pre-trained language models to solve the task of complex NER for two low-resource languages: Chinese and Spanish. We use Whole Word Masking (WWM) to boost the performance of the masked language modeling objective on large, unsupervised corpora. We experiment with multiple neural network architectures, incorporating CRFs, BiLSTMs, and linear classifiers on top of a fine-tuned BERT layer. All our models outperform the baseline by a significant margin, and our best-performing model obtains a competitive position on the evaluation leaderboard for the blind test set.
Multilinguals at SemEval-2022 Task 11: Transformer Based Architecture for Complex NER
Amit Pandey | Swayatta Daw | Vikram Pudi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
We investigate the task of complex NER for the English language. The task is non-trivial due to the semantic ambiguity of the textual structure and the rarity of such entities in the existing literature. Using pre-trained language models such as BERT, we obtain competitive performance on this task. We qualitatively analyze the performance of multiple architectures for this task. All our models outperform the baseline by a significant margin. Our best-performing model beats the baseline F1-score by over 9%.
2021
Cross-lingual Alignment of Knowledge Graph Triples with Sentences
Swayatta Daw | Shivprasad Sagare | Tushar Abhishek | Vikram Pudi | Vasudeva Varma
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
The pairing of natural language sentences with knowledge graph triples is essential for many downstream tasks such as data-to-text generation, fact extraction from sentences (semantic parsing), and knowledge graph completion. Most existing methods solve these downstream tasks using neural end-to-end approaches that require a large amount of well-aligned training data, which is difficult and expensive to acquire. Recently, various unsupervised techniques have been proposed to ease this alignment step by automatically pairing the structured data (knowledge graph triples) with textual data. However, these approaches are not well suited to low-resource languages, which pose two major challenges: (1) the unavailability of paired triples and native text with the same content distribution and (2) limited Natural Language Processing (NLP) resources. In this paper, we address the unsupervised pairing of knowledge graph triples with sentences for low-resource languages, selecting Hindi as the low-resource language. We propose cross-lingual pairing of English triples with Hindi sentences to mitigate the lack of content overlap. We propose two novel approaches: NER-based filtering with Semantic Similarity and Key-phrase Extraction with Relevance Ranking. We use our best method to create a collection of 29,224 well-aligned pairs of English triples and Hindi sentences. Additionally, we have curated a golden test set of 350 human-annotated instances for evaluation. We make the code and dataset publicly available.
Co-authors
- Vikram Pudi 3
- Amit Pandey 2
- Shivprasad Sagare 1
- Tushar Abhishek 1
- Vasudeva Varma 1