Proceedings of the 17th International Conference on Natural Language Processing (ICON): TermTraction 2020 Shared Task

Dipti Misra Sharma, Asif Ekbal, Karunesh Arora, Sudip Kumar Naskar, Dipankar Ganguly, Sobha L, Radhika Mamidi, Sunita Arora, Pruthwik Mishra, Vandan Mujadia (Editors)

Anthology ID: 2020.icon-termtraction
Month: December
Year: 2020
Address: Patna, India
Venue: ICON
Event: International Conference on Natural Language Processing (2020)
Publisher: NLP Association of India (NLPAI)
URL: https://aclanthology.org/2020.icon-termtraction/
PDF: https://aclanthology.org/2020.icon-termtraction.pdf

Graph Based Automatic Domain Term Extraction
Hema Ala | Dipti Sharma

We present a graph-based approach to automatically extract domain-specific terms from technical domains such as Biochemistry, Communication, Computer Science, and Law. Our approach is similar to TextRank, with an extra post-processing step to reduce noise. We performed our experiments on the mentioned domains, using the data provided by the ICON TermTraction 2020 shared task. We report precision, recall, and F1-score for all experiments. Our method gives promising results without much noise in the extracted domain terms.
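
For readers unfamiliar with TextRank, the following is a minimal sketch of the general idea: words become nodes in a co-occurrence graph, and a PageRank-style iteration scores them. It is not the authors' implementation. The function names, the window size, the damping factor, and the length-based filter (a stand-in for the paper's unspecified post-processing step) are all assumptions made for illustration.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words with a PageRank-style iteration over a co-occurrence graph."""
    # Link each token to the tokens that follow it within the window (undirected).
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Every node starts with the same score; each pass redistributes the
    # scores of a node's neighbors, divided by the neighbors' degrees.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return scores


def top_terms(tokens, k=3, min_length=3):
    """Rank words by score, dropping very short tokens (hypothetical noise filter)."""
    scores = textrank(tokens)
    kept = [w for w in scores if len(w) >= min_length]
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(kept, key=lambda w: (-scores[w], w))[:k]


tokens = "graph ranking uses graph edges and graph nodes".split()
print(top_terms(tokens, k=3, min_length=4))
```

A real system would typically restrict nodes to nouns and adjectives and merge adjacent high-scoring words into multi-word terms; those steps are omitted here to keep the sketch short.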

Unsupervised Technical Domain Terms Extraction using Term Extractor
Suman Dowlagar | Radhika Mamidi

Terminology extraction, also known as term extraction, is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant words or phrases from a given corpus. This paper focuses on an unsupervised automated domain term extraction method that chunks, preprocesses, and ranks domain-specific terms using relevance and cohesion functions, developed for ICON 2020 shared task 2: TermTraction.

N-Grams TextRank A Novel Domain Keyword Extraction Technique
Saransh Rajput | Akshat Gahoi | Manvith Reddy | Dipti Mishra Sharma

The rapid growth of the internet has given us a wealth of information and data spread across the web. However, as the data grows, we simultaneously face the grave problem of an information explosion. An abundance of data can lead to large-scale data management problems as well as the loss of the true meaning of the data. In this paper, we present an advanced domain-specific keyword extraction algorithm to tackle this problem. Our algorithm is based on a modified version of TextRank, an algorithm that is itself based on PageRank, and determines the keywords of a domain-specific document. The modification to the traditional TextRank algorithm takes bigrams and trigrams into account and returns results with extremely high precision. We observe that the precision and F1-score of this model outperform other models in many domains, and that the recall can be increased by increasing the number of results without affecting the precision. We also discuss future work on extending the same algorithm to Indian languages.
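
The abstracts in this volume report precision, recall, and F1-score, and this one relates recall to the number of returned results. As a rough illustration of how such figures can be computed for a ranked term list cut off at k results, here is a small sketch. The function name `evaluate_at_k` and the toy terms are hypothetical; the shared task's official scoring script may match terms differently (for example, case-insensitively or by partial overlap).

```python
def evaluate_at_k(ranked_terms, gold_terms, k):
    """Return (precision, recall, F1) for the top-k ranked terms, using exact matching."""
    predicted = set(ranked_terms[:k])
    gold = set(gold_terms)
    correct = len(predicted & gold)

    # Guard against empty sets so the function never divides by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical system output (best first) and gold-standard terms.
ranked = ["neural network", "gradient descent", "the model", "loss function"]
gold = {"neural network", "gradient descent", "loss function"}

for k in (2, 4):
    p, r, f = evaluate_at_k(ranked, gold, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Running the evaluation at several values of k is one way to check how precision and recall move as more results are returned for a given ranking.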