Development of Hybrid Algorithm for Automatic Extraction of Multiword Expressions from Monolingual and Parallel Corpus of English and Punjabi

Kapil Dev Goyal, Vishal Goyal


Abstract
Identification and extraction of Multiword Expressions (MWEs) is a hard and challenging task in various Natural Language Processing applications such as Information Retrieval (IR), Information Extraction (IE), Question-Answering systems, Speech Recognition and Synthesis, Text Summarization and Machine Translation (MT). A Multiword Expression consists of two or more consecutive words that are treated as a single unit, and its meaning cannot be derived from the meanings of the individual words. If a system treats such an expression as separate words, its results will be incorrect. It is therefore necessary to identify these expressions to improve the results of the system. In this report, our main focus is to develop an automated tool to extract Multiword Expressions from monolingual and parallel corpora of English and Punjabi. The tool combines rule-based, linguistic, statistical, and other approaches to identify and extract MWEs from monolingual and parallel corpora of English and Punjabi, and achieves an F-score of more than 90% for some types of MWEs.
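The abstract does not say which statistical measure the tool uses. As an illustration only, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a common association score for MWE candidates. The function name `pmi_bigrams`, the `min_count` threshold, and the sample sentence are assumptions made for this example and are not taken from the authors' system.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative only: PMI stands in for the unspecified statistical
    approach mentioned in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs give unreliable PMI estimates, so skip them.
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring pairs are the strongest MWE candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy input; a real corpus would be far larger.
    text = ("he will kick the bucket soon . they kick the bucket too . "
            "the dog saw the cat")
    for pair, score in pmi_bigrams(text.split()):
        print(pair, round(score, 2))
```

A purely statistical ranking like this only proposes candidates. In a hybrid pipeline of the kind the abstract describes, rule-based and linguistic filters would presumably be applied to such candidates, and extraction quality would be reported as an F-score against annotated MWEs.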
Anthology ID:
2020.icon-demos.2
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations
Month:
December
Year:
2020
Address:
Patna, India
Editors:
Vishal Goyal, Asif Ekbal
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
4–6
URL:
https://aclanthology.org/2020.icon-demos.2
Cite (ACL):
Kapil Dev Goyal and Vishal Goyal. 2020. Development of Hybrid Algorithm for Automatic Extraction of Multiword Expressions from Monolingual and Parallel Corpus of English and Punjabi. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): System Demonstrations, pages 4–6, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
Development of Hybrid Algorithm for Automatic Extraction of Multiword Expressions from Monolingual and Parallel Corpus of English and Punjabi (Goyal & Goyal, ICON 2020)
PDF:
https://aclanthology.org/2020.icon-demos.2.pdf