Reuben Dias
2017
Making Travel Smarter: Extracting Travel Information From Email Itineraries Using Named Entity Recognition
Divyansh Kaushik
|
Shashank Gupta
|
Chakradhar Raju
|
Reuben Dias
|
Sanjib Ghosh
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
The purpose of this research is to address the problem of extracting information from travel itineraries and discuss the challenges faced in the process. Business-to-customer emails like booking confirmations and e-tickets are usually machine generated by filling slots in pre-defined templates which improve the presentation of such emails but also make the emails more complex in structure. Extracting the relevant information from these emails would let users track their journeys and important updates on applications installed on their devices to give them a consolidated over view of their itineraries and also save valuable time. We investigate the use of an HMM-based named entity recognizer on such emails which we will use to label and extract relevant entities. NER in such emails is challenging as these itineraries offer less useful contextual information. We also propose a rich set of features which are integrated into the model and are specific to our domain. The result from our model is a list of lists containing the relevant information extracted from ones itinerary.