Jingguang Han

Also published as: Jing Guang Han


2018

pdf bib
Multi-Level Structured Self-Attentions for Distantly Supervised Relation Extraction
Jinhua Du | Jingguang Han | Andy Way | Dadong Wan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Attention mechanism is often used in deep neural networks for distantly supervised relation extraction (DS-RE) to distinguish valid from noisy instances. However, traditional 1-D vector attention model is insufficient for learning of different contexts in the selection of valid instances to predict the relationship for an entity pair. To alleviate this issue, we propose a novel multi-level structured (2-D matrix) self-attention mechanism for DS-RE in a multi-instance learning (MIL) framework using bidirectional recurrent neural networks (BiRNN). In the proposed method, a structured word-level self-attention learns a 2-D matrix where each row vector represents a weight distribution for different aspects of an instance regarding two entities. Targeting the MIL issue, the structured sentence-level attention learns a 2-D matrix where each row vector represents a weight distribution on selection of different valid instances. Experiments conducted on two publicly available DS-RE datasets show that the proposed framework with multi-level structured self-attention mechanism significantly outperform baselines in terms of PR curves, P@N and F1 measures.

pdf bib
NextGen AML: Distributed Deep Learning based Language Technologies to Augment Anti Money Laundering Investigation
Jingguang Han | Utsab Barman | Jeremiah Hayes | Jinhua Du | Edward Burgin | Dadong Wan
Proceedings of ACL 2018, System Demonstrations

Most of the current anti money laundering (AML) systems, using handcrafted rules, are heavily reliant on existing structured databases, which are not capable of effectively and efficiently identifying hidden and complex ML activities, especially those with dynamic and time-varying characteristics, resulting in a high percentage of false positives. Therefore, analysts are engaged for further investigation which significantly increases human capital cost and processing time. To alleviate these issues, this paper presents a novel framework for the next generation AML by applying and visualizing deep learning-driven natural language processing (NLP) technologies in a distributed and scalable manner to augment AML monitoring and investigation. The proposed distributed framework performs news and tweet sentiment analysis, entity recognition, relation extraction, entity linking and link analysis on different data sources (e.g. news articles and tweets) to provide additional evidence to human investigators for final decision-making. Each NLP module is evaluated on a task-specific data set, and the overall experiments are performed on synthetic and real-world datasets. Feedback from AML practitioners suggests that our system can reduce approximately 30% time and cost compared to their previous manual approaches of AML investigation.

2012

pdf bib
The Herme Database of Spontaneous Multimodal Human-Robot Dialogues
Jing Guang Han | Emer Gilmartin | Celine De Looze | Brian Vaughan | Nick Campbell
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents methodologies and tools for language resource (LR) construction. It describes a database of interactive speech collected over a three-month period at the Science Gallery in Dublin, where visitors could take part in a conversation with a robot. The system collected samples of informal, chatty dialogue -- normally difficult to capture under laboratory conditions for human-human dialogue, and particularly so for human-machine interaction. The conversations were based on a script followed by the robot consisting largely of social chat with some task-based elements. The interactions were audio-visually recorded using several cameras together with microphones. As part of the conversation the participants were asked to sign a consent form giving permission to use their data for human-machine interaction research. The multimodal corpus will be made available to interested researchers and the technology developed during the three-month exhibition is being extended for use in education and assisted-living applications.