Naphtali Rishe
2020
Improving the Identification of the Discourse Function of News Article Paragraphs
Deya Banisakher
|
W. Victor Yarlott
|
Mohammed Aldawsari
|
Naphtali Rishe
|
Mark Finlayson
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events
Identifying the discourse structure of documents is an important task in understanding written text. Building on prior work, we demonstrate an improved approach to automatically identifying the discourse function of paragraphs in news articles. We start with the hierarchical theory of news discourse developed by van Dijk (1988) which proposes how paragraphs function within news articles. This discourse information is a level intermediate between phrase- or sentence-sized discourse segments and document genre, characterizing how individual paragraphs convey information about the events in the storyline of the article. Specifically, the theory categorizes the relationships between narrated events and (1) the overall storyline (such as Main Events, Background, or Consequences) as well as (2) commentary (such as Verbal Reactions and Evaluations). We trained and tested a linear chain conditional random field (CRF) with new features to model van Dijk’s labels and compared it against several machine learning models presented in previous work. Our model significantly outperformed all baselines and prior approaches, achieving an average of 0.71 F1 score which represents a 31.5% improvement over the previously best-performing support vector machine model.
2018
Automatically Detecting the Position and Type of Psychiatric Evaluation Report Sections
Deya Banisakher
|
Naphtali Rishe
|
Mark A. Finlayson
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Psychiatric evaluation reports represent a rich and still mostly-untapped source of information for developing systems for automatic diagnosis and treatment of mental health problems. These reports contain free-text structured within sections using a convention of headings. We present a model for automatically detecting the position and type of different psychiatric evaluation report sections. We developed this model using a corpus of 150 sample reports that we gathered from the Web, and used sentences as a processing unit while section headings were used as labels of section type. From these labels we generated a unified hierarchy of labels of section types, and then learned n-gram models of the language found in each section. To model conventions for section order, we integrated these n-gram models with a Hierarchical Hidden Markov Model (HHMM) representing the probabilities of observed section orders found in the corpus, and then used this HHMM n-gram model in a decoding framework to infer the most likely section boundaries and section types for documents with their section labels removed. We evaluated our model over two tasks, namely, identifying section boundaries and identifying section types and orders. Our model significantly outperformed baselines for each task with an F1 of 0.88 for identifying section types, and a 0.26 WindowDiff (Wd) and 0.20 and (Pk) scores, respectively, for identifying section boundaries.
Search