Yufan Guo


2021

When and Why a Model Fails? A Human-in-the-loop Error Detection Framework for Sentiment Analysis
Zhe Liu | Yufan Guo | Jalal Mahmud
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Although deep neural networks have been widely employed and proven effective in sentiment analysis tasks, it remains challenging for model developers to assess their models for erroneous predictions before deployment. Once a model is deployed, emergent errors can be hard to identify at prediction time and impossible to trace back to their sources. To address these gaps, we propose an error detection framework for sentiment analysis based on explainable features. We perform global-level feature validation with human-in-the-loop assessment, followed by an integrated analysis of global- and local-level feature contributions. Experimental results show that, with limited human-in-the-loop intervention, our method identifies erroneous model predictions on unseen data with high precision.

2020

Towards Visual Dialog for Radiology
Olga Kovaleva | Chaitanya Shivade | Satyananda Kashyap | Karina Kanjaria | Joy Wu | Deddeh Ballah | Adam Coy | Alexandros Karargyris | Yufan Guo | David Beymer | Anna Rumshisky | Vandana Mukherjee
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Current research in machine learning for radiology focuses mostly on images, and little work has investigated intelligent interactive systems for radiology. To address this limitation, we introduce a realistic and information-rich task of Visual Dialog in radiology, specific to chest X-ray images. Using MIMIC-CXR, an openly available database of chest X-ray images, we construct both a synthetic and a real-world dataset and provide baseline scores achieved by state-of-the-art models. We show that incorporating the patient's medical history leads to better performance in answering questions than a conventional visual question answering model, which looks only at the image. While our experiments show promising results, they indicate that the task is extremely challenging, with significant scope for improvement. We make both datasets (synthetic and gold standard) and the associated code publicly available to the research community.

2015

Unsupervised Declarative Knowledge Induction for Constraint-Based Learning of Information Structure in Scientific Documents
Yufan Guo | Roi Reichart | Anna Korhonen
Transactions of the Association for Computational Linguistics, Volume 3

Inferring the information structure of scientific documents is useful for many NLP applications. Existing approaches to this task require substantial human effort. We propose a framework for constraint learning that reduces human involvement considerably. Our model uses topic models to identify latent topics and their key linguistic features in input documents, induces constraints from this information and maps sentences to their dominant information structure categories through a constrained unsupervised model. When the induced constraints are combined with a fully unsupervised model, the resulting model challenges existing lightly supervised feature-based models as well as unsupervised models that use manually constructed declarative knowledge. Our results demonstrate that useful declarative knowledge can be learned from data with very limited human involvement.

2014

Native Language Identification Using Large, Longitudinal Data
Xiao Jiang | Yufan Guo | Jeroen Geertzen | Dora Alexopoulou | Lin Sun | Anna Korhonen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Native Language Identification (NLI) is the task of determining the native language (L1) of learners of a second language (L2) from their written texts. To date, research on NLI has focused on relatively small corpora. We apply NLI to the recently released EFCamDat corpus, which is several times larger than previous L2 corpora and also provides longitudinal data at several proficiency levels. Our investigation, which uses accurate machine learning with a wide range of linguistic features, reveals interesting patterns in the longitudinal data that are useful both for the further development of NLI and for its application to research on L2 acquisition.

CRAB 2.0: A text mining tool for supporting literature review in chemical cancer risk assessment
Yufan Guo | Diarmuid Ó Séaghdha | Ilona Silins | Lin Sun | Johan Högberg | Ulla Stenius | Anna Korhonen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

Social and Semantic Diversity: Socio-semantic Representation of a Scientific Corpus
Thierry Poibeau | Elisa Omodei | Jean-Philippe Cointet | Yufan Guo
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

Argumentative analysis of the ACL Anthology (Analyse argumentative du corpus de l’ACL (ACL Anthology)) [in French]
Elisa Omodei | Yufan Guo | Jean-Philippe Cointet | Thierry Poibeau
Proceedings of TALN 2014 (Volume 2: Short Papers)

2013

Improved Information Structure Analysis of Scientific Documents Through Discourse and Lexical Constraints
Yufan Guo | Roi Reichart | Anna Korhonen
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Using Argumentative Zones for Extractive Summarization of Scientific Articles
Danish Contractor | Yufan Guo | Anna Korhonen
Proceedings of COLING 2012

CRAB Reader: A Tool for Analysis and Visualization of Argumentative Zones in Scientific Literature
Yufan Guo | Ilona Silins | Roi Reichart | Anna Korhonen
Proceedings of COLING 2012: Demonstration Papers

2011

A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents
Yufan Guo | Anna Korhonen | Thierry Poibeau
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes
Yufan Guo | Anna Korhonen | Maria Liakata | Ilona Silins | Lin Sun | Ulla Stenius
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing