Goran Topić

Also published as: Goran Topic


BiomedCurator: Data Curation for Biomedical Literature
Mohammad Golam Sohrab | Khoa N.A. Duong | Ikeda Masami | Goran Topić | Yayoi Natsume-Kitatani | Masakata Kuroda | Mari Nogami Itoh | Hiroya Takamura
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations

We present BiomedCurator, a web application that extracts structured data from scientific articles in PubMed and ClinicalTrials.gov. BiomedCurator uses state-of-the-art natural language processing techniques to fill the fields pre-selected by domain experts in the relevant biomedical area. The BiomedCurator web application includes a text-generation-based model for relation extraction, entity detection and recognition, a text classification model for extracting several fields, information retrieval from an external knowledge base to retrieve IDs, and a pattern-based extraction approach that extracts several fields using regular expressions over the PubMed and ClinicalTrials.gov datasets. Evaluation results show that the different approaches in the BiomedCurator web application are effective for automatic data curation in the biomedical domain.


Generating Racing Game Commentary from Vision, Language, and Structured Data
Tatsuya Ishigaki | Goran Topic | Yumi Hamazono | Hiroshi Noji | Ichiro Kobayashi | Yusuke Miyao | Hiroya Takamura
Proceedings of the 14th International Conference on Natural Language Generation

We propose the task of automatically generating commentaries for races in a motor racing game, from vision, structured numerical, and textual data. Commentaries provide information to support spectators in understanding events in races. Commentary generation models need to interpret the race situation and generate the correct content at the right moment. We divide the task into two subtasks: utterance timing identification and utterance generation. Because existing datasets do not have such alignments of data in multiple modalities, this setting has not been explored in depth. In this study, we introduce a new large-scale dataset that contains aligned video data, structured numerical data, and transcribed commentaries that consist of 129,226 utterances in 1,389 races in a game. Our analysis reveals that the characteristics of commentaries change over time and across viewpoints. Our experiments on the subtasks show that it is still challenging for a state-of-the-art vision encoder to capture useful information from videos to generate accurate commentaries. We make the dataset and baseline implementation publicly available for further research.


An empirical analysis of existing systems and datasets toward general simple question answering
Namgi Han | Goran Topic | Hiroshi Noji | Hiroya Takamura | Yusuke Miyao
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we evaluate the progress of our field toward solving simple factoid questions over a knowledge base, a practically important problem in natural language interfaces to databases. As in other natural language understanding tasks, a common practice for this task is to train and evaluate a model on a single dataset, and recent studies suggest that SimpleQuestions, the most popular and largest dataset, is nearly solved under this setting. However, this common setting does not evaluate the robustness of the systems outside the distribution of the training data. We rigorously evaluate such robustness of existing systems using different datasets. Our analysis, including shifting of training and test datasets and training on a union of the datasets, suggests that our progress in solving the SimpleQuestions dataset does not indicate the success of more general simple question answering. We discuss a possible future direction toward this goal.

BENNERD: A Neural Named Entity Linking System for COVID-19
Mohammad Golam Sohrab | Khoa Duong | Makoto Miwa | Goran Topić | Ikeda Masami | Takamura Hiroya
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present a biomedical entity linking (EL) system BENNERD that detects named entities in text and links them to the unified medical language system (UMLS) knowledge base (KB) entries to facilitate the corona virus disease 2019 (COVID-19) research. BENNERD mainly covers biomedical domain, especially new entity types (e.g., coronavirus, viral proteins, immune responses) by addressing CORD-NER dataset. It includes several NLP tools to process biomedical texts including tokenization, flat and nested entity recognition, and candidate generation and ranking for EL that have been pre-trained using the CORD-NER corpus. To the best of our knowledge, this is the first attempt that addresses NER and EL on COVID-19-related entities, such as COVID-19 virus, potential vaccines, and spreading mechanism, that may benefit research on COVID-19. We release an online system to enable real-time entity annotation with linking for end users. We also release the manually annotated test set and CORD-NERD dataset for leveraging EL task. The BENNERD system is available at https://aistairc.github.io/BENNERD/.


CroVeWA: Crosslingual Vector-Based Writing Assistance
Hubert Soyer | Goran Topić | Pontus Stenetorp | Akiko Aizawa
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations


Significance of Bridging Real-world Documents and NLP Technologies
Tadayoshi Hara | Goran Topić | Yusuke Miyao | Akiko Aizawa
Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT


Sense Disambiguation: From Natural Language Words to Mathematical Terms
Minh-Quoc Nghiem | Giovanni Yoko Kristianto | Goran Topić | Akiko Aizawa
Proceedings of the Sixth International Joint Conference on Natural Language Processing


brat: a Web-based Tool for NLP-Assisted Text Annotation
Pontus Stenetorp | Sampo Pyysalo | Goran Topić | Tomoko Ohta | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics


BioNLP Shared Task 2011: Supporting Resources
Pontus Stenetorp | Goran Topić | Sampo Pyysalo | Tomoko Ohta | Jin-Dong Kim | Jun’ichi Tsujii
Proceedings of BioNLP Shared Task 2011 Workshop