Nico Colic
2024
Pre-Gamus: Reducing Complexity of Scientific Literature as a Support against Misinformation
Nico Colic
|
Jin-Dong Kim
|
Fabio Rinaldi
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024
Scientific literature encodes a wealth of knowledge relevant to various users. However, the complexity of scientific jargon makes it inaccessible to all but domain specialists. It would be helpful for different types of people to be able to get at least a gist of a paper. Biomedical practitioners often find it difficult to keep up with the information load; but even lay people would benefit from scientific information, for example to dispel medical misconceptions. Besides, in many countries, familiarity with English is limited, let alone scientific English, even among professionals. All this points to the need for simplified access to the scientific literature. We thus present an application aimed at solving this problem, which is capable of summarising scientific text in a way that is tailored to specific types of users, and in their native language. For this objective, we used an LLM that our system queries using user-selected parameters. We conducted an informal evaluation of this prototype using a questionnaire in 3 different languages.
Reducing complexity of Scientific Literature by automated simplification and translation
Nico Colic
|
Fabio Rinaldi
Proceedings of the 9th edition of the Swiss Text Analytics Conference
2020
Annotating the Pandemic: Named Entity Recognition and Normalisation in COVID-19 Literature
Nico Colic
|
Lenz Furrer
|
Fabio Rinaldi
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We are presenting a publicly available pipeline to perform named entity recognition and normalisation in parallel to help find relevant publications and to aid in downstream NLP tasks such as text summarisation. In our approach, we are using a dictionary-based system for its high recall in conjunction with two models based on BioBERT for their accuracy. Their outputs are combined according to different strategies depending on the entity type. In addition, we are using a manually crafted dictionary to increase performance for new concepts related to COVID-19. We have previously evaluated our work on the CRAFT corpus, and make the output of our pipeline available on two visualisation platforms.