Dominique Besagni


2020

An Experiment in Annotating Animal Species Names from ISTEX Resources
Sabine Barreaux | Dominique Besagni
Proceedings of the Twelfth Language Resources and Evaluation Conference

To support text and data mining (TDM) of scientific publications from global research, the ISTEX platform has enriched its data with value-added information that eases access to its full-text documents. We designed an experiment to explore new enrichment possibilities, focusing on the recognition of scientific named entities, which could be integrated into ISTEX resources. This led us to test two tools for detecting animal species names on a corpus of 100 zoology documents. The experiment also provides the French scientific community with an annotated reference corpus that can be used to measure the performance of these tools.

2010

FastKwic, an “Intelligent” Concordancer Using FASTR
Véronika Lux-Pogodalla | Dominique Besagni | Karën Fort
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we introduce FastKwic (Key Word In Context using FASTR), a new concordancer for French and English that does not require users to learn any particular query language. Built on FASTR, it shows them not only occurrences of the searched term but also several of its morphological, morpho-syntactic and syntactic variants (for example, image enhancement, enhancement of image, enhancement of fingerprint image, image texture enhancement). FastKwic is freely available. It consists of two UTF-8 compliant Perl modules that depend on several external tools and resources: FASTR, TreeTagger, and Flemm (for French). Licenses of these tools and resources permitting, the FastKwic package is nevertheless self-sufficient. FastKwic's first module handles terminological resource compilation. Its input is a list of terms, as required by FASTR. FastKwic's second module processes concordances. It relies on FASTR again to index the input corpus with terms and their variants. Its output is a concordance: for each term and its variants, the context of each occurrence is provided.
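
The key-word-in-context idea described in this abstract can be illustrated with a minimal Python sketch. It is not FastKwic and does not call FASTR: the variant list is supplied by hand, whereas FastKwic derives variants automatically, and the function name `kwic` and the `width` parameter are hypothetical names chosen for illustration only.

```python
import re

def kwic(text, variants, width=30):
    """Return (left, match, right) triples for each occurrence of any variant.

    Assumption: `variants` is a hand-written list of surface forms.
    FastKwic obtains variants through FASTR; this sketch does not.
    """
    # Try longer variants first so the alternation prefers the longest form.
    ordered = sorted(variants, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in ordered), re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

# Illustrative input only; the variants echo the examples given in the abstract.
variants = ["image enhancement", "enhancement of image", "image texture enhancement"]
text = "We study image enhancement. Later, enhancement of image quality is measured."
hits = kwic(text, variants, width=10)
for left, match, right in hits:
    print(f"{left:>10} [{match}] {right}")
```

Each hit keeps up to `width` characters on either side of the match, which is the usual layout of a concordance line; a real concordancer would work on tokens and linguistic variants rather than raw character windows.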