Characterization of Scientific and Popular Science Discourse in French, Japanese and Russian

Lorraine Goeuriot, Natalia Grabar, Béatrice Daille


Abstract
We aim to characterize the comparability of corpora, we address this issue in the trilingual context through the distinction of expert and non expert documents. We work separately with corpora composed of documents from the medical domain in three languages (French, Japanese and Russian) which present an important linguistic distance between them. In our approach, documents are characterized in each language by their topic and by a discursive typology positioned at three levels of document analysis: structural, modal and lexical. The document typology is implemented with two learning algorithms (SVMlight and C4.5). Evaluation of results shows that the proposed discursive typology can be transposed from one language to another, as it indeed allows to distinguish the two aimed discourses (science and popular science). However, we observe that performances vary a lot according to languages, algorithms and types of discursive characteristics.
Anthology ID:
L08-1314
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/743_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Lorraine Goeuriot, Natalia Grabar, and Béatrice Daille. 2008. Characterization of Scientific and Popular Science Discourse in French, Japanese and Russian. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Characterization of Scientific and Popular Science Discourse in French, Japanese and Russian (Goeuriot et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/743_paper.pdf