A methodology for the extraction of information about the usage of formulaic expressions in scientific texts

Hannah Kermes


Abstract
In this paper, we present a methodology for the extraction of formulaic expressions that goes beyond the mere extraction of candidate patterns. Using a pipeline, we are able to extract information about the usage of formulaic expressions automatically from text corpora. According to Biber and Barbieri (2007), formulaic expressions are “important building blocks of discourse in spoken and written registers”. The automatic extraction procedure can help to investigate the usage and function of these recurrent patterns in different registers and domains. Formulaic expressions are commonplace not only in everyday language but also in scientific writing. Scientists often use patterns such as 'in this paper', 'the number of', and 'on the basis of' to convey research interests, the theoretical basis of their studies, results of experiments, scientific findings, and conclusions, and also as discourse organizers. For Hyland (2008), they help to “shape meanings in specific context and contribute to our sense of coherence in a text”. We are interested in three questions: (i) which formulaic expressions, and which types of them, are used in scientific texts; (ii) how formulaic expressions are distributed across different scientific disciplines; and (iii) where formulaic expressions occur within a text.
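The three questions in the abstract can be illustrated with a minimal sketch. It is not the pipeline described in the paper: it assumes, purely for illustration, that candidate formulaic expressions are recurrent word n-grams, that each document carries a discipline label, and that position is measured as the relative offset of the n-gram within its text. The names `ngrams`, `extract_candidates`, `n`, and `min_freq` are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Yield (start_index, n-gram tuple) for every window of n tokens."""
    for i in range(len(tokens) - n + 1):
        yield i, tuple(tokens[i:i + n])


def extract_candidates(docs, n=3, min_freq=2):
    """Collect recurrent n-grams with simple usage information.

    docs: iterable of (discipline, text) pairs (an assumed input format).
    Returns a dict mapping each n-gram string to its total frequency,
    its frequency per discipline, and its mean relative position
    (0.0 = start of the text, 1.0 = end of the text).
    """
    total = Counter()
    by_discipline = defaultdict(Counter)
    positions = defaultdict(list)

    for discipline, text in docs:
        # Naive whitespace tokenization; a real corpus would need more care.
        tokens = text.lower().split()
        last_start = max(len(tokens) - n, 1)  # avoid division by zero
        for i, gram in ngrams(tokens, n):
            total[gram] += 1
            by_discipline[gram][discipline] += 1
            positions[gram].append(i / last_start)

    return {
        " ".join(gram): {
            "freq": count,
            "by_discipline": dict(by_discipline[gram]),
            "mean_position": sum(positions[gram]) / len(positions[gram]),
        }
        for gram, count in total.items()
        if count >= min_freq
    }
```

Called on a list of labeled texts, `extract_candidates` keeps only n-grams that occur at least `min_freq` times, so one-off word sequences are filtered out before any distributional or positional information is reported.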
Anthology ID:
L12-1534
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2064–2068
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/895_Paper.pdf
Cite (ACL):
Hannah Kermes. 2012. A methodology for the extraction of information about the usage of formulaic expressions in scientific texts. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2064–2068, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
A methodology for the extraction of information about the usage of formulaic expressions in scientific texts (Kermes, LREC 2012)