Communicative-Function-Based Sentence Classification for Construction of an Academic Formulaic Expression Database

Kenichi Iwatsuki, Akiko Aizawa


Abstract
Formulaic expressions (FEs), such as ‘in this paper, we propose’ are frequently used in scientific papers. FEs convey a communicative function (CF), i.e. ‘showing the aim of the paper’ in the above-mentioned example. Although CF-labelled FEs are helpful in assisting academic writing, the construction of FE databases requires manual labour for assigning CF labels. In this study, we considered a fully automated construction of a CF-labelled FE database using the top–down approach, in which the CF labels are first assigned to sentences, and then the FEs are extracted. For the CF-label assignment, we created a CF-labelled sentence dataset, on which we trained a SciBERT classifier. We show that the classifier and dataset can be used to construct FE databases of disciplines that are different from the training data. The accuracy of in-disciplinary classification was more than 80%, while cross-disciplinary classification also worked well. We also propose an FE extraction method, which was applied to the CF-labelled sentences. Finally, we constructed and published a new, large CF-labelled FE database. The evaluation of the final CF-labelled FE database showed that approximately 65% of the FEs are correct and useful, which is sufficiently high considering practical use.
Anthology ID:
2021.eacl-main.304
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Venue:
EACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3476–3497
Language:
URL:
https://aclanthology.org/2021.eacl-main.304
DOI:
10.18653/v1/2021.eacl-main.304
Bibkey:
Cite (ACL):
Kenichi Iwatsuki and Akiko Aizawa. 2021. Communicative-Function-Based Sentence Classification for Construction of an Academic Formulaic Expression Database. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3476–3497, Online. Association for Computational Linguistics.
Cite (Informal):
Communicative-Function-Based Sentence Classification for Construction of an Academic Formulaic Expression Database (Iwatsuki & Aizawa, EACL 2021)
Copy Citation:
PDF:
https://aclanthology.org/2021.eacl-main.304.pdf
Data
SciERC