Proceedings of the NoDaLiDa 2023 Workshop on Constraint Grammar - Methods, Tools and Applications

Eckhard Bick, Trond Trosterud, Tanel Alumäe (Editors)

Anthology ID: 2023.nodalida-cgmta
Month: May
Year: 2023
Address: Tórshavn, Faroe Islands
Venues: cgmta | WS
Events: Workshop on Constraint Grammar and Finite State NLP (2023) | The 24th Nordic Conference on Computational Linguistics (NoDaLiDa) | Other Workshops and Events (2023)
SIG:
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2023.nodalida-cgmta/
DOI:
PDF: https://aclanthology.org/2023.nodalida-cgmta.pdf

Attribution of Quoted Speech in Portuguese Text
Eckhard Bick

This paper describes and evaluates a rule-based system implementing a novel method for quote attribution in Portuguese text, working on top of a Constraint-Grammar parse. Both direct and indirect speech are covered, as well as certain other text-embedded quote sources. In a first step, the system performs quote segmentation and identifies speech verbs, taking into account the different styles used in literature and news text. Speakers are then identified using syntactically and semantically grounded Constraint-Grammar rules. We rely on relational links and stream variables to handle anaphoric mentions and to recover the names of implied or underspecified speakers. In an evaluation including both literature and news text, the system performed well on both the segmentation and attribution tasks, achieving F-scores of 98-99% for the former and 89-94% for the latter.

WITH Context: Adding Rule-Grouping to VISL CG-3
Daniel Swanson | Tino Didriksen | Francis M. Tyers

This paper presents an extension to the VISL CG-3 compiler and processor which enables complex contexts to be shared between rules. This sharing substantially improves the readability and maintainability of sets of rules performing multi-step operations.

To ð or not to ð - A Faroese CG-based grammar checker targeting ð errors
Trond Trosterud

Many errors in Faroese writing are linked to the letter ð, which has no corresponding phoneme and is always omitted intervocalically and word-finally after a vowel. It plays an important role in the written language, distinguishing forms that are homophones but not homographs, such as the infinitive kasta ‘throw’ and its participle kastað. Since adding a hypercorrect ð or erroneously omitting it often results in an existing word, these errors cannot be captured by ordinary spellcheckers. The article presents a grammar checker targeting ð errors, and discusses challenges related to false alarms.

Towards automatic essay scoring of Basque language texts from a rule-based approach based on curriculum-aware systems
Jose Maria Arriola | Mikel Iruskieta | Ekain Arrieta | Jon Alkorta

Although the Basque Education Law mentions that students must finish compulsory secondary education at B2 Basque level and their undergraduate studies at the C1 level, there are no objective tests or tools that can discriminate between these levels. This work presents the first rule-based method to grade written Basque learner texts. We adapt the adult Basque learner curriculum based on the CEFR to create a rule-based grammar for Basque. This paper summarises the results obtained in different classification tasks by combining information formalised through CG3 and different machine learning algorithms used in text classification. In addition, we perform a manual evaluation of the grammar. Finally, we discuss the informativeness of these rules and some ways to further improve assisted text grading and combine rule-based approaches with other approaches based on readability and complexity measures.

Correcting well-known interference errors – Towards a L2 grammar checker for Inari Saami
Trond Trosterud | Marja-Liisa Olthuis | Linda Wiechetek

We present GramDivvun, the first Inari Saami grammar checker for L2 users. The grammar checker is an important tool in the revitalisation of the language, in particular for strengthening the literary language. As the Inari Saami language community needs language tools predominantly for language learners, the focus is on grammatical interference errors made by (mostly Finnish-speaking) learners. Six of these errors are featured in the first version of the grammar checker. For non-proofread text written by inexperienced writers, precision is good (73%). For proofread text and text by experienced writers, alarms are rare but precision is considerably lower, 19.5% on average, and varies considerably between the error types. The paper discusses reasons for this variation. Future plans include improving results by means of increased testing, especially for complex sentences, and eventually covering more error types.

Supporting Language Users - Releasing a Full-fledged Lule Sámi Grammar Checker
Inga Lill Sigga Mikkelsen | Linda Wiechetek

We present the first rule-based L1 grammar checker for Lule Sámi. Releasing a Lule Sámi grammar checker has direct consequences for language revitalization. Our primary intention is therefore to support language users in their writing and in their confidence in using the language. We release a version of the tool for MS Word and Google Docs that corrects six grammatical error types. For the benefit of the user, the selection of error types is based on the frequency of the errors and the quality of our tool. Our most successful error correction, for a phonetically and syntactically motivated copula error, reaches a precision of 96%.

A South Sámi Grammar Checker For Stopping Language Change
Linda Wiechetek | Maja Lisa Kappfjell

We have released and evaluated the first South Sámi grammar checker, GramDivvun. It corrects two frequent error types that are caused by, and in turn cause, language change and a loss of the language’s morphological richness. These general error types comprise a number of errors regarding the adjective paradigm (confusion of attributive and predicative forms) and the negation paradigm. In addition, our work includes a classification of common error types regarding the adjective and negation paradigms and led to extensive grammatical error mark-up of our gold corpus. We achieve precisions above 71% for both adjective and negation error correction.