Lars Ahrenberg


2024

Fitting Fixed Expressions into the UD Mould: Swedish as a Use Case
Lars Ahrenberg
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

Fixed multiword expressions are common in many, if not all, natural languages. In the Universal Dependencies framework, UD, a subset of these expressions are modelled with the dependency relation ‘fixed’ targeting the most grammaticalized cases of functional multiword items. In this paper we perform a detailed analysis of 439 expressions modelled with ‘fixed’ in two Swedish UD treebanks in order to reduce their numbers and fit the definition better. We identify a large number of dimensions of variation for fixed multiword expressions that can be used for the purpose. We also point out several problematic aspects of the current UD approach to multiword expressions and discuss different alternative solutions for modelling fixed expressions. We suggest that insights from Constructional Grammar (CxG) can help with a more systematic treatment of fixed expressions in UD.

2023

Who said what? Speaker Identification from Anonymous Minutes of Meetings
Daniel Holmer | Lars Ahrenberg | Julius Monsen | Arne Jönsson | Mikael Apel | Marianna Grimaldi
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

We study the performance of machine learning techniques on the problem of identifying speakers at meetings from anonymous minutes issued afterwards. The data comes from board meetings of Sveriges Riksbank (Sweden’s Central Bank). The data is split in two ways, one where each reported contribution to the discussion is treated as a data point, and another where all contributions from a single speaker have been aggregated. Using interpretable models we find that lexical features and topic models generated from speeches held by the board members outside of board meetings are good predictors of speaker identity. Combining topic models with other features gives prediction accuracies close to 80% on aggregated data, though there is still a sizeable gap in performance compared to a not easily interpreted BERT-based transformer model that we offer as a benchmark.

2021

Translation Competence in Machines: A Study of Adjectives in English-Swedish Translation
Lars Ahrenberg
Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age

2019

Proceedings of the Workshop on NLP and Pseudonymisation
Lars Ahrenberg | Beata Megyesi
Proceedings of the Workshop on NLP and Pseudonymisation

Towards an adequate account of parataxis in Universal Dependencies
Lars Ahrenberg
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)

2017

Swedish Prepositions are not Pure Function Words
Lars Ahrenberg
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)

Comparing Machine Translation and Human Translation: A Case Study
Lars Ahrenberg
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology

As machine translation technology improves, comparisons to human performance are often made in quite general and exaggerated terms. Thus, it is important to be able to account for differences accurately. This paper reports a simple, descriptive scheme for comparing translations and applies it to two translations of a British opinion article published in March 2017. One is a human translation (HT) into Swedish, and the other a machine translation (MT). While the comparison is limited to one text, the results are indicative of current limitations in MT.

2015

Converting an English-Swedish Parallel Treebank to Universal Dependencies
Lars Ahrenberg
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

2014

Translation Class Instruction as Collaboration in the Act of Translation
Lars Ahrenberg | Ljuba Tarvi
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

2013

Generation of Compound Words in Statistical Machine Translation into Compounding Languages
Sara Stymne | Nicola Cancedda | Lars Ahrenberg
Computational Linguistics, Volume 39, Issue 4 - December 2013

IPhraxtor: A Linguistically Informed System for Extraction of Term Candidates
Magnus Merkel | Jody Foo | Lars Ahrenberg
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

On the practice of error analysis for machine translation evaluation
Sara Stymne | Lars Ahrenberg
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Error analysis is a means to assess machine translation output in qualitative terms, which can be used as a basis for the generation of error profiles for different systems. As for other subjective approaches to evaluation it runs the risk of low inter-annotator agreement, but very often in papers applying error analysis to MT, this aspect is not even discussed. In this paper, we report results from a comparative evaluation of two systems where agreement initially was low, and discuss the different ways we used to improve it. We compared the effects of using more or less fine-grained taxonomies, and the possibility to restrict analysis to short sentences only. We report results on inter-annotator agreement before and after measures were taken, on error categories that are most likely to be confused, and on the possibility to establish error profiles also in the absence of a high inter-annotator agreement.

Error profiling for evaluation of machine-translated text: a Polish-English case study
Sandra Weiss | Lars Ahrenberg
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a study of Polish-English machine translation, where the impact of various types of errors on cohesion and comprehensibility of the translations was investigated. The following phenomena are in focus: (i) The most common errors produced by current state-of-the-art MT systems for Polish-English MT. (ii) The effect of different types of errors on text cohesion. (iii) The effect of different types of errors on readers' understanding of the translation. We found that errors of incorrect and missing translations are the most common for current systems, while the category of non-translated words had the most negative impact on comprehension. All three of these categories contributed to the breaking of cohesive chains. The correlation between number of errors found in a translation and number of wrong answers in the comprehension tests was low. Another result was that non-native speakers of English performed at least as well as native speakers on the comprehension tests.

Alignment-based reordering for SMT
Maria Holmqvist | Sara Stymne | Lars Ahrenberg | Magnus Merkel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a method for improving word alignment quality for phrase-based statistical machine translation by reordering the source text according to the target word order suggested by an initial word alignment. The reordered text is used to create a second word alignment which can be an improvement of the first alignment, since the word order is more similar. The method requires no other pre-processing such as part-of-speech tagging or parsing. We report improved Bleu scores for English-to-German and English-to-Swedish translation. We also examined the effect on word alignment quality and found that the reordering method increased recall while lowering precision, which partly can explain the improved Bleu scores. A manual evaluation of the translation output was also performed to understand what effect our reordering method has on the translation system. We found that where the system employing reordering differed from the baseline in terms of having more words, or a different word order, this generally led to an improvement in translation quality.

2011

Experiments with word alignment, normalization and clause reordering for SMT between English and German
Maria Holmqvist | Sara Stymne | Lars Ahrenberg
Proceedings of the Sixth Workshop on Statistical Machine Translation

A Gold Standard for English-Swedish Word Alignment
Maria Holmqvist | Lars Ahrenberg
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2010

Vs and OOVs: Two Problems for Translation between German and English
Sara Stymne | Maria Holmqvist | Lars Ahrenberg
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Computing Word Senses by Semantic Mirroring and Spectral Graph Partitioning
Martin Fagerlund | Magnus Merkel | Lars Eldén | Lars Ahrenberg
Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing

Alignment-based Profiling of Europarl Data in an English-Swedish Parallel Corpus
Lars Ahrenberg
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method for comparison which is based on manually reviewed word alignments. We investigate relative frequencies of different types of correspondence, including null alignments, many-to-one correspondences and crossings. In addition, both halves of the parallel corpus have been annotated with morpho-syntactic information. The syntactic annotation uses labelled dependency relations. Thus, we can see how different types of correspondences are distributed on different parts-of-speech and compute correspondences at the structural level. In spite of the fact that two of the other subcorpora contain fiction, it is found that the Europarl part is the one having the highest proportion of many types of restructurings, including additions, deletions, long distance reorderings and dependency reversals. We explain this by the fact that the majority of Europarl segments are parallel translations rather than source texts and their translations.

Using a Grammar Checker for Evaluation and Postprocessing of Statistical Machine Translation
Sara Stymne | Lars Ahrenberg
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

One problem in statistical machine translation (SMT) is that the output often is ungrammatical. To address this issue, we have investigated the use of a grammar checker for two purposes in connection with SMT: as an evaluation tool and as a postprocessing tool. To assess the feasibility of the grammar checker on SMT output, we performed an error analysis, which showed that the precision of error identification in general was higher on SMT output than in previous studies on human texts. Using the grammar checker as an evaluation tool gives a complementary picture to standard metrics such as Bleu, which do not account well for grammaticality. We use the grammar checker as a postprocessing tool by automatically applying the error correction suggestions it gives. There are only small overall improvements of the postprocessing on automatic metrics, but the sentences that are affected by the changes are improved, as shown both by automatic metrics and by a human error analysis. These results indicate that grammar checker techniques are a useful complement to SMT.

2009

Improving Alignment for SMT by Reordering and Augmenting the Training Corpus
Maria Holmqvist | Sara Stymne | Jody Foo | Lars Ahrenberg
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

Effects of Morphological Analysis in Translation between German and English
Sara Stymne | Maria Holmqvist | Lars Ahrenberg
Proceedings of the Third Workshop on Statistical Machine Translation

Converting Romanized Persian to the Arabic Writing Systems
Jalal Maleki | Lars Ahrenberg
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes a syllabification-based conversion method for converting romanized Persian text to the traditional Arabic-based writing system. The system is implemented in Xerox XFST and relies on rule-based conversion of words rather than using morphological analysis. The paper presents a brief evaluation of the accuracy of the transcriptions generated by the method.

2007

Getting to Know Moses: Initial Experiments on German-English Factored Translation
Maria Holmqvist | Sara Stymne | Lars Ahrenberg
Proceedings of the Second Workshop on Statistical Machine Translation

LinES: An English-Swedish Parallel Treebank
Lars Ahrenberg
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

2006

KUNSTI - Knowledge Generation for Norwegian Language Technology
Bente Maegaard | Jens-Erik Fenstad | Lars Ahrenberg | Knut Kvale | Katarina Mühlenbock | Bernt-Erik Heid
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

KUNSTI is the Norwegian national language technology programme, running 2001-2006 inclusive. The goal of the programme is to boost Norwegian language technology research. In this paper we describe the background, the objectives, the methodology applied in the management of the programme, the projects selected, and our first conclusions. We also describe national programmes from Sweden, France and Germany and compare objectives and methods.

A Bilingual Grammar for Translation of English-Swedish Verb Frame Divergences
Sara Stymne | Lars Ahrenberg
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

2005

Codified close translation as a standard for MT
Lars Ahrenberg
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2003

Interactive Word Alignment for Language Engineering
Lars Ahrenberg | Magnus Merkel | Michael Petterstedt
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

A System for Incremental and Interactive Word Linking
Lars Ahrenberg | Mikael Andersson | Magnus Merkel
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

From Word Alignment to Machine Translation via Superlinks
Lars Ahrenberg | Håkan Jonsson
Proceedings of the 13th Nordic Conference of Computational Linguistics (NODALIDA 2001)

2000

Evaluation of Word Alignment Systems
Lars Ahrenberg | Magnus Merkel | Anna Sågvall Hein | Jörg Tiedemann
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts
Lars Ahrenberg | Mikael Andersson | Magnus Merkel
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts
Lars Ahrenberg | Mikael Andersson | Magnus Merkel
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

1994

Topological frames in sign-based grammars
Lars Ahrenberg
Proceedings of the 9th Nordic Conference of Computational Linguistics (NODALIDA 1993)

A Phrase-Retrieval System based on Recurrence
Magnus Merkel | Bertn Nilsson | Lars Ahrenberg
Second Workshop on Very Large Corpora

The paper describes a simple but useful phrase-retrieval system that primarily is intended as a support tool for computer-aided translation. Given no other input than a text (and a word list used for filtering purposes), the system retrieves recurrent sentences and phrases of the text and their positions. In addition the system provides information on internal and external recurrence rates.

1992

Conceptual text representation for multi-lingual generation and translation
Lars Ahrenberg | Stefan Svenberg
Proceedings of the 8th Nordic Conference of Computational Linguistics (NODALIDA 1991)

1990

A Grammar Combining Phrase Structure and Field Structure
Lars Ahrenberg
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

1988

A system for object-oriented dialogue in Swedish
Lars Ahrenberg
Proceedings of the 6th Nordic Conference of Computational Linguistics (NODALIDA 1987)

Functional Constraints in Knowledge-Based Natural Language Understanding
Lars Ahrenberg
Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics

1987

Parsing into Discourse Object Descriptions
Lars Ahrenberg
Third Conference of the European Chapter of the Association for Computational Linguistics

1986

Lexikalisk-funktionell grammatik på svenska (Lexical Functional Grammar in Swedish) [In Swedish]
Lars Ahrenberg
Proceedings of the 5th Nordic Conference of Computational Linguistics (NODALIDA 1985)

1984

De grammatiska beskrivningarna i SVE.UCP (The grammatical descriptions in SVE.UCP) [In Swedish]
Lars Ahrenberg
Proceedings of the 4th Nordic Conference of Computational Linguistics (NODALIDA 1983)