2024
Part-of-Speech Tagging for Northern Kurdish
Peshmerge Morad | Sina Ahmadi | Lorenzo Gatti
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
In the growing domain of natural language processing, low-resourced languages like Northern Kurdish remain largely unexplored due to the lack of resources needed to be part of this growth. In particular, the tasks of part-of-speech tagging and tokenization for Northern Kurdish are still insufficiently addressed. In this study, we aim to bridge this gap by evaluating a range of statistical, neural, and fine-tuned models specifically tailored for Northern Kurdish. We leverage limited but valuable datasets, including the Universal Dependencies Kurmanji treebank and a novel manually annotated and tokenized gold-standard dataset consisting of 136 sentences (2,937 tokens). We evaluate several POS tagging models and report that the fine-tuned transformer-based model outperforms others, achieving an accuracy of 0.87 and a macro-averaged F1 score of 0.77. Data and models are publicly available under an open license at https://github.com/peshmerge/northern-kurdish-pos-tagging
2023
What does a Text Classifier Learn about Morality? An Explainable Method for Cross-Domain Comparison of Moral Rhetoric
Enrico Liscio | Oscar Araque | Lorenzo Gatti | Ionut Constantinescu | Catholijn Jonker | Kyriaki Kalimeri | Pradeep Kumar Murukannaiah
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Moral rhetoric influences our judgement. Although social scientists recognize moral expression as domain specific, there are no systematic methods for analyzing whether a text classifier learns the domain-specific expression of moral language or not. We propose Tomea, a method to compare a supervised classifier’s representation of moral rhetoric across domains. Tomea enables quantitative and qualitative comparisons of moral rhetoric via an interpretable exploration of similarities and differences across moral concepts and domains. We apply Tomea on moral narratives in thirty-five thousand tweets from seven domains. We extensively evaluate the method via a crowd study, a series of cross-domain moral classification comparisons, and a qualitative analysis of cross-domain moral expression.
2018
An Information-Providing Closed-Domain Human-Agent Interaction Corpus
Jelte van Waterschoot | Guillaume Dubuisson Duplessis | Lorenzo Gatti | Merijn Bruijnes | Dirk Heylen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Template-based multilingual football reports generation using Wikidata as a knowledge base
Lorenzo Gatti | Chris van der Lee | Mariët Theune
Proceedings of the 11th International Conference on Natural Language Generation
This paper presents a new version of a football reports generation system called PASS. The original version generated Dutch text and relied on a limited hand-crafted knowledge base. We describe how, in a short amount of time, we extended PASS to produce English texts, exploiting machine translation and Wikidata as a large-scale source of multilingual knowledge.
2017
Fortia-FBK at SemEval-2017 Task 5: Bullish or Bearish? Inferring Sentiment towards Brands from Financial News Headlines
Youness Mansar | Lorenzo Gatti | Sira Ferradans | Marco Guerini | Jacopo Staiano
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
In this paper, we describe a methodology to infer Bullish or Bearish sentiment towards companies/brands. More specifically, our approach leverages affective lexica and word embeddings in combination with convolutional neural networks to infer the sentiment of financial news headlines towards a target company. This architecture was used and evaluated in the context of the SemEval 2017 challenge (task 5, subtask 2), in which it obtained the best performance.
To Sing like a Mockingbird
Lorenzo Gatti | Gözde Özbal | Oliviero Stock | Carlo Strapparava
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Musical parody, i.e. the act of changing the lyrics of an existing and very well-known song, is a commonly used technique for creating catchy advertising tunes and for mocking people or events. Here we describe a system for automatically producing a musical parody, starting from a corpus of songs. The system can automatically identify characterizing words and concepts related to a novel text, which are taken from the daily news. These concepts are then used as seeds to appropriately replace part of the original lyrics of a song, using metrical, rhyming and lexical constraints. Finally, the parody can be sung with a singing speech synthesizer, with no intervention from the user.
2016
Using WordNet to Build Lexical Sets for Italian Verbs
Anna Feltracco | Lorenzo Gatti | Elisabetta Jezek | Bernardo Magnini | Simone Magnolini
Proceedings of the 8th Global WordNet Conference (GWC)
We present a methodology for building lexical sets for argument slots of Italian verbs. We start from an inventory of semantically typed Italian verb frames and, through a mapping to WordNet, automatically annotate the sets of fillers for the argument positions in a corpus of sentences. We evaluate both a baseline algorithm and a syntax-driven algorithm and show that the latter performs significantly better in terms of precision.
2014
Creative language explorations through a high-expressivity N-grams query language
Carlo Strapparava | Lorenzo Gatti | Marco Guerini | Oliviero Stock
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In computational linguistics, a combination of syntagmatic and paradigmatic features is often exploited. While the first aspects are typically managed by information present in large n-gram databases, domain and ontological aspects are more properly modeled by lexical ontologies such as WordNet and semantic similarity spaces. This interconnection is even stricter when we are dealing with creative language phenomena, such as metaphors, prototypical properties, pun generation, hyperboles and other rhetorical phenomena. This paper describes a way to focus on and accomplish some of these tasks by exploiting NgramQuery, a generalized query language on the Google N-gram database. The expressiveness of this query language is boosted by plugging in semantic similarity acquired both from corpora (e.g. LSA) and from WordNet, and by integrating operators for phonetics and sentiment analysis. The paper reports a number of usage examples in some creative language tasks.
2013
Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet
Marco Guerini | Lorenzo Gatti | Marco Turchi
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
2012
Assessing Sentiment Strength in Words Prior Polarities
Lorenzo Gatti | Marco Guerini
Proceedings of COLING 2012: Posters