2024
LLODIA: A Linguistic Linked Open Data Model for Diachronic Analysis
Florentina Armaselu
|
Chaya Liebeskind
|
Paola Marongiu
|
Barbara McGillivray
|
Giedre Valunaite Oleskeviciene
|
Elena-Simona Apostol
|
Ciprian-Octavian Truica
|
Daniela Gifu
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024
This article proposes a linguistic linked open data model for diachronic analysis (LLODIA) that combines data derived from diachronic analysis of multilingual corpora with dictionary-based evidence. A humanities use case was devised as a proof of concept that includes examples in five languages (French, Hebrew, Latin, Lithuanian and Romanian) related to various meanings of the term “revolution” considered at different time intervals. The examples were compiled through diachronic word embedding and dictionary alignment.
LinguisTech at SemEval-2024 Task 10: Emotion Discovery and Reasoning its Flip in Conversation
Mihaela Alexandru
|
Călina Ciocoiu
|
Ioana Măniga
|
Octavian Ungureanu
|
Daniela Gîfu
|
Diana Trandăbăț
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
The “Emotion Discovery and Reasoning Its Flip in Conversation” task at the SemEval 2024 competition focuses on the automatic recognition of emotion flips, triggered within multi-party textual conversations. This paper proposes a novel approach that draws a parallel between a mixed strategy and a comparative strategy, contrasting a Rule-Based Function with Named Entity Recognition (NER)—an approach that shows promise in understanding speaker-specific emotional dynamics. Furthermore, this method surpasses the performance of both DistilBERT and RoBERTa models, demonstrating competitive effectiveness in detecting emotion flips triggered in multi-party textual conversations, achieving a 70% F1-score. This system was ranked 6th in the SemEval 2024 competition for Subtask 3.
2023
FII SMART at SemEval 2023 Task7: Multi-evidence Natural Language Inference for Clinical Trial Data
Mihai Volosincu
|
Cosmin Lupu
|
Diana Trandabat
|
Daniela Gifu
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
The “Multi-evidence Natural Language Inference for Clinical Trial Data” task at SemEval 2023 competition focuses on extracting essential information on clinical trial data, by posing two subtasks on textual entailment and evidence retrieval. In the context of SemEval, we present a comparison between a method based on the BioBERT model and a CNN model. The task is based on a collection of breast cancer Clinical Trial Reports (CTRs), statements, explanations, and labels annotated by domain expert annotators. We achieved F1 scores of 0.69 for determining the inference relation (entailment vs contradiction) between CTR-statement pairs. The implementation of our system is made available via Github - https://github.com/volosincu/FII_Smart__Semeval2023.
Togedemaru at SemEval-2023 Task 8: Causal Medical Claim Identification and Extraction from Social Media Posts
Andra Oica
|
Daniela Gifu
|
Diana Trandabat
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
The “Causal Medical Claim Identification and Extraction from Social Media Posts” task at the SemEval 2023 competition focuses on identifying and validating medical claims in English, by posing two subtasks on causal claim identification and PIO (Population, Intervention, Outcome) frame extraction. In the context of SemEval, we present a method for sentence classification in four categories (claim, experience, experience_based_claim or question) based on the BioBERT model with an MLP layer. The website from which the dataset was gathered, Reddit, is a social news and content discussion site. The evaluation results show the effectiveness of the solution of this study (83.68%).
FII_Better at SemEval-2023 Task 2: MultiCoNER II Multilingual Complex Named Entity Recognition
Viorica-Camelia Lupancu
|
Alexandru-Gabriel Platica
|
Cristian-Mihai Rosu
|
Daniela Gifu
|
Diana Trandabat
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This task focuses on identifying complex named entities (NEs) in several languages. In the context of the SemEval-2023 competition, our team presents an exploration of a base transformer model’s capabilities regarding the task, focused more specifically on five languages (English, Spanish, Swedish, German, Italian). We take DistilBERT and BERT as two examples of basic transformer models, using DistilBERT as a baseline and BERT as the platform to create an improved model. The dataset that we are using, MultiCoNER II, is a large multilingual NER dataset that covers domains such as Wiki sentences, questions and search queries across 12 languages. This dataset contains 26M tokens and is assembled from public resources. MultiCoNER II defines a NER tag-set with 6 classes and 67 tags. We achieved moderate results in the English track (we ranked 17th out of 34), while our results in the other tracks could be further improved in the future (overall third to last).
Workflow Reversal and Data Wrangling in Multilingual Diachronic Analysis and Linguistic Linked Open Data Modelling
Florentina Armaselu
|
Barbara McGillivray
|
Chaya Liebeskind
|
Giedrė Valūnaitė Oleškevičienė
|
Andrius Utka
|
Daniela Gifu
|
Anas Fahad Khan
|
Elena-Simona Apostol
|
Ciprian-Octavian Truica
Proceedings of the 4th Conference on Language, Data and Knowledge
2022
A Survey of Guidelines and Best Practices for the Generation, Interlinking, Publication, and Validation of Linguistic Linked Data
Fahad Khan
|
Christian Chiarcos
|
Thierry Declerck
|
Maria Pia Di Buono
|
Milan Dojchinovski
|
Jorge Gracia
|
Giedre Valunaite Oleskeviciene
|
Daniela Gifu
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference
This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular it focused on four core tasks in the production/publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally we offer a number of directions for future work in order to address the findings of the survey.
FII UAIC at SemEval-2022 Task 6: iSarcasmEval - Intended Sarcasm Detection in English and Arabic
Tudor Manoleasa
|
Daniela Gifu
|
Iustin Sandu
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
The “iSarcasmEval - Intended Sarcasm Detection in English and Arabic” task at the SemEval 2022 competition focuses on detecting and rating the distinction between intended and perceived sarcasm in the context of textual sarcasm detection, as well as the level of irony contained in these texts. In the context of SemEval, we present a binary classification method which classifies the text as sarcastic or non-sarcastic (task A, for English) based on five classical machine learning approaches by trying to train the models based on this dataset solely (i.e., no other datasets have been used). This process indicates low performance compared to previously studied datasets, which indicates that the previous ones might be biased.
2021
FII_CROSS at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation
Ciprian Bodnar
|
Andrada Tapuc
|
Cosmin Pintilie
|
Daniela Gifu
|
Diana Trandabat
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
This paper presents a word-in-context disambiguation system. The task focuses on capturing the polysemous nature of words in a multilingual and cross-lingual setting, without considering a strict inventory of word meanings. The system applies Natural Language Processing algorithms on datasets from SemEval 2021 Task 2, being able to identify the meaning of words for the languages Arabic, Chinese, English, French and Russian, without making use of any additional mono- or multilingual resources.
FII FUNNY at SemEval-2021 Task 7: HaHackathon: Detecting and rating Humor and Offense
Mihai Samson
|
Daniela Gifu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
The “HaHackathon: Detecting and Rating Humor and Offense” task at the SemEval 2021 competition focuses on detecting and rating the humor level in sentences, as well as the level of offensiveness contained in these texts with humoristic tones. In this paper, we present an approach based on recent Deep Learning techniques by both trying to train the models based on the dataset solely and by trying to fine-tune pre-trained models on the gigantic corpus.
2020
FII-UAIC at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text Using CNN
Lavinia Aparaschivei
|
Andrei Palihovici
|
Daniela Gîfu
Proceedings of the Fourteenth Workshop on Semantic Evaluation
The “Sentiment Analysis for Code-Mixed Social Media Text” task at the SemEval 2020 competition focuses on sentiment analysis in code-mixed social media text, specifically on the combination of English with Spanish (Spanglish) and Hindi (Hinglish). In this paper, we present a system able to classify tweets, from Spanish and English languages, into positive, negative and neutral. Firstly, we built a classifier able to provide corresponding sentiment labels. Besides the sentiment labels, we provide the language labels at the word level. Secondly, we generate a word-level representation, using a Convolutional Neural Network (CNN) architecture. Our solution indicates promising results for the Sentimix Spanglish-English task (0.744), where the team, Lavinia_Ap, took 9th place. However, for the Sentimix Hindi-English task (0.324) the results have to be improved.
UAIC1860 at SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles
Vlad Ermurachi
|
Daniela Gifu
Proceedings of the Fourteenth Workshop on Semantic Evaluation
The “Detection of Propaganda Techniques in News Articles” task at the SemEval 2020 competition focuses on detecting and classifying propaganda, pervasive in news articles. In this paper, we present a system that evaluates, at sentence level, three traditional text representation techniques for these study goals: tf*idf, word n-grams and character n-grams. Firstly, we built a binary classifier able to provide corresponding propaganda labels, propaganda or non-propaganda. Secondly, we built a multilabel multiclass model to identify applied propaganda.
A Real-Time System for Credibility on Twitter
Adrian Iftene
|
Daniela Gifu
|
Andrei-Remus Miron
|
Mihai-Stefan Dudu
Proceedings of the Twelfth Language Resources and Evaluation Conference
Nowadays, social media credibility is a pressing issue for each of us who are living in an altered online landscape. The speed of news diffusion is striking. Given the popularity of social networks, more and more users began posting pictures, information, and news about personal life. At the same time, they started to use all this information to get informed about what their friends do or what is happening in the world, many of them arousing much suspicion. The problem we are currently experiencing is that we do not have an automatic method of figuring out in real time which news or which users are credible and which are not, what is false or what is true on the Internet. The goal of this work is to analyze Twitter in real time using neural networks in order to provide key elements about both the credibility of tweets and the users who posted them. Thus, we make a real-time heatmap using information gathered from users to create overall images of the areas from which this fake news comes.
CoBiLiRo: A Research Platform for Bimodal Corpora
Dan Cristea
|
Ionuț Pistol
|
Șerban Boghiu
|
Anca-Diana Bibiri
|
Daniela Gîfu
|
Andrei Scutelnicu
|
Mihaela Onofrei
|
Diana Trandabăț
|
George Bugeag
Proceedings of the 1st International Workshop on Language Technology Platforms
This paper describes the ongoing work carried out within the CoBiLiRo (Bimodal Corpus for Romanian Language) research project, part of ReTeRom (Resources and Technologies for Developing Human-Machine Interfaces in Romanian). Data annotation finds increasing use in speech recognition and synthesis with the goal to support learning processes. In this context, a variety of different annotation systems for application to Speech and Text Processing environments have been presented. Even if many designs for the data annotation workflow have emerged, the process of handling metadata, to manage complex user-defined annotations, is not covered enough. We propose a design of the format aimed to serve as an annotation standard for bimodal resources, which facilitates searching, editing and statistical analysis operations over it. The design and implementation of an infrastructure that houses the resources are also presented. The goal is widening the dissemination of bimodal corpora for research valorisation and use in applications. Also, this study reports on the main operations of the web Platform which hosts the corpus and the automatic conversion flows that bring the submitted files into the format accepted by the Platform.
2019
Hope at SemEval-2019 Task 6: Mining social media language to discover offensive language
Gabriel Florentin Patras
|
Diana Florina Lungu
|
Daniela Gifu
|
Diana Trandabat
Proceedings of the 13th International Workshop on Semantic Evaluation
User content sharing through social media has reached huge proportions nowadays. However, along with the free expression of thoughts on social media, people risk getting exposed to various aggressive statements. In this paper, we present a system able to identify and classify offensive user-generated content.
2018
EmoIntens Tracker at SemEval-2018 Task 1: Emotional Intensity Levels in #Tweets
Ramona-Andreea Turcu
|
Sandra Maria Amarandei
|
Iuliana-Alexandra Flescan-Lovin-Arseni
|
Daniela Gifu
|
Diana Trandabat
Proceedings of the 12th International Workshop on Semantic Evaluation
The “Affect in Tweets” task is centered on emotion categorization and evaluation matrix using multi-language tweets (English and Spanish). In this research, the SemEval Affect dataset was preprocessed, categorized, and evaluated accordingly (precision, recall, and accuracy). The system described in this paper is based on the implementation of supervised machine learning (Naive Bayes, KNN and SVM), deep learning (NN TensorFlow model), and decision tree algorithms.
The Dabblers at SemEval-2018 Task 2: Multilingual Emoji Prediction
Larisa Alexa
|
Alina Lorenț
|
Daniela Gîfu
|
Diana Trandabăț
Proceedings of the 12th International Workshop on Semantic Evaluation
The “Multilingual Emoji Prediction” task focuses on the ability to predict the corresponding emoji for a certain tweet. In this paper, we investigate the relation between words and emojis. In order to do that, we used supervised machine learning (Naive Bayes) and deep learning (Recursive Neural Network).
Apollo at SemEval-2018 Task 9: Detecting Hypernymy Relations Using Syntactic Dependencies
Mihaela Onofrei
|
Ionuț Hulub
|
Diana Trandabăț
|
Daniela Gîfu
Proceedings of the 12th International Workshop on Semantic Evaluation
This paper presents the participation of Apollo’s team in the SemEval-2018 Task 9 “Hypernym Discovery”, Subtask 1: “General-Purpose Hypernym Discovery”, which tries to produce a ranked list of hypernyms for a specific term. We propose a novel approach for automatic extraction of hypernymy relations from a corpus by using dependency patterns. We estimated that the application of these patterns leads to a higher score than using the traditional lexical patterns.
2014
Transliteration and alignment of parallel texts from Cyrillic to Latin
Mircea Petic
|
Daniela Gîfu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This article describes a methodology for the recovery and preservation of old Romanian texts and problems related to their recognition. Our focus is to create a gold corpus for the Romanian language (the novella Sania), for both alphabets used in Transnistria: Cyrillic and Latin. The resource is available for similar research. This technology is based on transliteration and semiautomatic alignment of parallel texts at the level of letter/lexeme/multiword. We have analysed every text segment present in this corpus and discovered other conventions of writing at the level of transliteration, academic norms and editorial interventions. These conventions allowed us to elaborate and implement some new heuristics that enable a correct automatic transliteration process. Sometimes the words of Latin script are modified in Cyrillic script for semantic reasons (for instance, editor’s interpretation). Semantic transliteration is seen as a good practice in introducing multiwords from Cyrillic to Latin. Not only does it preserve how a multiword sounds in the source script, but it also enables the translator to modify the original text (here, choosing the most common sense of an expression). Such a technology could be of interest to lexicographers, but also to specialists in computational linguistics to improve the actual transliteration standards.