Stefan Trausan-Matu

Also published as: Ștefan Trăușan-Matu

2025

MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification
Iustin Sirbu | Robert-Adrian Popovici | Cornelia Caragea | Stefan Trausan-Matu | Traian Rebedea
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce **MultiMatch**, a novel semi-supervised learning (SSL) algorithm combining the paradigms of co-training and consistency regularization with pseudo-labeling. At its core, MultiMatch features a three-fold pseudo-label weighting module designed for selecting and filtering pseudo-labels based on head agreement and model confidence, and weighting them according to the perceived classification difficulty. This novel module enhances and unifies three existing techniques - heads agreement from **Multi**head Co-training, self-adaptive thresholds from Free**Match**, and Average Pseudo-Margins from Margin**Match** - resulting in a holistic approach that improves robustness and performance in SSL settings.Experimental results on benchmark datasets highlight the superior performance of MultiMatch, i.e., MultiMatch achieves state-of-the-art results on 8 out of 10 setups from 5 natural language processing datasets and ranks first according to the Friedman test among 21 methods. Furthermore, MultiMatch demonstrates exceptional robustness in highly imbalanced settings, outperforming the second-best approach by 3.26%, a critical advantage for real-world text classification tasks. Our code is available on GitHub.

2020

pdf bib abs

UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis
George-Alexandru Vlad | George-Eduard Zaharia | Dumitru-Clementin Cercel | Costin Chiru | Stefan Trausan-Matu
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Users from the online environment can create different ways of expressing their thoughts, opinions, or conception of amusement. Internet memes were created specifically for these situations. Their main purpose is to transmit ideas by using combinations of images and texts such that they will create a certain state for the receptor, depending on the message the meme has to send. These posts can be related to various situations or events, thus adding a funny side to any circumstance our world is situated in. In this paper, we describe the system developed by our team for SemEval-2020 Task 8: Memotion Analysis. More specifically, we introduce a novel system to analyze these posts, a multimodal multi-task learning architecture that combines ALBERT for text encoding with VGG-16 for image representation. In this manner, we show that the information behind them can be properly revealed. Our approach achieves good performance on each of the three subtasks of the current competition, ranking 11th for Subtask A (0.3453 macro F1-score), 1st for Subtask B (0.5183 macro F1-score), and 3rd for Subtask C (0.3171 macro F1-score) while exceeding the official baseline results by high margins.

2019

pdf bib abs

Building a Comprehensive Romanian Knowledge Base for Drug Administration
Bogdan Nicula | Mihai Dascalu | Maria-Dorinela Sîrbu | Ștefan Trăușan-Matu | Alexandru Nuță
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Information on drug administration is obtained traditionally from doctors and pharmacists, as well as leaflets which provide in most cases cumbersome and hard-to-follow details. Thus, the need for medical knowledge bases emerges to provide access to concrete and well-structured information which can play an important role in informing patients. This paper introduces a Romanian medical knowledge base focused on drug-drug interactions, on representing relevant drug information, and on symptom-disease relations. The knowledge base was created by extracting and transforming information using Natural Language Processing techniques from both structured and unstructured sources, together with manual annotations. The resulting Romanian ontologies are aligned with larger English medical ontologies. Our knowledge base supports queries regarding drugs (e.g., active ingredients, concentration, expiration date), drug-drug interaction, symptom-disease relations, as well as drug-symptom relations.

pdf bib abs

SC-UPB at the VarDial 2019 Evaluation Campaign: Moldavian vs. Romanian Cross-Dialect Topic Identification
Cristian Onose | Dumitru-Clementin Cercel | Stefan Trausan-Matu
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

This paper describes our models for the Moldavian vs. Romanian Cross-Topic Identification (MRC) evaluation campaign, part of the VarDial 2019 workshop. We focus on the three subtasks for MRC: binary classification between the Moldavian (MD) and the Romanian (RO) dialects and two cross-dialect multi-class classification between six news topics, MD to RO and RO to MD. We propose several deep learning models based on long short-term memory cells, Bidirectional Gated Recurrent Unit (BiGRU) and Hierarchical Attention Networks (HAN). We also employ three word embedding models to represent the text as a low dimensional vector. Our official submission includes two runs of the BiGRU and HAN models for each of the three subtasks. The best submitted model obtained the following macro-averaged F1 scores: 0.708 for subtask 1, 0.481 for subtask 2 and 0.480 for the last one. Due to a read error caused by the quoting behaviour over the test file, our final submissions contained a smaller number of items than expected. More than 50% of the submission files were corrupted. Thus, we also present the results obtained with the corrected labels for which the HAN model achieves the following results: 0.930 for subtask 1, 0.590 for subtask 2 and 0.687 for the third one.

2017

pdf bib abs

oIQa: An Opinion Influence Oriented Question Answering Framework with Applications to Marketing Domain
Dumitru-Clementin Cercel | Cristian Onose | Stefan Trausan-Matu | Florin Pop
Proceedings of the 1st Workshop on Natural Language Processing and Information Retrieval associated with RANLP 2017

Understanding questions and answers in QA system is a major challenge in the domain of natural language processing. In this paper, we present a question answering system that influences the human opinions in a conversation. The opinion words are quantified by using a lexicon-based method. We apply Latent Semantic Analysis and the cosine similarity measure between candidate answers and each question to infer the answer of the chatbot.