Massimo Piccardi - ACL Anthology

Massimo Piccardi

2026

Attribute-Controlled Translation with Preference Optimization
Inigo Jauregi Unanue | Najmeh Sadoughi | Vimal Bhat | Zhu Liu | Massimo Piccardi
Findings of the Association for Computational Linguistics: EACL 2026

Attribute-controlled translation (ACT) seeks to produce translations that satisfy specific constraints on linguistic and stylistic attributes. While careful prompt engineering can enable large language models to perform strongly in this task, its effectiveness is mainly limited to models of very large size. For this reason, in this paper we set out to improve the performance of smaller language models by leveraging the contrastive nature of ACT tasks with preference optimization, as well as exploiting knowledge distillation with synthetically generated training samples from larger models. As a resource for this investigation, we also introduce PREF-FAME-MT, a large, contrastive, formality-controlled parallel corpus which has been generated by expanding the existing FAME-MT dataset with synthetic contrastive samples. Experiments conducted over three datasets for formality- and gender-controlled translation with 71 distinct language pairs have demonstrated the effectiveness of the proposed approach at simultaneously improving attribute matching and translation quality. We release all our code and datasets to allow reproduction and expansion of our work.
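The abstract above does not say which preference-optimization objective is used. As an illustration only, the sketch below computes a DPO-style loss for one contrastive pair, for example a translation with the requested formality (chosen) against one with the opposite formality (rejected). The function name, its arguments, and the default beta are assumptions made for this example, not details taken from the paper.

```python
import math

def dpo_style_loss(policy_chosen_logp, policy_rejected_logp,
                   ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss for a single (chosen, rejected) pair (illustrative).

    Each argument is the total log-probability of a full translation under
    either the model being trained (policy) or a frozen reference model.
    """
    # How much more the policy prefers the chosen output than the reference does.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. The loss falls as the policy raises the chosen translation relative to the rejected one.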

2024

Improving Vietnamese-English Medical Machine Translation
Nhu Vo | Dat Quoc Nguyen | Dung D. Le | Massimo Piccardi | Wray Buntine
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Machine translation for Vietnamese-English in the medical domain is still an under-explored research area. In this paper, we introduce MedEV—a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. We conduct extensive experiments comparing Google Translate, ChatGPT (gpt-3.5-turbo), state-of-the-art Vietnamese-English neural machine translation models and pre-trained bilingual/multilingual sequence-to-sequence models on our new MedEV dataset. Experimental results show that the best performance is achieved by fine-tuning “vinai-translate” for each translation direction. We publicly release our dataset to promote further research.

XVD: Cross-Vocabulary Differentiable Training for Generative Adversarial Attacks
Tom Roth | Inigo Jauregi Unanue | Alsharif Abuadbba | Massimo Piccardi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

An adversarial attack on a text classifier consists of an input that induces the classifier into an incorrect class prediction, while retaining all the linguistic properties of correctly-classified examples. A popular class of adversarial attacks exploits the gradients of the victim classifier to train a dedicated generative model to produce effective adversarial examples. However, this training signal alone is not sufficient to ensure other desirable properties of the adversarial attacks, such as similarity to non-adversarial examples, linguistic fluency, grammaticality, and so forth. For this reason, in this paper we propose a novel training objective which leverages a set of pretrained language models to promote such properties in the adversarial generation. A core component of our approach is a set of vocabulary-mapping matrices which allow cascading the generative model to any victim or component model of choice, while retaining differentiability end-to-end. The proposed approach has been tested in an ample set of experiments covering six text classification datasets, two victim models, and four baselines. The results show that it has been able to produce effective adversarial attacks, outperforming the compared generative approaches in a majority of cases and proving highly competitive against established token-replacement approaches.

SumTra: A Differentiable Pipeline for Few-Shot Cross-Lingual Summarization
Jacob Parnell | Inigo Jauregi Unanue | Massimo Piccardi
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Cross-lingual summarization (XLS) generates summaries in a language different from that of the input documents (e.g., English to Spanish), allowing speakers of the target language to gain a concise view of their content. In the present day, the predominant approach to this task is to take a well-performing, pretrained multilingual language model (LM) and fine-tune it for XLS on the language pairs of interest. However, the scarcity of fine-tuning samples makes this approach challenging in some cases. For this reason, in this paper we propose revisiting the summarize-and-translate pipeline, where the summarization and translation tasks are performed in a sequence. This approach allows reusing the many publicly available resources for monolingual summarization and translation, obtaining a very competitive zero-shot performance. In addition, the proposed pipeline is completely differentiable end-to-end, allowing it to take advantage of few-shot fine-tuning, where available. Experiments over two contemporary and widely adopted XLS datasets (CrossSum and WikiLingua) have shown the remarkable zero-shot performance of the proposed approach, and also its strong few-shot performance compared to an equivalent multilingual LM baseline, which the proposed approach has been able to outperform in many languages with only 10% of the fine-tuning samples.

2023

T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification
Inigo Jauregi Unanue | Gholamreza Haffari | Massimo Piccardi
Transactions of the Association for Computational Linguistics, Volume 11

Cross-lingual text classification leverages text classifiers trained in a high-resource language to perform text classification in other languages with no or minimal fine-tuning (zero-/few-shot cross-lingual transfer). Nowadays, cross-lingual text classifiers are typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest. However, the performance of these models varies significantly across languages and classification tasks, suggesting that the superposition of the language modelling and classification tasks is not always effective. For this reason, in this paper we propose revisiting the classic “translate-and-test” pipeline to neatly separate the translation and classification stages. The proposed approach couples 1) a neural machine translator translating from the targeted language to a high-resource language, with 2) a text classifier trained in the high-resource language, but the neural machine translator generates “soft” translations to permit end-to-end backpropagation during fine-tuning of the pipeline. Extensive experiments have been carried out over three cross-lingual text classification datasets (XNLI, MLDoc, and MultiEURLEX), with the results showing that the proposed approach has significantly improved performance over a competitive baseline.

2022

A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization
Jacob Parnell | Inigo Jauregi Unanue | Massimo Piccardi
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multi-document summarization (MDS) has made significant progress in recent years, in part facilitated by the availability of new, dedicated datasets and capacious language models. However, a standing limitation of these models is that they are trained against limited references and with plain maximum-likelihood objectives. As for many other generative tasks, reinforcement learning (RL) offers the potential to improve the training of MDS models; yet, it requires a carefully-designed reward that can ensure appropriate leverage of both the reference summaries and the input documents. For this reason, in this paper we propose fine-tuning an MDS baseline with a reward that balances a reference-based metric such as ROUGE with coverage of the input documents. To implement the approach, we utilize RELAX (Grathwohl et al., 2018), a contemporary gradient estimator which is both low-variance and unbiased, and we fine-tune the baseline in a few-shot style for both stability and computational efficiency. Experimental results over the Multi-News and WCEP MDS datasets show significant improvements of up to +0.95 pp average ROUGE score and +3.17 pp METEOR score over the baseline, and competitive results with the literature. In addition, they show that the coverage of the input documents is increased, and evenly across all documents.

2021

BERTTune: Fine-Tuning Neural Machine Translation with BERTScore
Inigo Jauregi Unanue | Jacob Parnell | Massimo Piccardi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Neural machine translation models are often biased toward the limited translation references seen during training. To amend this form of overfitting, in this paper we propose fine-tuning the models with a novel training objective based on the recently-proposed BERTScore evaluation metric. BERTScore is a scoring function based on contextual embeddings that overcomes the typical limitations of n-gram-based metrics (e.g. synonyms, paraphrases), allowing translations that are different from the references, yet close in the contextual embedding space, to be treated as substantially correct. To be able to use BERTScore as a training objective, we propose three approaches for generating soft predictions, allowing the network to remain completely differentiable end-to-end. Experiments carried out over four diverse language pairs show improvements of up to 0.58 pp (3.28%) in BLEU score and up to 0.76 pp (0.98%) in BERTScore (F_BERT) when fine-tuning a strong baseline.
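The abstract above does not describe its three soft-prediction approaches. One generic way to keep a decoding step differentiable, shown here purely as an illustration, is to replace the discrete argmax token with the expected embedding under the softmax distribution. The function names, the temperature parameter, and the toy embedding table are assumptions made for this example.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_embedding(logits, embedding_table, temperature=1.0):
    """Probability-weighted average of the vocabulary embeddings (illustrative).

    logits: one score per vocabulary entry for the current decoding step.
    embedding_table: one embedding vector (list of floats) per vocabulary entry.
    """
    probs = softmax([x / temperature for x in logits])
    dim = len(embedding_table[0])
    return [sum(p * row[d] for p, row in zip(probs, embedding_table))
            for d in range(dim)]
```

A lower temperature sharpens the distribution, so the soft embedding approaches the embedding of the most likely token while the operation stays differentiable with respect to the logits.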

Improving Adversarial Text Generation with n-Gram Matching
Shijie Li | Massimo Piccardi
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation
Jacob Parnell | Inigo Jauregi Unanue | Massimo Piccardi
Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP 2021)

To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the reward function to be used within the reinforcement learning approach can play a key role for performance and is still partially unexplored. For this reason, in this paper, we propose two reward functions for the task of abstractive summarisation: the first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update. The second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL pre-trained model over nine summarisation datasets of diverse size and nature. The experimental results show a consistent improvement over the negative log-likelihood baselines.

2020

Leveraging Discourse Rewards for Document-Level Neural Machine Translation
Inigo Jauregi Unanue | Nazanin Esmaili | Gholamreza Haffari | Massimo Piccardi
Proceedings of the 28th International Conference on Computational Linguistics

Document-level machine translation focuses on the translation of entire documents from a source to a target language. It is widely regarded as a challenging task since the translation of the individual sentences in the document needs to retain aspects of the discourse at document level. However, document-level translation models are usually not trained to explicitly ensure discourse quality. Therefore, in this paper we propose a training approach that explicitly optimizes two established discourse metrics, lexical cohesion and coherence, by using a reinforcement learning objective. Experiments over four different language pairs and three translation domains have shown that our training approach has been able to achieve more cohesive and coherent document translations than other competitive approaches, yet without compromising the faithfulness to the reference translation. In the case of the Zh-En language pair, our method has achieved an improvement of 2.46 percentage points (pp) in LC and 1.17 pp in COH over the runner-up, while at the same time improving 0.63 pp in BLEU score and 0.47 pp in F-BERT.

Controlled Text Generation with Adversarial Learning
Federico Betti | Giorgia Ramponi | Massimo Piccardi
Proceedings of the 13th International Conference on Natural Language Generation

In recent years, generative adversarial networks (GANs) have started to attain promising results also in natural language generation. However, the existing models have paid limited attention to the semantic coherence of the generated sentences. For this reason, in this paper we propose a novel network – the Controlled TExt generation Relational Memory GAN (CTERM-GAN) – that uses an external input to influence the coherence of sentence generation. The network is composed of three main components: a generator based on a Relational Memory conditioned on the external input; a syntactic discriminator which learns to discriminate between real and generated sentences; and a semantic discriminator which assesses the coherence with the external conditioning. Our experiments on six probing datasets have shown that the model has been able to achieve interesting results, retaining or improving the syntactic quality of the generated sentences while significantly improving their semantic coherence with the given input.

Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five language pairs were previously addressed in past editions of the shared task, namely, English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional language pairs were also introduced this year: English/Russian, English/Italian, and English/Basque. The task addressed the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations in terms of automatic scores and qualitative evaluations, compared to previous years.

Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation
Inigo Jauregi Unanue | Massimo Piccardi
Proceedings of the Fifth Conference on Machine Translation

This paper describes the machine translation systems proposed by the University of Technology Sydney Natural Language Processing (UTS_NLP) team for the WMT20 English-Basque biomedical translation tasks. Due to the limited parallel corpora available, we have proposed to train a BERT-fused NMT model that leverages the use of pretrained language models. Furthermore, we have augmented the training corpus by backtranslating monolingual data. Our experiments show that NMT models in low-resource scenarios can benefit from combining these two training techniques, with improvements of up to 6.16 BLEU percentage points in the case of biomedical abstract translations.

2019

ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems
Inigo Jauregi Unanue | Ehsan Zare Borzeshi | Nazanin Esmaili | Massimo Piccardi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Regularization of neural machine translation is still a significant problem, especially in low-resource settings. To mitigate this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is jointly trained to predict the next word in the translation (categorical value) and its word embedding (continuous value). Such a joint training allows the proposed system to learn the distributional properties represented by the word embeddings, empirically improving the generalization to unseen sentences. Experiments over three translation datasets have shown a consistent improvement over a strong baseline, ranging between 0.91 and 2.4 BLEU points, and also a marked improvement over a state-of-the-art system.
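The joint objective described above can be sketched for a single decoding step as a categorical loss on the next word plus a regression loss on its embedding. This is a minimal illustration: the cosine distance, the `weight` parameter, and all function names are assumptions made for the example, since the abstract does not specify the distance function or how the two terms are balanced.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def rewe_style_loss(logits, predicted_embedding, target_index,
                    embedding_table, weight=1.0):
    """Joint loss for one decoding step (illustrative).

    logits: scores over the vocabulary for the next word (categorical head).
    predicted_embedding: vector emitted by the regression head.
    embedding_table: fixed word embeddings, one vector per vocabulary entry.
    """
    # Categorical term: negative log-likelihood of the reference word.
    nll = -math.log(softmax(logits)[target_index])
    # Continuous term: distance between predicted and reference embeddings.
    regression = cosine_distance(predicted_embedding,
                                 embedding_table[target_index])
    return nll + weight * regression
```

With a two-word vocabulary and uniform logits, a predicted embedding that matches the reference leaves only the categorical term, log 2. A predicted embedding orthogonal to the reference adds a further penalty of `weight`.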

Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association
Meladel Mistica | Massimo Piccardi | Andrew MacKinlay
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

A multi-constraint structured hinge loss for named-entity recognition
Hanieh Poostchi | Massimo Piccardi
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

2018

English-Basque Statistical and Neural Machine Translation
Inigo Jauregi Unanue | Lierni Garmendia Arratibel | Ehsan Zare Borzeshi | Massimo Piccardi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset
Hanieh Poostchi | Ehsan Zare Borzeshi | Massimo Piccardi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Cluster Labeling by Word Embeddings and WordNet's Hypernymy
Hanieh Poostchi | Massimo Piccardi
Proceedings of the Australasian Language Technology Association Workshop 2018

Cluster labeling is the assignment of representative labels to clusters obtained from the organization of a document collection. Once assigned, the labels can play an important role in applications such as navigation, search and document classification. However, finding appropriately descriptive labels is still a challenging task. In this paper, we propose various approaches for assigning labels to word clusters by leveraging word embeddings and the synonymity and hypernymy relations in the WordNet lexical ontology. Experiments carried out using the WebAP document dataset have shown that one of the approaches stands out in the comparison and is capable of selecting labels that are reasonably aligned with those chosen by a pool of four human annotators.

A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems
Inigo Jauregi Unanue | Ehsan Zare Borzeshi | Massimo Piccardi
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Automatic post-editing (APE) systems aim to correct the systematic errors made by machine translators. In this paper, we propose a neural APE system that encodes the source (src) and machine translated (mt) sentences with two separate encoders, but leverages a shared attention mechanism to better understand how the two inputs contribute to the generation of the post-edited (pe) sentences. Our empirical observations have shown that when the mt is incorrect, the attention shifts weight toward tokens in the src sentence to properly edit the incorrect translation. The model has been trained and evaluated on the official data from the WMT16 and WMT17 APE IT domain English-German shared tasks. Additionally, we have used the extra 500K artificial data provided by the shared task. Our system has been able to reproduce the accuracies of systems trained with the same data, while at the same time providing better interpretability.

2016

PersoNER: Persian Named-Entity Recognition
Hanieh Poostchi | Ehsan Zare Borzeshi | Mohammad Abdous | Massimo Piccardi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people world-wide. We first present and provide ArmanPersoNERCorpus, the first manually-annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.

Bidirectional LSTM-CRF for Clinical Concept Extraction
Raghavendra Chalapathy | Ehsan Zare Borzeshi | Massimo Piccardi
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Automated extraction of concepts from patient clinical records is an essential facilitator of clinical research. For this reason, the 2010 i2b2/VA Natural Language Processing Challenges for Clinical Records introduced a concept extraction task aimed at identifying and classifying concepts into predefined categories (i.e., treatments, tests and problems). State-of-the-art concept extraction approaches heavily rely on handcrafted features and domain-specific resources which are hard to collect and define. For this reason, this paper proposes an alternative, streamlined approach: a recurrent neural network (the bidirectional LSTM with CRF decoding) initialized with general-purpose, off-the-shelf word embeddings. The experimental results achieved on the 2010 i2b2/VA reference corpora using the proposed framework outperform all recent methods and rank closely to the best submission from the original 2010 i2b2/VA challenge.

An Investigation of Recurrent Neural Architectures for Drug Name Recognition
Raghavendra Chalapathy | Ehsan Zare Borzeshi | Massimo Piccardi
Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis