Alessandro Moschitti

2025

pdf bib abs
Retrieving Support to Rank Answers in Open-Domain Question Answering
Zeyu Zhang | Alessandro Moschitti | Thuy Vu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce a novel Question Answering (QA) architecture that enhances answer selection by retrieving targeted supporting evidence. Unlike traditional methods, which retrieve documents or passages relevant only to a query q, our approach retrieves content relevant to the combined pair (q, a), explicitly emphasizing the supporting relation between the query and a candidate answer a. By prioritizing this relational context, our model effectively identifies paragraphs that directly substantiate the correctness of a with respect to q, leading to more accurate answer verification than standard retrieval systems. Our neural retrieval method also scales efficiently to collections containing hundreds of millions of paragraphs. Moreover, this approach can be used by large language models (LLMs) to retrieve explanatory paragraphs that ground their reasoning, enabling them to tackle more complex QA tasks with greater reliability and interpretability.
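The contrast between retrieving for q alone and retrieving for the pair (q, a) can be illustrated with a minimal sketch. It swaps the neural retriever described above for a toy bag-of-words cosine similarity; the function names (`bow`, `cosine`, `retrieve_support`) and the sample paragraphs are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bow(text):
    """Toy bag-of-words vector; a stand-in for a neural encoder (assumption)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def retrieve_support(question, answer, paragraphs, top_n=1):
    """Rank paragraphs against the combined (q, a) pair rather than q alone."""
    query = bow(question + " " + answer)
    ranked = sorted(paragraphs, key=lambda p: cosine(query, bow(p)), reverse=True)
    return ranked[:top_n]


paragraphs = [
    "hamlet is a tragedy set in denmark",
    "william shakespeare wrote hamlet around 1600",
    "denmark is a country in europe",
]
support = retrieve_support("who wrote hamlet", "william shakespeare", paragraphs)
```

Because the candidate answer is part of the query, the paragraph that mentions both the question terms and the answer is ranked first in this toy example.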

pdf bib abs
Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning
Hyundong Justin Cho | Karishma Sharma | Nicolaas Paul Jedema | Leonardo F. R. Ribeiro | Jonathan May | Alessandro Moschitti
Findings of the Association for Computational Linguistics: NAACL 2025

Language models are aligned to the collective voice of many, resulting in generic outputs that do not align with specific users’ styles. In this work, we present Trial-Error-Explain In-Context Learning (TICL), a tuning-free method that personalizes language models for text generation tasks with fewer than 10 examples per user. TICL iteratively expands an in-context learning prompt via a trial-error-explain process, adding model-generated negative samples and explanations that provide fine-grained guidance towards a specific user’s style. TICL achieves favorable win rates on pairwise comparisons with LLM-as-a-judge up to 91.5% against the previous state-of-the-art and outperforms competitive tuning-free baselines for personalized alignment tasks of writing emails, essays and news articles. Both lexical and qualitative analyses show that the negative samples and explanations enable language models to learn stylistic context more effectively and overcome the bias towards structural and formal phrases observed in their zero-shot outputs. By front-loading inference compute to create a user-specific in-context learning prompt that does not require extra generation steps at test time, TICL presents a novel yet simple approach for personalized alignment.

pdf bib abs
Improving Document Retrieval Coherence for Semantically Equivalent Queries
Stefano Campese | Alessandro Moschitti | Ivano Lauriola
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Dense Retrieval (DR) models have proven to be effective for Document Retrieval and Information Grounding tasks. Usually, these models are trained and optimized for improving the relevance of top-ranked documents for a given query. Previous work has shown that popular DR models are sensitive to the query and document lexicon: small variations in wording may lead to a significant difference in the set of retrieved documents. In this paper, we propose a variation of the Multi-Negative Ranking loss for training DR that improves the coherence of models in retrieving the same documents with respect to semantically similar queries. The loss penalizes discrepancies between the top-k ranked documents retrieved for diverse but semantically equivalent queries. We conducted extensive experiments on various datasets: MS-MARCO, Natural Questions, BEIR, and TREC DL 19/20. The results show that models optimized by our loss are subject to (i) lower sensitivity and, (ii) interestingly, higher accuracy.
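The notion of a discrepancy between top-k lists can be made concrete with a small measurement sketch. This is not the loss proposed in the paper; it is only an illustrative, assumed way to quantify how much the top-k sets retrieved for equivalent queries agree, using the mean pairwise Jaccard overlap. The function name `topk_coherence` is hypothetical.

```python
from itertools import combinations


def topk_coherence(rankings, k):
    """Mean pairwise Jaccard overlap of the top-k document sets.

    `rankings` holds one ranked list of document ids per paraphrase of the
    same query. Returns 1.0 when all top-k sets are identical.
    """
    tops = [set(r[:k]) for r in rankings]
    pairs = list(combinations(tops, 2))
    if not pairs:
        return 1.0  # a single query is trivially coherent with itself
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


# Two paraphrases of the same query and the document ids retrieved for each.
rankings = [["d1", "d2", "d3"], ["d2", "d1", "d4"]]
coherence_at_2 = topk_coherence(rankings, k=2)
coherence_at_3 = topk_coherence(rankings, k=3)
```

A score below 1.0 indicates that rewording the query changed the retrieved set, which is the kind of sensitivity the abstract refers to.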

pdf bib abs
Analyzing and Improving Coherence of Large Language Models in Question Answering
Ivano Lauriola | Stefano Campese | Alessandro Moschitti
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large language models (LLMs) have recently revolutionized natural language processing. These models, however, often suffer from instability or lack of coherence, that is, the ability of the models to generate semantically equivalent outputs when receiving diverse yet semantically equivalent input variations. In this work, we analyze the behavior of multiple LLMs, including Mixtral-8x7B, Llama2-70b, Smaug-72b, and Phi-3, when dealing with multiple lexical variations of the same info-seeking questions. Our results suggest that various LLMs struggle to consistently answer diverse equivalent queries. To address this issue, we show how redundant information encoded as a prompt can increase the coherence of these models. In addition, we introduce a Retrieval-Augmented Generation (RAG) technique that supplements LLMs with the top-k most similar questions from a question retrieval engine. This knowledge augmentation leads to a 4-8 percentage point improvement in end-to-end performance in factual question answering tasks. These findings underscore the need to enhance LLM stability and coherence through semantic awareness.

2024

pdf bib abs
Pre-Training Methods for Question Reranking
Stefano Campese | Ivano Lauriola | Alessandro Moschitti
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

One interesting approach to Question Answering (QA) is to search for semantically similar questions that have been answered before. This task is different from answer retrieval as it focuses on questions rather than only on the answers; therefore, it requires different model training on different data. In this work, we introduce a novel unsupervised pre-training method specialized for retrieving and ranking questions. This leverages (i) knowledge distillation from a basic question retrieval model, and (ii) a new pre-training task and objective for learning to rank questions in terms of their relevance with the query. Our experiments show that (i) the proposed technique achieves state-of-the-art performance on QRC and Quora-match datasets, and (ii) the benefit of combining re-ranking and retrieval models.

Current instruction-tuned language models are exclusively trained with textual preference data and thus may not be aligned to the unique requirements of other modalities, such as speech. To better align language models with the speech domain, we explore i) prompting strategies based on radio-industry best practices and ii) preference learning using a novel speech-based preference data of 20K samples collected by annotators who listen to response pairs. Both human and automatic evaluation show that both prompting and preference learning increase the speech-suitability of popular instruction tuned LLMs. More interestingly, we show that these methods are additive; combining them achieves the best win rates in head-to-head comparison, resulting in responses that are preferred or tied to the base model in 76.2% of comparisons on average. Lastly, we share lexical, syntactical, and qualitative analyses that elicit how our studied methods differ with baselines in generating more speech-suitable responses.

pdf bib abs
Measuring Retrieval Complexity in Question Answering Systems
Matteo Gabburo | Nicolaas Paul Jedema | Siddhant Garg | Leonardo F. R. Ribeiro | Alessandro Moschitti
Findings of the Association for Computational Linguistics: ACL 2024

In this paper, we investigate which questions are challenging for retrieval-based Question Answering (QA). We (i) propose retrieval complexity (RC), a novel metric conditioned on the completeness of retrieved documents, which measures the difficulty of answering questions, and (ii) propose an unsupervised pipeline to measure RC given an arbitrary retrieval system. Our proposed pipeline measures RC more accurately than alternative estimators, including LLMs, on six challenging QA benchmarks. Further investigation reveals that RC scores strongly correlate with both QA performance and expert judgment across five of the six studied benchmarks, indicating that RC is an effective measure of question difficulty. Subsequent categorization of high-RC questions shows that they span a broad set of question shapes, including multi-hop, compositional, and temporal QA, indicating that RC scores can categorize a new subset of complex questions. Our system can also have a major impact on retrieval-based systems by helping to identify more challenging questions on existing datasets.

pdf bib abs
Datasets for Multilingual Answer Sentence Selection
Matteo Gabburo | Stefano Campese | Federico Agostini | Alessandro Moschitti
Findings of the Association for Computational Linguistics: EMNLP 2024

Answer Sentence Selection (AS2) is a critical task for designing effective retrieval-based Question Answering (QA) systems. Most advancements in AS2 focus on English due to the scarcity of annotated datasets for other languages. This lack of resources prevents the training of effective AS2 models in different languages, creating a performance gap between QA systems in English and other locales. In this paper, we introduce new high-quality datasets for AS2 in five European languages (French, German, Italian, Portuguese, and Spanish), obtained through supervised Automatic Machine Translation (AMT) of existing English AS2 datasets such as ASNQ, WikiQA, and TREC-QA using a Large Language Model (LLM). We evaluated our approach and the quality of the translated datasets through multiple experiments with different Transformer architectures. The results indicate that our datasets are pivotal in producing robust and powerful multilingual AS2 models, significantly contributing to closing the performance gap between English and other languages.

pdf bib abs
Efficient and Accurate Contextual Re-Ranking for Knowledge Graph Question Answering
Kexuan Sun | Nicolaas Paul Jedema | Karishma Sharma | Ruben Janssen | Jay Pujara | Pedro Szekely | Alessandro Moschitti
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The efficacy of neural “retrieve and generate” systems is well established for question answering (QA) over unstructured text. Recent efforts seek to extend this approach to knowledge graph (KG) QA by converting structured triples to unstructured text. However, the relevance of KG triples retrieved by these systems limits their accuracy. In this paper, we improve the relevance of retrieved triples using a carefully designed re-ranker. Specifically, our pipeline (i) retrieves over documents of triples grouped by entity, (ii) re-ranks triples from these documents with context: triples in the 1-hop neighborhood of the documents’ subject entity, and (iii) generates an answer from highly relevant re-ranked triples. To train our re-ranker, we propose a novel “triple-level” labeling strategy that infers fine-grained labels and show that these significantly improve the relevance of retrieved information. We show that the resulting “retrieve, re-rank, and generate” pipeline significantly improves upon prior KGQA systems, achieving a new state-of-the-art on FreebaseQA by 5.56% Exact Match. We perform multiple ablations that reveal the distinct benefits of our contextual re-ranker and labeling strategy and conclude with a case study that highlights opportunities for future work.

2023

pdf bib abs
Learning Answer Generation using Supervision from Automatic Question Answering Evaluators
Matteo Gabburo | Siddhant Garg | Rik Koncel-Kedziorski | Alessandro Moschitti
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent studies show that sentence-level extractive QA, i.e., based on Answer Sentence Selection (AS2), is outperformed by Generation-based QA (GenQA) models, which generate answers using the top-k answer sentences ranked by AS2 models (a la retrieval-augmented generation style). In this paper, we propose a novel training paradigm for GenQA using supervision from automatic QA evaluation models (GAVA). Specifically, we propose three strategies to transfer knowledge from these QA evaluation models to a GenQA model: augmenting training data with answers generated by the GenQA model and labelled by GAVA, either (i) statically, before training, or (ii) dynamically, at every training epoch; and (iii) using the GAVA score for weighting the generator loss during the learning of the GenQA model. We evaluate our proposed methods on two academic and one industrial dataset, obtaining a significant improvement in answering accuracy over the previous state of the art.

pdf bib abs
Context-Aware Transformer Pre-Training for Answer Sentence Selection
Luca Di Liello | Siddhant Garg | Alessandro Moschitti
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Answer Sentence Selection (AS2) is a core component for building an accurate Question Answering pipeline. AS2 models rank a set of candidate sentences based on how likely they answer a given question. The state of the art in AS2 exploits pre-trained transformers by transferring them on large annotated datasets, while using local contextual information around the candidate sentence. In this paper, we propose three pre-training objectives designed to mimic the downstream fine-tuning task of contextual AS2. This allows for specializing LMs when fine-tuning for contextual AS2. Our experiments on three public and two large-scale industrial datasets show that our pre-training approaches (applied to RoBERTa and ELECTRA) can improve baseline contextual AS2 accuracy by up to 8% on some datasets.

pdf bib abs
Accurate Training of Web-based Question Answering Systems with Feedback from Ranked Users
Liang Wang | Ivano Lauriola | Alessandro Moschitti
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Recent work has shown that large-scale annotated datasets are essential for training state-of-the-art Question Answering (QA) models. Unfortunately, creating this data is expensive and requires a huge amount of annotation work. An alternative and cheaper source of supervision is given by feedback data collected from deployed QA systems. This data can be collected from tens of millions of users at no additional cost, for real-world QA services, e.g., Alexa, Google Home, etc. The main drawback is the noise affecting feedback on individual examples. Recent literature on QA systems has shown the benefit of training models even with noisy feedback. However, these studies have multiple limitations: (i) they used uniform random noise to simulate feedback responses, which is typically an unrealistic approximation as noise follows specific patterns, depending on target examples and users; and (ii) they do not show how to aggregate feedback for improving training signals. In this paper, we first collect a large-scale (16M) QA dataset with real feedback sampled from the QA traffic of a popular Virtual Assistant. Second, we use this data to develop two strategies for filtering unreliable users and thus de-noise feedback: (i) ranking users with an automatic classifier, and (ii) aggregating feedback over similar instances and comparing users with each other. Finally, we train QA models on our filtered feedback data, showing a significant improvement over the state of the art.

pdf bib abs
Question-Answer Sentence Graph for Joint Modeling Answer Selection
Roshni Iyer | Thuy Vu | Alessandro Moschitti | Yizhou Sun
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

This research studies graph-based approaches for Answer Sentence Selection (AS2), an essential component for retrieval-based Question Answering (QA) systems. During offline learning, our model constructs a small-scale relevant training graph per question in an unsupervised manner, and integrates with Graph Neural Networks. Graph nodes are question sentence to answer sentence pairs. We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs, and use thresholding on relevance scores for creating graph edges. Online inference is then performed to solve the AS2 task on unseen queries. Experiments on two well-known academic benchmarks and a real-world dataset show that our approach consistently outperforms SOTA QA baseline models.
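The edge-creation step, thresholding relevance scores between question/answer nodes, can be sketched as follows. The scorer here is a toy token-overlap function standing in for the trained models mentioned above, and averaging the question-question and answer-answer scores is an assumption made only for this illustration; `token_overlap`, `pair_score`, and `build_edges` are hypothetical names.

```python
def token_overlap(a, b):
    """Toy relevance score: Jaccard overlap of lowercase tokens (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pair_score(node_a, node_b):
    """Average of question-question and answer-answer scores (assumed rule)."""
    (q1, a1), (q2, a2) = node_a, node_b
    return (token_overlap(q1, q2) + token_overlap(a1, a2)) / 2


def build_edges(nodes, threshold):
    """Connect two (question, answer) nodes when their score meets the threshold."""
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if pair_score(nodes[i], nodes[j]) >= threshold:
                edges.append((i, j))
    return edges


nodes = [
    ("capital of france", "paris is the capital"),
    ("france capital city", "the capital is paris"),
    ("tallest mountain", "mount everest"),
]
edges = build_edges(nodes, threshold=0.5)
```

In this toy graph only the two nodes about the capital of France are linked; the resulting edge list is what a graph neural network would consume in a fuller pipeline.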

pdf bib abs
Double Retrieval and Ranking for Accurate Question Answering
Zeyu Zhang | Thuy Vu | Alessandro Moschitti
Findings of the Association for Computational Linguistics: EACL 2023

Recent work has shown that an answer verification step introduced in Transformer-based answer selection models can significantly improve the state of the art in Question Answering. This step is performed by aggregating the embeddings of top k answer candidates to support the verification of a target answer. Although the approach is intuitive and sound, it still shows two limitations: (i) the supporting candidates are ranked only according to the relevancy with the question and not with the answer, and (ii) the support provided by the other answer candidates is suboptimal as these are retrieved independently of the target answer. In this paper, we address both drawbacks by proposing (i) a double reranking model, which, for each target answer, selects the best support; and (ii) a second neural retrieval stage designed to encode question and answer pair as the query, which finds more specific verification information. The results on well-known datasets for Answer Sentence Selection show significant improvement over the state of the art.

pdf bib abs
Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages
Shivanshu Gupta | Yoshitomo Matsubara | Ankit Chadha | Alessandro Moschitti
Findings of the Association for Computational Linguistics: ACL 2023

While impressive performance has been achieved on the task of Answer Sentence Selection (AS2) for English, the same does not hold for languages that lack large labeled datasets. In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages in the tasks without the need of labeled data for the target language. To evaluate our method, we introduce 1) Xtr-WikiQA, a translation-based WikiQA dataset for 9 additional languages, and 2) TyDi-AS2, a multilingual AS2 dataset with over 70K questions spanning 8 typologically diverse languages. We conduct extensive experiments on Xtr-WikiQA and TyDi-AS2 with multiple teachers, diverse monolingual and multilingual pretrained language models (PLMs) as students, and both monolingual and multilingual training. The results demonstrate that CLKD either outperforms or rivals even supervised fine-tuning with the same amount of labeled data and a combination of machine translation and the teacher model. Our method can potentially enable stronger AS2 models for low-resource languages, while TyDi-AS2 can serve as the largest multilingual AS2 dataset for further studies in the research community.

pdf bib abs
QUADRo: Dataset and Models for QUestion-Answer Database Retrieval
Stefano Campese | Ivano Lauriola | Alessandro Moschitti
Findings of the Association for Computational Linguistics: EMNLP 2023

An effective approach to design automated Question Answering (QA) systems is to efficiently retrieve answers from pre-computed databases containing question/answer pairs. One of the main challenges to this design is the lack of training/testing data. Existing resources are limited in size and topics and either do not consider answers (question-question similarity only) or their quality in the annotation process. To fill this gap, we introduce a novel open-domain annotated resource to train and evaluate models for this task. The resource consists of 15,211 input questions. Each question is paired with 30 similar question/answer pairs, resulting in a total of 443,000 annotated examples. The binary label associated with each pair indicates the relevance with respect to the input question. Furthermore, we report extensive experimentation to test the quality and properties of our resource with respect to various key aspects of QA systems, including answer relevance, training strategies, and models input configuration.

pdf bib
SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References
Matteo Gabburo | Siddhant Garg | Rik Koncel-Kedziorski | Alessandro Moschitti
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

2022

pdf bib abs
Cross-Lingual Open-Domain Question Answering with Answer Sentence Generation
Benjamin Muller | Luca Soldaini | Rik Koncel-Kedziorski | Eric Lind | Alessandro Moschitti
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Open-Domain Generative Question Answering has achieved impressive performance in English by combining document-level retrieval with answer generation. These approaches, which we refer to as GenQA, can generate complete sentences, effectively answering both factoid and non-factoid questions. In this paper, we extend GenQA to the multilingual and cross-lingual settings. For this purpose, we first introduce GenTyDiQA, an extension of the TyDiQA dataset with well-formed and complete answers for Arabic, Bengali, English, Japanese, and Russian. Based on GenTyDiQA, we design a cross-lingual generative model that produces full-sentence answers by exploiting passages written in multiple languages, including languages different from the question. Our cross-lingual generative system outperforms answer sentence selection baselines for all five languages and monolingual generative pipelines for three out of the five languages studied.

pdf bib abs
Knowledge Transfer from Answer Ranking to Answer Generation
Matteo Gabburo | Rik Koncel-Kedziorski | Siddhant Garg | Luca Soldaini | Alessandro Moschitti
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recent studies show that Question Answering (QA) based on Answer Sentence Selection (AS2) can be improved by generating an improved answer from the top-k ranked answer sentences (termed GenQA). This allows for synthesizing the information from multiple candidates into a concise, natural-sounding answer. However, creating large-scale supervised training data for GenQA models is very challenging. In this paper, we propose to train a GenQA model by transferring knowledge from a trained AS2 model, to overcome the aforementioned issue. First, we use an AS2 model to produce a ranking over answer candidates for a set of questions. Then, we use the top ranked candidate as the generation target, and the next k top ranked candidates as context for training a GenQA model. We also propose to use the AS2 model prediction scores for loss weighting and score-conditioned input/output shaping, to aid the knowledge transfer. Our evaluation on three public and one large industrial datasets demonstrates the superiority of our approach over the AS2 baseline, and GenQA trained using supervised data.
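The construction of a GenQA training example from an AS2 ranking can be sketched as follows: the top-ranked candidate becomes the generation target, the next k candidates become the context, and the AS2 score is kept as a loss weight. The dictionary layout, the plain string concatenation, and the function name `build_genqa_example` are assumptions for illustration and are not the paper's actual input format.

```python
def build_genqa_example(question, scored_candidates, k=2):
    """Turn AS2-scored candidates into one GenQA training example.

    `scored_candidates` is a list of (sentence, as2_score) pairs.
    """
    if not scored_candidates:
        raise ValueError("at least one candidate is required")
    ranked = sorted(scored_candidates, key=lambda c: c[1], reverse=True)
    target, target_score = ranked[0]
    # The next k top-ranked candidates serve as the generation context.
    context = [text for text, _ in ranked[1 : k + 1]]
    return {
        "source": " ".join([question] + context),
        "target": target,
        "weight": target_score,  # assumed use of the AS2 score for loss weighting
    }


candidates = [
    ("Paris.", 0.2),
    ("The capital of France is Paris.", 0.9),
    ("France is in Europe.", 0.5),
    ("Berlin is in Germany.", 0.1),
]
example = build_genqa_example("What is the capital of France?", candidates, k=2)
```

No human-written target is needed here, which is the point of the knowledge-transfer setup: the ranking itself supplies the supervision.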

pdf bib abs
Pre-training Transformer Models with Sentence-Level Objectives for Answer Sentence Selection
Luca Di Liello | Siddhant Garg | Luca Soldaini | Alessandro Moschitti
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

An important task for designing QA systems is answer sentence selection (AS2): selecting the sentence containing (or constituting) the answer to a question from a set of retrieved relevant documents. In this paper, we propose three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents, to improve the performance of transformers for AS2, and mitigate the requirement of large labeled datasets. Specifically, the model is tasked to predict whether: (i) two sentences are extracted from the same paragraph, (ii) a given sentence is extracted from a given paragraph, and (iii) two paragraphs are extracted from the same document. Our experiments on three public and one industrial AS2 datasets demonstrate the empirical superiority of our pre-trained transformers over baseline models such as RoBERTa and ELECTRA for AS2.
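The three binary objectives listed above can be illustrated by generating labeled pairs from a tiny corpus. The corpus, the exhaustive pairing (a real setup would presumably sample negatives), and the function names are assumptions made for this sketch.

```python
from itertools import combinations

# Toy corpus (hypothetical): document id -> paragraphs -> sentences.
CORPUS = {
    "doc_a": [["A1 s1.", "A1 s2."], ["A2 s1.", "A2 s2."]],
    "doc_b": [["B1 s1.", "B1 s2."]],
}


def same_paragraph_pairs(corpus):
    """Objective (i): label 1 if two sentences come from the same paragraph."""
    sents = [
        (doc, p_idx, sent)
        for doc, paras in corpus.items()
        for p_idx, para in enumerate(paras)
        for sent in para
    ]
    return [(a[2], b[2], int(a[:2] == b[:2])) for a, b in combinations(sents, 2)]


def sentence_in_paragraph_pairs(corpus):
    """Objective (ii): label 1 if the sentence was extracted from the paragraph."""
    paras = [(doc, i, p) for doc, ps in corpus.items() for i, p in enumerate(ps)]
    examples = []
    for doc, i, para in paras:
        for other_doc, j, other in paras:
            for sent in other:
                label = int((doc, i) == (other_doc, j))
                examples.append((sent, " ".join(para), label))
    return examples


def same_document_pairs(corpus):
    """Objective (iii): label 1 if two paragraphs come from the same document."""
    paras = [(doc, " ".join(p)) for doc, ps in corpus.items() for p in ps]
    return [(a[1], b[1], int(a[0] == b[0])) for a, b in combinations(paras, 2)]


sp = same_paragraph_pairs(CORPUS)
sip = sentence_in_paragraph_pairs(CORPUS)
sd = same_document_pairs(CORPUS)
```

All three label sets come from document structure alone, which is why no annotated AS2 data is required at this stage.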

We introduce question answering with a context in focus, a task that simulates a free interaction with a QA system. The user reads on a screen some information about a topic, and they can follow up with questions that can be either related or not to the topic; and the answer can be found in the document containing the screen content or from other pages. We call such information context. To study the task, we construct FocusQA, a dataset for answer sentence selection (AS2) with 12,165 unique question/context pairs, and a total of 109,940 answers. To build the dataset, we developed a novel methodology that takes existing questions and pairs them with relevant contexts. To show the benefits of this approach, we present a comparative analysis with a set of questions written by humans after reading the context, showing that our approach greatly helps in eliciting more realistic question/context pairs. Finally, we show that the task poses several challenges for incorporating contextual information. In this respect, we introduce strong baselines for answer sentence selection that outperform the precision of state-of-the-art models for AS2 by up to 21.3% absolute points.

pdf bib abs
Effective Pretraining Objectives for Transformer-based Autoencoders
Luca Di Liello | Matteo Gabburo | Alessandro Moschitti
Findings of the Association for Computational Linguistics: EMNLP 2022

In this paper, we study trade-offs between efficiency, cost and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA’s computationally heavy generators, thus greatly reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT’s MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance.

pdf bib abs
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Yoshitomo Matsubara | Luca Soldaini | Eric Lind | Alessandro Moschitti
Findings of the Association for Computational Linguistics: EMNLP 2022

Large transformer models can greatly improve Answer Sentence Selection (AS2) tasks, but their high computational costs prevent their use in many real-world applications. In this paper, we explore the following research question: How can we make the AS2 models more accurate without significantly increasing their model complexity? To address the question, we propose a Multiple Heads Student architecture (named CERBERUS), an efficient neural network designed to distill an ensemble of large transformers into a single smaller model. CERBERUS consists of two components: a stack of transformer layers that is used to encode inputs, and a set of ranking heads; unlike traditional distillation techniques, each of them is trained by distilling a different large transformer architecture in a way that preserves the diversity of the ensemble members. The resulting model captures the knowledge of heterogeneous transformer models by using just a few extra parameters. We show the effectiveness of CERBERUS on three English datasets for AS2; our proposed approach outperforms all single-model distillations we consider, rivaling the state-of-the-art large AS2 models that have 2.7× more parameters and run 2.5× slower. Code for our model is available at https://github.com/amazon-research/wqa-cerberus.

pdf bib abs
Building a Dataset for Automatically Learning to Detect Questions Requiring Clarification
Ivano Lauriola | Kevin Small | Alessandro Moschitti
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Question Answering (QA) systems aim to return correct and concise answers in response to user questions. QA research generally assumes all questions are intelligible and unambiguous, which is unrealistic in practice as questions frequently encountered by virtual assistants are ambiguous or noisy. In this work, we propose to make QA systems more robust via the following two-step process: (1) classify if the input question is intelligible and (2) for such questions with contextual ambiguity, return a clarification question. We describe a new open-domain clarification corpus containing user questions sampled from Quora, which is useful for building machine learning approaches to solving these tasks.

pdf bib abs
Paragraph-based Transformer Pre-training for Multi-Sentence Inference
Luca Di Liello | Siddhant Garg | Luca Soldaini | Alessandro Moschitti
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Inference tasks such as answer sentence selection (AS2) or fact verification are typically solved by fine-tuning transformer-based models as individual sentence-pair classifiers. Recent studies show that these tasks benefit from modeling dependencies across multiple candidate sentences jointly. In this paper, we first show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks. We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences. Our evaluation on three AS2 and one fact verification datasets demonstrates the superiority of our pre-training technique over the traditional ones for transformers used as joint models for multi-candidate inference tasks, as well as when used as cross-encoders for sentence-pair formulations of these tasks.

2021

pdf bib abs
Joint Models for Answer Verification in Question Answering Systems
Zeyu Zhang | Thuy Vu | Alessandro Moschitti
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper studies joint models for selecting correct answer sentences among the top k provided by answer sentence selection (AS2) modules, which are core components of retrieval-based Question Answering (QA) systems. Our work shows that a critical step in effectively exploiting an answer set is modeling the interrelated information between pairs of answers. For this purpose, we build a three-way multi-classifier, which decides if an answer supports, refutes, or is neutral with respect to another one. More specifically, our neural architecture integrates a state-of-the-art AS2 module with the multi-classifier, and a joint layer connecting all components. We tested our models on WikiQA, TREC-QA, and a real-world dataset. The results show that our models obtain the new state of the art in AS2.

WikiSum: Coherent Summarization Dataset for Efficient Human-Evaluation
Nachshon Cohen | Oren Kalinsky | Yftah Ziser | Alessandro Moschitti
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Recent works made significant advances on summarization tasks, facilitated by summarization datasets. Several existing datasets have the form of coherent-paragraph summaries. However, these datasets were curated from academic documents that were written for experts, thus making the essential step of assessing the summarization output through human-evaluation very demanding. To overcome these limitations, we present a dataset based on article summaries appearing on the WikiHow website, composed of how-to articles and coherent-paragraph summaries written in plain language. We compare our dataset attributes to existing ones, including readability and world-knowledge, showing our dataset makes human evaluation significantly easier and thus, more effective. A human evaluation conducted on PubMed and the proposed dataset reinforces our findings.

Language Transfer for Identifying Diagnostic Paragraphs in Clinical Notes
Luca Di Liello | Olga Uryupina | Alessandro Moschitti
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)

Modeling Context in Answer Sentence Selection Systems on a Latency Budget
Rujun Han | Luca Soldaini | Alessandro Moschitti
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Answer Sentence Selection (AS2) is an efficient approach for the design of open-domain Question Answering (QA) systems. In order to achieve low latency, traditional AS2 models score question-answer pairs individually, ignoring any information from the document each potential answer was extracted from. In contrast, more computationally expensive models designed for machine reading comprehension tasks typically receive one or more passages as input, which often results in better accuracy. In this work, we present an approach to efficiently incorporate contextual information in AS2 models. For each answer candidate, we first use unsupervised similarity techniques to extract relevant sentences from its source document, which we then feed into an efficient transformer architecture fine-tuned for AS2. Our best approach, which leverages a multi-way attention architecture to efficiently encode context, improves 6% to 11% over non-contextual state of the art in AS2 with minimal impact on system latency. All experiments in this work were conducted in English.

CDA: a Cost Efficient Content-based Multilingual Web Document Aligner
Thuy Vu | Alessandro Moschitti
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We introduce a Content-based Document Alignment approach (CDA), an efficient method to align multilingual web documents based on content in creating parallel training data for machine translation (MT) systems operating at the industrial level. CDA works in two steps: (i) projecting documents of a web domain to a shared multilingual space; then (ii) aligning them based on the similarity of their representations in such space. We leverage lexical translation models to build vector representations using TF×IDF. CDA achieves performance comparable with state-of-the-art systems in the WMT-16 Bilingual Document Alignment Shared Task benchmark while operating in multilingual space. Besides, we created two web-scale datasets to examine the robustness of CDA in an industrial setting involving up to 28 languages and millions of documents. The experiments show that CDA is robust, cost-effective, and is significantly superior in (i) processing large and noisy web data and (ii) scaling to new and low-resourced languages.

Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering
Siddhant Garg | Alessandro Moschitti
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper we propose a novel approach towards improving the efficiency of Question Answering (QA) systems by filtering out questions that will not be answered by them. This is based on an interesting new finding: the answer confidence scores of state-of-the-art QA systems can be approximated well by models solely using the input question text. This enables preemptive filtering of questions that are not answered by the system due to their answer confidence scores being lower than the system threshold. Specifically, we learn Transformer-based question models by distilling Transformer-based answering models. Our experiments on three popular QA datasets and one industrial QA benchmark demonstrate the ability of our question models to approximate the Precision/Recall curves of the target QA system well. These question models, when used as filters, can effectively trade off lower computation cost of QA systems for lower Recall, e.g., reducing computation by ~60%, while only losing ~3-4% of Recall.

Answer Generation for Retrieval-based Question Answering Systems
Chao-Chun Hsu | Eric Lind | Luca Soldaini | Alessandro Moschitti
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Strong and Light Baseline Models for Fact-Checking Joint Inference
Kateryna Tymoshenko | Alessandro Moschitti
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Reference-based Weak Supervision for Answer Sentence Selection using Web Data
Vivek Krishnamurthy | Thuy Vu | Alessandro Moschitti
Findings of the Association for Computational Linguistics: EMNLP 2021

Answer Sentence Selection (AS2) models are core components of efficient retrieval-based Question Answering (QA) systems. We present the Reference-based Weak Supervision (RWS), a fully automatic large-scale data pipeline that harvests high-quality weakly-supervised answer sentences from Web data, only requiring a question-reference pair as input. We evaluated the quality of the RWS-derived data by training TANDA models, which are the state of the art for AS2. Our results show that the data consistently bolsters TANDA on three different datasets. In particular, we set the new state of the art for AS2 to P@1=90.1%, and MAP=92.9%, on WikiQA. We record similar performance gains of RWS on a much larger dataset named Web-based Question Answering (WQA).

Supervised Neural Clustering via Latent Structured Output Learning: Application to Question Intents
Iryna Haponchyk | Alessandro Moschitti
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Previous pre-neural work on structured prediction has produced very effective supervised clustering algorithms using linear classifiers, e.g., structured SVM or perceptron. However, these cannot exploit the representation learning ability of neural networks, which would make supervised clustering even more powerful, i.e., general clustering patterns can be learned automatically. In this paper, we design neural networks based on latent structured prediction loss and Transformer models to approach supervised clustering. We tested our methods on the task of automatically recreating categories of intents from publicly available question intent corpora. The results show that our approach delivers an F1 of 95.65%, outperforming the state of the art by 17.24%.

AVA: an Automatic eValuation Approach for Question Answering Systems
Thuy Vu | Alessandro Moschitti
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We introduce AVA, an automatic evaluation approach for Question Answering, which, given a set of questions associated with Gold Standard answers (references), can estimate system Accuracy. AVA uses Transformer-based language models to encode question, answer, and reference texts. This allows for effectively assessing answer correctness using similarity between the reference and an automatic answer, biased towards the question semantics. To design, train, and test AVA, we built multiple large training, development, and test sets on public and industrial benchmarks. Our innovative solutions achieve up to 74.7% F1 score in predicting human judgment for single answers. Additionally, AVA can be used to evaluate the overall system Accuracy with an error lower than 7% at 95% confidence when measured on several QA systems.

2020

The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Luca Soldaini | Alessandro Moschitti
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Large transformer-based language models have been shown to be very effective in many classification tasks. However, their computational complexity prevents their use in applications requiring the classification of a large set of candidates. While previous works have investigated approaches to reduce model size, relatively little attention has been paid to techniques to improve batch throughput during inference. In this paper, we introduce the Cascade Transformer, a simple yet effective technique to adapt transformer-based models into a cascade of rankers. Each ranker is used to prune a subset of candidates in a batch, thus dramatically increasing throughput at inference time. Partial encodings from the transformer model are shared among rerankers, providing further speed-up. When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy, as measured on two English Question Answering datasets.

Cross-Language Transformer Adaptation for Frequently Asked Questions
Luca Di Liello | Daniele Bonadiman | Alessandro Moschitti | Cristina Giannone | Andrea Favalli | Raniero Romagnoli
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

Dialog-based Help Desk through Automated Question Answering and Intent Detection
Antonio Uva | Pierluigi Roberti | Alessandro Moschitti
Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020)

A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection
Daniele Bonadiman | Alessandro Moschitti
Proceedings of the 28th International Conference on Computational Linguistics

An essential task of most Question Answering (QA) systems is to re-rank the set of answer candidates, i.e., Answer Sentence Selection (AS2). These candidates are typically sentences either extracted from one or more documents preserving their natural order or retrieved by a search engine. Most state-of-the-art approaches to the task use huge neural models, such as BERT, or complex attentive architectures. In this paper, we argue that by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, we achieve the highest accuracy among the cost-efficient models, with two orders of magnitude fewer parameters than the current state of the art. Our model takes 9.5 seconds to train on the WikiQA dataset, i.e., very fast in comparison with the 18 minutes required by a standard BERT-base fine-tuning.

Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP
Oren Sar Shalom | Alexander Panchenko | Cicero dos Santos | Varvara Logacheva | Alessandro Moschitti | Ido Dagan
Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP

2019

A Study of Latent Structured Prediction Approaches to Passage Reranking
Iryna Haponchyk | Alessandro Moschitti
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The structured output framework provides a helpful tool for learning-to-rank problems. In this paper, we propose a structured output approach which regards rankings as latent variables. Our approach addresses the complex optimization of the Mean Average Precision (MAP) ranking metric. We provide an inference procedure to find the max-violating ranking based on the decomposition of the corresponding loss. The results of our experiments on WikiQA and TREC13 datasets show that our reranking based on structured prediction is a promising research direction.

2018

Learning to Progressively Recognize New Named Entities with Sequence to Sequence Models
Lingzhen Chen | Alessandro Moschitti
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we propose to use a sequence to sequence model for Named Entity Recognition (NER) and we explore the effectiveness of such a model in a progressive NER setting – a Transfer Learning (TL) setting. We train an initial model on source data and transfer it to a model that can recognize new NE categories in the target data during a subsequent step, when the source data is no longer available. Our solution consists of: (i) reshaping and re-parametrizing the output layer of the first learned model to enable the recognition of new NEs; (ii) leaving the rest of the architecture unchanged, such that it is initialized with parameters transferred from the initial model; and (iii) fine-tuning the network on the target data. Most importantly, we design a new NER approach based on sequence to sequence (Seq2Seq) models, which can intuitively work better in our progressive setting. We compare our approach with a Bidirectional LSTM, which is a strong neural NER model. Our experiments show that the Seq2Seq model performs very well on the standard NER setting and it is more robust in the progressive setting. Our approach can recognize previously unseen NE categories while preserving the knowledge of the seen data.

Adversarial Domain Adaptation for Duplicate Question Detection
Darsh Shah | Tao Lei | Alessandro Moschitti | Salvatore Romeo | Preslav Nakov
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We address the problem of detecting duplicate questions in forums, which is an important step towards automating the process of answering new questions. As finding and annotating such potential duplicates manually is very tedious and costly, automatic methods based on machine learning are a viable alternative. However, many forums do not have annotated data, i.e., questions labeled by experts as duplicates, and thus a promising solution is to use domain adaptation from another forum that has such annotations. Here we focus on adversarial domain adaptation, deriving important findings about when it performs well and what properties of the domains are important in this regard. Our experiments with StackExchange data show an average improvement of 5.6% over the best baseline across multiple pairs of domains.

Semantic Linking in Convolutional Neural Networks for Answer Sentence Selection
Massimo Nicosia | Alessandro Moschitti
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

State-of-the-art networks that model relations between two pieces of text often use complex architectures and attention. In this paper, instead of focusing on architecture engineering, we take advantage of small amounts of labelled data that model semantic phenomena in text to encode matching features directly in the word representations. This greatly boosts the accuracy of our reference network, while keeping the model simple and fast to train. Our approach also beats a tree kernel model that uses similar input encodings, and neural models which use advanced attention and compare-aggregate mechanisms.

Cross-Pair Text Representations for Answer Sentence Selection
Kateryna Tymoshenko | Alessandro Moschitti
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

High-level semantics tasks, e.g., paraphrasing, textual entailment or question answering, involve modeling of text pairs. Before the emergence of neural networks, this has been mostly performed using intra-pair features, which incorporate similarity scores or rewrite rules computed between the members within the same pair. In this paper, we compute scalar products between vectors representing similarity between members of different pairs, in place of simply using a single vector for each pair. This allows us to obtain a representation specific to any pair of pairs, which delivers the state of the art in answer sentence selection. Most importantly, our approach can outperform much more complex algorithms based on neural networks.

Supervised Clustering of Questions into Intents for Dialog System Applications
Iryna Haponchyk | Antonio Uva | Seunghak Yu | Olga Uryupina | Alessandro Moschitti
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Modern automated dialog systems require complex dialog managers able to deal with user intent triggered by high-level semantic questions. In this paper, we propose a model for automatically clustering questions into user intents to help the design tasks. Since questions are short texts, uncovering their semantics to group them together can be very challenging. We approach the problem by using powerful semantic classifiers from question duplicate/matching research along with a novel idea of supervised clustering methods based on structured output. We test our approach on two intent clustering corpora, showing an impressive improvement over previous methods for two languages/domains.

Automatic Stance Detection Using End-to-End Memory Networks
Mitra Mohtarami | Ramy Baly | James Glass | Preslav Nakov | Lluís Màrquez | Alessandro Moschitti
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present an effective end-to-end memory network model that jointly (i) predicts whether a given document can be considered as relevant evidence for a given claim, and (ii) extracts snippets of evidence that can be used to reason about the factuality of the target claim. Our model combines the advantages of convolutional and recurrent neural networks as part of a memory network. We further introduce a similarity matrix at the inference level of the memory network in order to extract snippets of evidence for input claims more accurately. Our experiments on a public benchmark dataset, FakeNewsChallenge, demonstrate the effectiveness of our approach.

Integrating Stance Detection and Fact Checking in a Unified Corpus
Ramy Baly | Mitra Mohtarami | James Glass | Lluís Màrquez | Alessandro Moschitti | Preslav Nakov
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.), determining the stance of each document with respect to the claim, and finally making a prediction about the claim’s factuality by aggregating the strength of the stances, while taking the reliability of the source into account. Moreover, a fact checking system should be able to explain its decision by providing relevant extracts (rationales) from the documents. Yet, this setup is not directly supported by existing datasets, which treat fact checking, document retrieval, source credibility, stance detection and rationale extraction as independent tasks. In this paper, we support the interdependencies between these tasks as annotations in the same corpus. We implement this setup on an Arabic fact checking corpus, the first of its kind.

Injecting Relational Structural Representation in Neural Networks for Question Similarity
Antonio Uva | Daniele Bonadiman | Alessandro Moschitti
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Effectively using full syntactic parsing information in Neural Networks (NNs) for solving relational tasks, e.g., question similarity, is still an open problem. In this paper, we propose to inject structural representations in NNs by (i) learning a model with Tree Kernels (TKs) on relatively few pairs of questions (a few thousand), as gold standard (GS) training data is typically scarce, (ii) predicting labels on a very large corpus of question pairs, and (iii) pre-training NNs on such a large corpus. The results on Quora and SemEval question similarity datasets show that NNs using our approach can learn more accurate models, especially after fine-tuning on GS.

A Flexible, Efficient and Accurate Framework for Community Question Answering Pipelines
Salvatore Romeo | Giovanni Da San Martino | Alberto Barrón-Cedeño | Alessandro Moschitti
Proceedings of ACL 2018, System Demonstrations

Although deep neural networks have been proving to be excellent tools to deliver state-of-the-art results, when data is scarce and the tackled tasks involve complex semantic inference, deep linguistic processing and traditional structure-based approaches, such as tree kernel methods, are an alternative solution. Community Question Answering is a research area that benefits from deep linguistic analysis to improve the experience of the community of forum users. In this paper, we present a UIMA framework to distribute the computation of cQA tasks over computer clusters such that traditional systems can scale to large datasets and deliver fast processing.

2017

Predicting Land Use of Italian Cities using Structural Semantic Models
Gianni Barlacchi | Bruno Lepri | Alessandro Moschitti
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Neural Sentiment Analysis for a Real-World Application
Daniele Bonadiman | Giuseppe Castellucci | Andrea Favalli | Raniero Romagnoli | Alessandro Moschitti
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Commercial Applications through Community Question Answering Technology
Antonio Uva | Valerio Storch | Casimiro Carrino | Ugo Di Iorio | Alessandro Moschitti
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)

Ranking Kernels for Structures and Embeddings: A Hybrid Preference and Classification Model
Kateryna Tymoshenko | Daniele Bonadiman | Alessandro Moschitti
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Recent work has shown that Tree Kernels (TKs) and Convolutional Neural Networks (CNNs) obtain the state of the art in answer sentence reranking. Additionally, their combination used in Support Vector Machines (SVMs) is promising as it can exploit both the syntactic patterns captured by TKs and the embeddings learned by CNNs. However, the embeddings are constructed according to a classification function, which is not directly exploitable in the preference ranking algorithm of SVMs. In this work, we propose a new hybrid approach combining preference ranking applied to TKs and pointwise ranking applied to CNNs. We show that our approach produces better results on two well-known and rather different datasets: WikiQA for answer sentence selection and SemEval cQA for comment selection in Community Question Answering.

A Practical Perspective on Latent Structured Prediction for Coreference Resolution
Iryna Haponchyk | Alessandro Moschitti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Latent structured prediction theory proposes powerful methods such as Latent Structural SVM (LSSVM), which can potentially be very appealing for coreference resolution (CR). However, little work is available, mainly targeting the latent structured perceptron (LSP). In this paper, we carried out a practical study comparing for the first time online learning with LSSVM. We analyze the intricacies that may have made initial attempts to use LSSVM fail, i.e., a huge training time and much lower accuracy produced by Kruskal’s spanning tree algorithm. In this respect, we also propose a new effective feature selection approach for improving system efficiency. The results show that LSP, if correctly parameterized, produces the same performance as LSSVM, while being much more efficient.

Effective shared representations with Multitask Learning for Community Question Answering
Daniele Bonadiman | Antonio Uva | Alessandro Moschitti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

An important asset of using Deep Neural Networks (DNNs) for text applications is their ability to automatically engineer features. Unfortunately, DNNs usually require a lot of training data, especially for highly semantic tasks such as community Question Answering (cQA). In this paper, we tackle the problem of data scarcity by learning the target DNN together with two auxiliary tasks in a multitask learning setting. We exploit the strong semantic connection between selection of comments relevant to (i) new questions and (ii) forum questions. This enables a global representation for comments, new and previous questions. The experiments of our model on a SemEval challenge dataset for cQA show a 20% relative improvement over standard DNNs.

Collaborative Partitioning for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper presents a collaborative partitioning algorithm—a novel ensemble-based approach to coreference resolution. Starting from the all-singleton partition, we search for a solution close to the ensemble’s outputs in terms of a task-specific similarity measure. Our approach assumes a loose integration of individual components of the ensemble and can therefore combine arbitrary coreference resolvers, regardless of their models. Our experiments on the CoNLL dataset show that collaborative partitioning yields results superior to those attained by the individual components, for ensembles of both strong and weak systems. Moreover, by applying the collaborative partitioning algorithm on top of three state-of-the-art resolvers, we obtain the best coreference performance reported so far in the literature (MELA v08 score of 64.47).

Learning Contextual Embeddings for Structural Semantic Similarity using Categorical Information
Massimo Nicosia | Alessandro Moschitti
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Tree kernels (TKs) and neural networks are two effective approaches for automatic feature engineering. In this paper, we combine them by modeling context word similarity in semantic TKs. This way, the latter can operate subtree matching by applying neural-based similarity on tree lexical nodes. We study how to learn representations for the words in context such that TKs can exploit more focused information. We found that neural embeddings produced by current methods do not provide a suitable contextual similarity. Thus, we define a new approach based on a Siamese Network, which produces word representations while learning a binary text similarity. We set the latter considering examples in the same category as similar. The experiments on question and sentiment classification show that our semantic TK highly improves previous results.

Don’t understand a measure? Learn it: Structured Prediction for Coreference Resolution optimizing its measures
Iryna Haponchyk | Alessandro Moschitti
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

An interesting aspect of structured prediction is the evaluation of an output structure against the gold standard. Especially in the loss-augmented setting, the need to find the max-violating constraint has severely limited the expressivity of effective loss functions. In this paper, we trade off exact computation for enabling the use and study of more complex loss functions for coreference resolution. Most interestingly, we show that such functions can be (i) automatically learned also from controversial but commonly accepted coreference measures, e.g., MELA, and (ii) successfully used in learning algorithms. The accurate model comparison on the standard CoNLL-2012 setting shows the benefit of more expressive loss functions.

Self-Crowdsourcing Training for Relation Extraction
Azad Abad | Moin Nabi | Alessandro Moschitti
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper we introduce a self-training strategy for crowdsourcing. The training examples are automatically selected to train the crowd workers. Our experimental results show a 5% improvement in F1 on the relation extraction task, compared to the method based on distant supervision.

RelTextRank: An Open Source Framework for Building Relational Syntactic-Semantic Text Pair Representations
Kateryna Tymoshenko | Alessandro Moschitti | Massimo Nicosia | Aliaksei Severyn
Proceedings of ACL 2017, System Demonstrations

Building Chatbots from Forum Data: Model Selection Using Question Answering Metrics
Martin Boyanov | Preslav Nakov | Alessandro Moschitti | Giovanni Da San Martino | Ivan Koychev
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We propose to use question answering (QA) data from Web forums to train chat-bots from scratch, i.e., without dialog data. First, we extract pairs of question and answer sentences from the typically much longer texts of questions and answers in a forum. We then use these shorter texts to train seq2seq models in a more efficient way. We further improve the parameter optimization using a new model selection strategy based on QA measures. Finally, we propose to use extrinsic evaluation with respect to a QA task as an automatic evaluation method for chatbot systems. The evaluation shows that the model achieves a MAP of 63.5% on the extrinsic task. Moreover, our manual evaluation demonstrates that the model can answer correctly 49.5% of the questions when they are similar in style to how questions are asked in the forum, and 47.3% of the questions, when they are more conversational in style.

We describe SemEval–2017 Task 3 on Community Question Answering. This year, we reran the four subtasks from SemEval-2016: (A) Question–Comment Similarity, (B) Question–Question Similarity, (C) Question–External Comment Similarity, and (D) Rerank the correct answers for a new question in Arabic, providing all the data from 2015 and 2016 for training, and fresh data for testing. Additionally, we added a new subtask E in order to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums. A total of 23 teams participated in the task, and submitted a total of 85 runs (36 primary and 49 contrastive) for subtasks A–D. Unfortunately, no teams participated in subtask E. A variety of approaches and features were used by the participating systems to address the different subtasks. The best systems achieved an official score (MAP) of 88.43, 47.22, 15.46, and 61.16 in subtasks A, B, C, and D, respectively. These scores are better than the baselines, especially for subtasks A–C.

pdf bib abs
KeLP at SemEval-2017 Task 3: Learning Pairwise Patterns in Community Question Answering
Simone Filice | Giovanni Da San Martino | Alessandro Moschitti
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the KeLP system participating in the SemEval-2017 community Question Answering (cQA) task. The system is a refinement of the kernel-based sentence pair modeling we proposed for the previous year's challenge. It is implemented within the Kernel-based Learning Platform called KeLP, from which we inherit the team’s name. Our primary submission ranked first in subtask A, and third in subtasks B and C, making it the only system appearing in the top-3 ranking for all the English subtasks. This shows that the proposed framework, which has minor variations among the three subtasks, is extremely flexible and effective in tackling learning tasks defined on sentence pairs.

2016

In real-world data, e.g., from Web forums, text is often contaminated with redundant or irrelevant content, which introduces noise into machine learning algorithms. In this paper, we apply Long Short-Term Memory networks with an attention mechanism, which can select important parts of text for the task of similar question retrieval from community Question Answering (cQA) forums. In particular, we use the attention weights for selecting both entire sentences and their subparts, i.e., word/chunk, from shallow syntactic trees. More interestingly, we apply tree kernels to the filtered text representations, thus exploiting the implicit features of the subtree space for learning question reranking. Our results show that the attention-based pruning achieves the top position in the cQA challenge of SemEval 2016, with a relatively large gap from the other participants, while greatly decreasing running time.

pdf bib abs
Selecting Sentences versus Selecting Tree Constituents for Automatic Question Ranking
Alberto Barrón-Cedeño | Giovanni Da San Martino | Salvatore Romeo | Alessandro Moschitti
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Community question answering (cQA) websites are focused on users who post questions to an online forum, expecting other users to provide answers or suggestions. Unlike in other social media, the length of the posted queries has no limit, and queries tend to be multi-sentence elaborations combining context, actual questions, and irrelevant information. We approach the problem of question ranking: given a user’s new question, retrieve those previously posted questions that could be equivalent or highly relevant. This could prevent the posting of near-duplicate questions and provide the user with instantaneous answers. For the first time in cQA, we address the selection of relevant text (at both the sentence and the constituent level) for parse tree-based representations. Our supervised models for text selection boost the performance of a tree kernel-based machine learning model, allowing it to overtake the current state of the art on a recently released cQA evaluation framework.

We present an interactive system to provide effective and efficient search capabilities in Community Question Answering (cQA) forums. The system integrates state-of-the-art technology for answer search with a Web-based user interface specifically tailored to support the cQA forum readers. The answer search module automatically finds relevant answers for a new question by exploring related questions and the comments within their threads. The graphical user interface presents the search results and supports the exploration of related information. The system is running live at http://www.qatarliving.com/betasearch/.

pdf bib
Learning to Recognize Ancillary Information for Automatic Paraphrase Identification
Simone Filice | Alessandro Moschitti
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Convolutional Neural Networks vs. Convolution Kernels: Feature Engineering for Answer Sentence Reranking
Kateryna Tymoshenko | Daniele Bonadiman | Alessandro Moschitti
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Semi-supervised Question Retrieval with Gated Convolutions
Tao Lei | Hrishikesh Joshi | Regina Barzilay | Tommi Jaakkola | Kateryna Tymoshenko | Alessandro Moschitti | Lluís Màrquez
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
KeLP at SemEval-2016 Task 3: Learning Semantic Relations between Questions and Answers
Simone Filice | Danilo Croce | Alessandro Moschitti | Roberto Basili
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Taking the best from the Crowd: Learning Question Passage Classification from Noisy Data
Azad Abad | Alessandro Moschitti
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

2015

pdf bib
Global Thread-level Inference for Comment Classification in Community Question Answering
Shafiq Joty | Alberto Barrón-Cedeño | Giovanni Da San Martino | Simone Filice | Lluís Màrquez | Alessandro Moschitti | Preslav Nakov
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
High-Order Low-Rank Tensors for Semantic Role Labeling
Tao Lei | Yuan Zhang | Lluís Màrquez | Alessandro Moschitti | Regina Barzilay
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
On the Automatic Learning of Sentiment Lexicons
Aliaksei Severyn | Alessandro Moschitti
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Structural Representations for Learning Relations between Pairs of Texts
Simone Filice | Giovanni Da San Martino | Alessandro Moschitti
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Distributional Neural Networks for Automatic Resolution of Crossword Puzzles
Aliaksei Severyn | Massimo Nicosia | Gianni Barlacchi | Alessandro Moschitti
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Thread-Level Information for Comment Classification in Community Question Answering
Alberto Barrón-Cedeño | Simone Filice | Giovanni Da San Martino | Shafiq Joty | Lluís Màrquez | Preslav Nakov | Alessandro Moschitti
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
SACRY: Syntax-based Automatic Crossword puzzle Resolution sYstem
Alessandro Moschitti | Massimo Nicosia | Gianni Barlacchi
Proceedings of ACL-IJCNLP 2015 System Demonstrations

pdf bib
A State-of-the-Art Mention-Pair Model for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

pdf bib
SemEval-2015 Task 3: Answer Selection in Community Question Answering
Preslav Nakov | Lluís Màrquez | Walid Magdy | Alessandro Moschitti | Jim Glass | Bilal Randeree
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification
Aliaksei Severyn | Alessandro Moschitti
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
A Study of using Syntactic and Semantic Structures for Concept Segmentation and Labeling
Iman Saleh | Scott Cyphers | Jim Glass | Shafiq Joty | Lluís Màrquez | Alessandro Moschitti | Preslav Nakov
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Alessandro Moschitti | Bo Pang | Walter Daelemans
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Learning to Differentiate Better from Worse Translations
Francisco Guzmán | Shafiq Joty | Lluís Màrquez | Alessandro Moschitti | Preslav Nakov | Massimo Nicosia
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Semantic Kernels for Semantic Parsing
Iman Saleh | Alessandro Moschitti | Preslav Nakov | Lluís Màrquez | Shafiq Joty
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Discriminative Reranking of Discourse Parses Using Tree Kernels
Shafiq Joty | Alessandro Moschitti
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Encoding Semantic Resources in Syntactic Structures for Passage Reranking
Kateryna Tymoshenko | Alessandro Moschitti | Aliaksei Severyn
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib abs
SenTube: A Corpus for Sentiment Analysis on YouTube Social Media
Olga Uryupina | Barbara Plank | Aliaksei Severyn | Agata Rotondi | Alessandro Moschitti
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present SenTube, a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. It contains annotations that allow the development of classifiers for several important NLP tasks: (i) sentiment analysis, (ii) text categorization (relatedness of a comment to video and/or product), (iii) spam detection, and (iv) prediction of comment informativeness. The SenTube corpus favors the development of research on indexing and searching YouTube videos exploiting information derived from comments. The corpus will cover several languages: at the moment, we focus on English and Italian, with Spanish and Dutch parts scheduled for the later stages of the project. For all the languages, we collect videos for the same set of products, thus offering possibilities for multi- and cross-lingual experiments. The paper provides annotation guidelines, corpus statistics and annotator agreement details.

pdf bib
Opinion Mining on YouTube
Aliaksei Severyn | Alessandro Moschitti | Olga Uryupina | Barbara Plank | Katja Filippova
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Learning to Rank Answer Candidates for Automatic Resolution of Crossword Puzzles
Gianni Barlacchi | Massimo Nicosia | Alessandro Moschitti
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

This paper describes a new flexible representation for the annotation of complex structures of metadata over heterogeneous data collections containing text and other types of media such as images or audio files. We argue that existing frameworks are not suitable for this purpose, most importantly because they do not easily generalize to multi-document and multimodal corpora, and because they often require the use of particular software frameworks. In the paper, we define a data model to represent such structured data over multimodal collections. Furthermore, we define a surface realization of the data structure as a simple and readable XML format. We present two examples of annotation tasks to illustrate how the representation and format work for complex structures involving multimodal annotation and cross-document links. The representation described here has been used in a large-scale project focusing on the annotation of a wide range of information ― from low-level features to high-level semantics ― in a multimodal data collection containing both text and images.

pdf bib abs
A Comprehensive Resource to Evaluate Complex Open Domain Question Answering
Silvia Quarteroni | Alessandro Moschitti
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Complex Question Answering is a discipline that involves a deep understanding of question/answer relations, such as those characterizing definition and procedural questions and their answers. To contribute to the improvement of this technology, we deliver two question and answer corpora for complex questions, WEB-QA and TREC-QA, extracted by the same Question Answering system, YourQA, from the Web and from the AQUAINT-6 data collection respectively. We believe that such corpora can be useful resources to address a type of QA that is far from being efficiently solved. WEB-QA and TREC-QA are available in two formats: judgment files and training/testing files. Judgment files contain a ranked list of candidate answers to TREC-10 complex questions, extracted using YourQA as a baseline system and manually labelled according to a Likert scale from 1 (completely incorrect) to 5 (totally correct). Training and testing files contain learning instances compatible with SVM-light; these are useful for experimenting with shallow and complex structural features such as parse trees and semantic role labels. Our experiments with the above corpora show that structured information representation is useful for improving the accuracy of complex QA systems and for re-ranking answers.

pdf bib abs
Corpora for Automatically Learning to Map Natural Language Questions into SQL Queries
Alessandra Giordani | Alessandro Moschitti
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Automatically translating natural language into machine-readable instructions is one of the major interesting and challenging tasks in Natural Language (NL) Processing. This problem can be addressed by using machine learning algorithms to generate a function that finds mappings between natural language and programming language semantics. For this purpose, suitable annotated and structured data are required. In this paper, we describe our method to construct and semi-automatically annotate these kinds of data, consisting of pairs of NL questions and SQL queries. Additionally, we describe two different datasets obtained by applying our annotation method to two well-known corpora, GeoQueries and RestQueries. Since we believe that syntactic levels are important, we also generate and make available relational pairs represented by means of their syntactic trees whose lexical content has been generalized. We validate the quality of our corpora by experimenting with them and our machine learning models to derive automatic NL/SQL translators. Our promising results suggest that our corpora can be effectively used to carry out research in the field of natural language interfaces to databases.

pdf bib abs
A General Purpose FrameNet-based Shallow Semantic Parser
Bonaventura Coppola | Alessandro Moschitti
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present a new FrameNet-based Shallow Semantic Parser. Shallow Semantic Parsing has been a popular Natural Language Processing task since the 2004 and 2005 CoNLL Shared Task editions on Semantic Role Labeling, which were based on the PropBank lexical-semantic resource. Nonetheless, efforts to extend this task to the FrameNet setting have been constrained by practical software engineering issues. We hereby analyze these issues, identify desirable requirements for a practical parsing framework, and show the results of our software implementation. In particular, we attempt to meet requirements arising from both a) the need for a flexible environment supporting current ongoing research, and b) the willingness to provide an effective platform supporting preliminary application prototypes in the field. After introducing the task of FrameNet-based Shallow Semantic Parsing, we sketch the system processing workflow and summarize a set of successful experimental results, directing the reader to previously published papers for extended experiment descriptions and wider discussion of the achieved results.

pdf bib
Syntactic/Semantic Structures for Textual Entailment Recognition
Yashar Mehdad | Alessandro Moschitti | Fabio Massimo Zanzotto
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing
Carmen Banea | Alessandro Moschitti | Swapna Somasundaran | Fabio Massimo Zanzotto
Proceedings of TextGraphs-5 - 2010 Workshop on Graph-based Methods for Natural Language Processing

pdf bib
Syntactic and Semantic Structure for Opinion Expression Detection
Richard Johansson | Alessandro Moschitti
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

pdf bib
On Reverse Feature Engineering of Syntactic Tree Kernels
Daniele Pighin | Alessandro Moschitti
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

2009

pdf bib
Reverse Engineering of Tree Kernel Feature Spaces
Daniele Pighin | Alessandro Moschitti
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Re-Ranking Models Based-on Small Training Data for Spoken Language Understanding
Marco Dinarelli | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Convolution Kernels on Constituent, Dependency and Sequential Structures for Relation Extraction
Truc-Vien T. Nguyen | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Re-Ranking Models for Spoken Language Understanding
Marco Dinarelli | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Syntactic and Semantic Kernels for Short Text Pair Categorization
Alessandro Moschitti
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Shallow Semantic Parsing for Spoken Language Understanding
Bonaventura Coppola | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Annotating Spoken Dialogs: From Speech Segments to Dialog Acts and Frame Semantics
Marco Dinarelli | Silvia Quarteroni | Sara Tonelli | Alessandro Moschitti | Giuseppe Riccardi
Proceedings of SRSL 2009, the 2nd Workshop on Semantic Representation of Spoken Language

pdf bib
Efficient Linearization of Tree Kernel Functions
Daniele Pighin | Alessandro Moschitti
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

2008

pdf bib
Coreference Systems Based on Kernels Methods
Yannick Versley | Alessandro Moschitti | Massimo Poesio | Xiaofeng Yang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Tree Kernels for Semantic Role Labeling
Alessandro Moschitti | Daniele Pighin | Roberto Basili
Computational Linguistics, Volume 34, Number 2, June 2008 - Special Issue on Semantic Role Labeling

Developing a full coreference system able to run all the way from raw text to semantic interpretation is a considerable engineering effort. Accordingly, there is very limited availability of off-the-shelf tools for researchers whose interests are not primarily in coreference or others who want to concentrate on a specific aspect of the problem. We present BART, a highly modular toolkit for developing coreference applications. In the Johns Hopkins workshop on using lexical and encyclopedic knowledge for entity disambiguation, the toolkit was used to extend a reimplementation of Soon et al.'s proposal with a variety of additional syntactic and knowledge-based features, and to experiment with alternative resolution processes, preprocessing tools, and classifiers. BART has been released as open source software and is available from http://www.sfs.uni-tuebingen.de/~versley/BART

pdf bib
Semantic Role Labeling Systems for Arabic using Kernel Methods
Mona Diab | Alessandro Moschitti | Daniele Pighin
Proceedings of ACL-08: HLT

pdf bib
Kernels on Linguistic Structures for Answer Extraction
Alessandro Moschitti | Silvia Quarteroni
Proceedings of ACL-08: HLT, Short Papers

pdf bib
Encoding Tree Pair-Based Graphs in Learning Algorithms: The Textual Entailment Recognition Case
Alessandro Moschitti | Fabio Massimo Zanzotto
Coling 2008: Proceedings of the 3rd Textgraphs workshop on Graph-based Algorithms for Natural Language Processing

2007

pdf bib
Exploiting Syntactic and Shallow Semantic Kernels for Question Answer Classification
Alessandro Moschitti | Silvia Quarteroni | Roberto Basili | Suresh Manandhar
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
CUNIT: A Semantic Role Labeling System for Modern Standard Arabic
Mona Diab | Alessandro Moschitti | Daniele Pighin
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib
RTV: Tree Kernels for Thematic Role Classification
Daniele Pighin | Alessandro Moschitti | Roberto Basili
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib
Shallow Semantic in Fast Textual Entailment Rule Learners
Fabio Massimo Zanzotto | Marco Pennacchiotti | Alessandro Moschitti
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

pdf bib
Making Tree Kernels Practical for Natural Language Learning
Alessandro Moschitti
11th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib abs
A Tree Kernel approach to Question and Answer Classification in Question Answering Systems
Alessandro Moschitti | Roberto Basili
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

A critical step in Question Answering design is the definition of the models for question focus identification and answer extraction. In the case of factoid questions, we can use a question classifier (trained according to a target taxonomy) and a named entity recognizer. Unfortunately, the latter cannot be applied to generate answers related to non-factoid questions. In this paper, we tackle this problem by designing classifiers of non-factoid answers. As the feature design for this learning task is very complex, we take advantage of tree kernels to generate a large feature set from the syntactic parse trees of passages relevant to the target question. Such kernels encode syntactic and lexical information in Support Vector Machines, which can decide if a sentence focuses on a target taxonomy subject. The experiments with SVMs on the TREC 10 dataset show that our approach is an interesting direction for future research.

pdf bib
Syntactic Kernels for Natural Language Learning: the Semantic Role Labeling Case
Alessandro Moschitti
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Automatic Learning of Textual Entailments with Cross-Pair Similarities
Fabio Massimo Zanzotto | Alessandro Moschitti
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Semantic Role Labeling via FrameNet, VerbNet and PropBank
Ana-Maria Giuglea | Alessandro Moschitti
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics