Lluís Màrquez

Also published as: Lluis Márquez, Lluis Màrquez, L. Màrquez, Lluis Marquez, Lluís Marquez

2026

Findings of the Association for Computational Linguistics: EACL 2026
Vera Demberg | Kentaro Inui | Lluís Marquez
Findings of the Association for Computational Linguistics: EACL 2026

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Vera Demberg | Kentaro Inui | Lluís Marquez
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Vera Demberg | Kentaro Inui | Lluís Marquez
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

2025

The safety alignment of Vision-Language Models (VLMs) tends to degrade, relative to that of their LLM backbone, once the vision module is integrated. We investigate this phenomenon, dubbed “safety alignment degradation” in this paper, and show that the challenge arises from the representation gap that emerges when introducing the vision modality to VLMs. In particular, we show that the representations of multi-modal inputs shift away from those of text-only inputs, which represent the distribution that the LLM backbone is optimized for. At the same time, the safety alignment capabilities, initially developed within the textual embedding space, do not successfully transfer to this new multi-modal representation space. To reduce safety alignment degradation, we introduce Cross-Modality Representation Manipulation (CMRM), an inference-time representation intervention method for recovering the safety alignment ability that is inherent in the LLM backbone of VLMs, while simultaneously preserving the functional capabilities of VLMs. The empirical results show that our framework significantly recovers the alignment ability that is inherited from the LLM backbone with minimal impact on the fluency and linguistic capabilities of pre-trained VLMs even without additional training. Specifically, the unsafe rate of LLaVA-7B on multi-modal input can be reduced from 61.53% to as low as 3.15% with only inference-time intervention.

Recent advancements in large language models (LLMs) have enabled their successful application to a broad range of tasks. However, in information-intensive tasks, the prompt length can grow fast, leading to increased computational requirements, performance degradation, and induced biases from irrelevant or redundant information. Recently, various prompt compression techniques have been introduced to optimize the trade-off between reducing input length and retaining performance. We propose a holistic evaluation framework that allows for in-depth analysis of prompt compression methods. We focus on three key aspects, besides compression ratio: (i) downstream task performance, (ii) grounding in the input context, and (iii) information preservation. Using our framework, we analyze state-of-the-art soft and hard compression methods and show that some fail to preserve key details from the original prompt, limiting performance on complex tasks. By identifying these limitations, we are able to improve one soft prompting method by controlling compression granularity, achieving up to +23% in downstream performance, +8 BERTScore points in grounding, and 2.7× more entities preserved in compression. Ultimately, we find that the best effectiveness/compression rate trade-off is achieved with soft prompting combined with sequence-level training.

2024

Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
Matéo Mahaut | Laura Aina | Paula Czarnowska | Momchil Hardalov | Thomas Müller | Lluis Marquez
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) tend to be unreliable on fact-based answers. To address this problem, NLP researchers have proposed a range of techniques to estimate LLMs’ confidence over facts. However, due to the lack of a systematic comparison, it is not clear how the different methods compare to one another. To fill this gap, we present a rigorous survey and empirical comparison of estimators of factual confidence. We define an experimental framework allowing for fair comparison, covering both fact-verification and QA. Our experiments across a series of LLMs indicate that trained hidden-state probes provide the most reliable confidence estimates, albeit at the expense of requiring access to weights and supervision data. We also conduct a deeper assessment of the methods, in which we measure the consistency of model behavior under meaning-preserving variations in the input. We find that the factual confidence of LLMs is often unstable across semantically equivalent inputs, suggesting there is much room for improvement in the stability of models’ parametric knowledge.

2023

Sequence-to-sequence state-of-the-art systems for dialogue state tracking (DST) use the full dialogue history as input, represent the current state as a list with all the slots, and generate the entire state from scratch at each dialogue turn. This approach is inefficient, especially when the number of slots is large and the conversation is long. We propose Diable, a new task formalisation that simplifies the design and implementation of efficient DST systems and allows one to easily plug and play large language models. We represent the dialogue state as a table and formalise DST as a table manipulation task. At each turn, the system updates the previous state by generating table operations based on the dialogue context. Extensive experimentation on the MultiWoz datasets demonstrates that Diable (i) outperforms strong efficient DST baselines, (ii) is 2.4x more time efficient than current state-of-the-art methods while retaining competitive Joint Goal Accuracy, and (iii) is robust to noisy data annotations due to the table operations approach.

2019

It Takes Nine to Smell a Rat: Neural Multi-Task Learning for Check-Worthiness Prediction
Slavena Vasileva | Pepa Atanasova | Lluís Màrquez | Alberto Barrón-Cedeño | Preslav Nakov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

We propose a multi-task deep-learning approach for estimating the check-worthiness of claims in political debates. Given a political debate, such as the 2016 US Presidential and Vice-Presidential ones, the task is to predict which statements in the debate should be prioritized for fact-checking. While different fact-checking organizations would naturally make different choices when analyzing the same debate, we show that it pays to learn from multiple sources simultaneously (PolitiFact, FactCheck, ABC, CNN, NPR, NYT, Chicago Tribune, The Guardian, and Washington Post) in a multi-task learning setup, even when a particular source is chosen as a target to imitate. Our evaluation shows state-of-the-art results on a standard dataset for the task of check-worthiness prediction.

Book QA: Stories of Challenges and Opportunities
Stefanos Angelidis | Lea Frermann | Diego Marcheggiani | Roi Blanco | Lluís Màrquez
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

We present a system for answering questions based on the full text of books (BookQA), which first selects book passages given a question at hand, and then uses a memory network to reason and predict an answer. To improve generalization, we pretrain our memory network using artificial questions generated from book sentences. We experiment with the recently published NarrativeQA corpus, on the subset of Who questions, which expect book characters as answers. We experimentally show that BERT-based retrieval and pretraining improve over baseline results significantly. At the same time, we confirm that NarrativeQA is a highly challenging data set, and that there is need for novel research in order to achieve high-precision BookQA results. We analyze some of the bottlenecks of the current approach, and we argue that more research is needed on text representation, retrieval of relevant passages, and reasoning, including commonsense knowledge.

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Anna Korhonen | David Traum | Lluís Màrquez
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

2018

ClaimRank: Detecting Check-Worthy Claims in Arabic and English
Israa Jaradat | Pepa Gencheva | Alberto Barrón-Cedeño | Lluís Màrquez | Preslav Nakov
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

We present ClaimRank, an online system for detecting check-worthy claims. While originally trained on political debates, the system can work for any kind of text, e.g., interviews or just regular news articles. Its aim is to facilitate manual fact-checking efforts by prioritizing the claims that fact-checkers should consider first. ClaimRank supports both Arabic and English. It is trained on actual annotations from nine reputable fact-checking organizations (PolitiFact, FactCheck, ABC, CNN, NPR, NYT, Chicago Tribune, The Guardian, and Washington Post), and thus it can mimic the claim selection strategy of any one of them, as well as that of the union of them all.

Automatic Stance Detection Using End-to-End Memory Networks
Mitra Mohtarami | Ramy Baly | James Glass | Preslav Nakov | Lluís Màrquez | Alessandro Moschitti
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present an effective end-to-end memory network model that jointly (i) predicts whether a given document can be considered as relevant evidence for a given claim, and (ii) extracts snippets of evidence that can be used to reason about the factuality of the target claim. Our model combines the advantages of convolutional and recurrent neural networks as part of a memory network. We further introduce a similarity matrix at the inference level of the memory network in order to extract snippets of evidence for input claims more accurately. Our experiments on a public benchmark dataset, FakeNewsChallenge, demonstrate the effectiveness of our approach.

Integrating Stance Detection and Fact Checking in a Unified Corpus
Ramy Baly | Mitra Mohtarami | James Glass | Lluís Màrquez | Alessandro Moschitti | Preslav Nakov
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.), determining the stance of each document with respect to the claim, and finally making a prediction about the claim’s factuality by aggregating the strength of the stances, while taking the reliability of the source into account. Moreover, a fact checking system should be able to explain its decision by providing relevant extracts (rationales) from the documents. Yet, this setup is not directly supported by existing datasets, which treat fact checking, document retrieval, source credibility, stance detection and rationale extraction as independent tasks. In this paper, we support the interdependencies between these tasks as annotations in the same corpus. We implement this setup on an Arabic fact checking corpus, the first of its kind.

Joint Multitask Learning for Community Question Answering Using Task-Specific Embeddings
Shafiq Joty | Lluís Màrquez | Preslav Nakov
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We address jointly two important tasks for Question Answering in community forums: given a new question, (i) find related existing questions, and (ii) find relevant answers to this new question. We further use an auxiliary task to complement the previous two, i.e., (iii) find good answers with respect to the thread question in a question-comment thread. We use deep neural networks (DNNs) to learn meaningful task-specific embeddings, which we then incorporate into a conditional random field (CRF) model for the multitask setting, performing joint learning over a complex graph structure. While DNNs alone achieve competitive results when trained to produce the embeddings, the CRF, which makes use of the embeddings and the dependencies between the tasks, improves the results significantly and consistently across a variety of evaluation metrics, thus showing the complementarity of DNNs and structured learning.

2017

Discourse Structure in Machine Translation Evaluation
Shafiq Joty | Francisco Guzmán | Lluís Màrquez | Preslav Nakov
Computational Linguistics, Volume 43, Issue 4 - December 2017

In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments both at the segment level and at the system level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular, we show that (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference RST tree is positively correlated with translation quality.

We describe SemEval–2017 Task 3 on Community Question Answering. This year, we reran the four subtasks from SemEval-2016: (A) Question–Comment Similarity, (B) Question–Question Similarity, (C) Question–External Comment Similarity, and (D) Rerank the correct answers for a new question in Arabic, providing all the data from 2015 and 2016 for training, and fresh data for testing. Additionally, we added a new subtask E in order to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums. A total of 23 teams participated in the task, and submitted a total of 85 runs (36 primary and 49 contrastive) for subtasks A–D. Unfortunately, no teams participated in subtask E. A variety of approaches and features were used by the participating systems to address the different subtasks. The best systems achieved an official score (MAP) of 88.43, 47.22, 15.46, and 61.16 in subtasks A, B, C, and D, respectively. These scores are better than the baselines, especially for subtasks A–C.

Do Not Trust the Trolls: Predicting Credibility in Community Question Answering Forums
Preslav Nakov | Tsvetomila Mihaylova | Lluís Màrquez | Yashkumar Shiroya | Ivan Koychev
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We address information credibility in community forums, in a setting in which the credibility of an answer posted in a question thread by a particular user has to be predicted. First, we motivate the problem and we create a publicly available annotated English corpus by crowdsourcing. Second, we propose a large set of features to predict the credibility of the answers. The features model the user, the answer, the question, the thread as a whole, and the interaction between them. Our experiments with ranking SVMs show that the credibility labels can be predicted with high performance according to several standard IR ranking metrics, thus supporting the potential usage of this layer of credibility information in practical applications. The features modeling the profile of the user (in particular trollness) turn out to be most important, but embedding features modeling the answer and the similarity between the question and the answer are also very relevant. Overall, half of the gap between the baseline performance and the perfect classifier can be covered using the proposed features.

Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks
Yonatan Belinkov | Lluís Màrquez | Hassan Sajjad | Nadir Durrani | Fahim Dalvi | James Glass
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While neural machine translation (NMT) models provide improved translation quality in an elegant framework, it is less clear what they learn about language. Recent work has started evaluating the quality of vector representations learned by NMT models on morphological and syntactic tasks. In this paper, we investigate the representations learned at different layers of NMT encoders. We train NMT systems on parallel data and use the models to extract features for training a classifier on two tasks: part-of-speech and semantic tagging. We then measure the performance of the classifier as a proxy for the quality of the original NMT model for the given task. Our quantitative analysis yields interesting insights regarding representation learning in NMT models. For instance, we find that higher layers are better at learning semantics while lower layers tend to be better for part-of-speech tagging. We also observe little effect of the target language on source-side representations, especially in higher quality models.

Cross-language Learning with Adversarial Neural Networks
Shafiq Joty | Preslav Nakov | Lluís Màrquez | Israa Jaradat
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

We address the problem of cross-language adaptation for question-question similarity reranking in community question answering, with the objective of porting a system trained on one input language to another input language, given labeled training data for the first language and only unlabeled data for the second language. In particular, we propose to use adversarial training of neural networks to learn high-level features that are discriminative for the main learning task, and at the same time are invariant across the input languages. The evaluation results show sizable improvements for our cross-language adversarial neural network (CLANN) model over a strong non-adversarial system.

A Context-Aware Approach for Detecting Worth-Checking Claims in Political Debates
Pepa Gencheva | Preslav Nakov | Lluís Màrquez | Alberto Barrón-Cedeño | Ivan Koychev
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In the context of investigative journalism, we address the problem of automatically identifying which claims in a given document are most worthy and should be prioritized for fact-checking. Despite its importance, this is a relatively understudied problem. Thus, we create a new corpus of political debates, containing statements that have been fact-checked by nine reputable sources, and we train machine learning models to predict which claims should be prioritized for fact-checking, i.e., we model the problem as a ranking task. Unlike previous work, which has looked primarily at sentences in isolation, in this paper we focus on a rich input representation modeling the context: relationship between the target statement and the larger context of the debate, interaction between the opponents, and reaction by the moderator and by the public. Our experiments show state-of-the-art results, outperforming a strong rivaling system by a margin, while also confirming the importance of the contextual information.

Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)
Nancy Ide | Aurélie Herbelot | Lluís Màrquez
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Fully Automated Fact Checking Using External Sources
Georgi Karadzhov | Preslav Nakov | Lluís Màrquez | Alberto Barrón-Cedeño | Ivan Koychev
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Given the constantly growing proliferation of false claims online in recent years, there has also been a growing research interest in automatically distinguishing false rumors from factually true claims. Here, we propose a general-purpose framework for fully-automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. Our framework uses a deep neural network with LSTM text encoding to combine semantic kernels with task-specific embeddings that encode a claim together with pieces of potentially relevant text fragments from the Web, taking the source reliability into account. The evaluation results show good performance on two different tasks and datasets: (i) rumor detection and (ii) fact checking of the answers to a question in community question answering forums.

2016

MTE-NN at SemEval-2016 Task 3: Can Machine Translation Evaluation Help Community Question Answering?
Francisco Guzmán | Preslav Nakov | Lluís Màrquez
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Semi-supervised Question Retrieval with Gated Convolutions
Tao Lei | Hrishikesh Joshi | Regina Barzilay | Tommi Jaakkola | Kateryna Tymoshenko | Alessandro Moschitti | Lluís Màrquez
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

It Takes Three to Tango: Triangulation Approach to Answer Ranking in Community Question Answering
Preslav Nakov | Lluís Màrquez | Francisco Guzmán
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

We present an interactive system to provide effective and efficient search capabilities in Community Question Answering (cQA) forums. The system integrates state-of-the-art technology for answer search with a Web-based user interface specifically tailored to support the cQA forum readers. The answer search module automatically finds relevant answers for a new question by exploring related questions and the comments within their threads. The graphical user interface presents the search results and supports the exploration of related information. The system is running live at http://www.qatarliving.com/betasearch/.

Joint Learning with Global Inference for Comment Classification in Community Question Answering
Shafiq Joty | Lluís Màrquez | Preslav Nakov
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Machine Translation Evaluation Meets Community Question Answering
Francisco Guzmán | Lluís Màrquez | Preslav Nakov
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper we introduce a joint arc-factored model for syntactic and semantic dependency parsing. The semantic role labeler predicts the full syntactic paths that connect predicates with their arguments. This process is framed as a linear assignment task, which allows us to control some well-formedness constraints. For the syntactic part, we define a standard arc-factored dependency model that predicts the full syntactic tree. Finally, we employ dual decomposition techniques to produce consistent syntactic and predicate-argument structures while searching over a large space of syntactic configurations. In experiments on the CoNLL-2009 English benchmark we observe very competitive results.

FAUST: Feedback Analysis for User Adaptive Statistical Translation
William Byrne | Lluis Marquez
Proceedings of Machine Translation Summit XIV: European projects

Real-life Translation Quality Estimation for MT System Selection
Lluis Formiga | Lluis Marquez | Jaume Pujantell
Proceedings of Machine Translation Summit XIV: Papers

2012

The UPC Submission to the WMT 2012 Shared Task on Quality Estimation
Daniele Pighin | Meritxell González | Lluís Màrquez
Proceedings of the Seventh Workshop on Statistical Machine Translation

The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output
Daniele Pighin | Lluís Màrquez | Lluís Formiga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present a corpus consisting of 11,292 real-world English to Spanish automatic translations annotated with relative (ranking) and absolute (adequate/non-adequate) quality assessments. The translation requests, collected through the popular translation portal http://reverso.net, provide a highly varied sample of real-world machine translation (MT) usage, from complete sentences to units of one or two words, from well-formed to hardly intelligible texts, from technical documents to colloquial and slang snippets. In this paper, we present 1) a preliminary annotation experiment that we carried out to select the most appropriate quality criterion to be used for these data, 2) a graph-based methodology inspired by Interactive Genetic Algorithms to reduce the annotation effort, and 3) the outcomes of the full-scale annotation experiment, which result in a valuable and original resource for the analysis and characterization of MT-output quality.

A Graphical Interface for MT Evaluation and Error Analysis
Meritxell Gonzàlez | Jesús Giménez | Lluís Màrquez
Proceedings of the ACL 2012 System Demonstrations

Context-Aware Machine Translation for Software Localization
Victor Muntés-Mulero | Patricia Paladini Adell | Cristina España-Bonet | Lluís Màrquez
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

A Graph-based Strategy to Streamline Translation Quality Assessments
Daniele Pighin | Lluís Formiga | Lluís Màrquez
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

We present a detailed analysis of a graph-based annotation strategy that we employed to annotate a corpus of 11,292 real-world English to Spanish automatic translations with relative (ranking) and absolute (adequate/non-adequate) quality assessments. The proposed approach, inspired by previous work in Interactive Evolutionary Computation and Interactive Genetic Algorithms, results in a simpler and faster annotation process. We empirically compare the method against a traditional, explicit ranking approach, and show that the graph-based strategy: 1) is considerably faster, and 2) produces consistently more reliable annotations.

A Hybrid System for Patent Translation
Ramona Enache | Cristina España-Bonet | Aarne Ranta | Lluís Màrquez
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

Deep evaluation of hybrid architectures: use of different metrics in MERT weight optimization
Cristina España-Bonet | Gorka Labaka | Arantza Díaz de Ilarranza | Lluís Màrquez | Kepa Sarasola
Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation

An Analysis (and an Annotated Corpus) of User Responses to Machine Translation Output
Daniele Pighin | Lluís Màrquez | Jonathan May
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present an annotated resource consisting of open-domain translation requests, automatic translations and user-provided corrections collected from casual users of the translation portal http://reverso.net. The layers of annotation provide: 1) quality assessments for 830 correction suggestions for translations into English, at the segment level, and 2) 814 usefulness assessments for English-Spanish and English-French translation suggestions, a suggestion being useful if it contains at least local clues that can be used to improve translation quality. We also discuss the results of our preliminary experiments concerning 1) the development of an automatic filter to separate useful from non-useful feedback, and 2) the incorporation in the machine translation pipeline of bilingual phrases extracted from the suggestions. The annotated data, available for download from ftp://mi.eng.cam.ac.uk/data/faust/LW-UPC-Oct11-FAUST-feedback-annotation.tgz, is released under a Creative Commons license. To the best of our knowledge, this is the first resource of this kind that has ever been made publicly available.

2011

Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing
Irina Matveeva | Alessandro Moschitti | Lluís Màrquez | Fabio Massimo Zanzotto
Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing

Hybrid Machine Translation Guided by a Rule–Based System
Cristina España-Bonet | Gorka Labaka | Arantza Díaz de Ilarraza | Lluís Màrquez
Proceedings of Machine Translation Summit XIII: Papers

Automatic Projection of Semantic Structures: an Application to Pairwise Translation Ranking
Daniele Pighin | Lluís Màrquez
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

2010

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Lluís Màrquez | Haifeng Wang
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Robust Estimation of Feature Weights in Statistical Machine Translation
Cristina España-Bonet | Lluís Màrquez
Proceedings of the 14th Annual Conference of the European Association for Machine Translation

Improving Semantic Role Classification with Selectional Preferences
Beñat Zapirain | Eneko Agirre | Lluís Màrquez | Mihai Surdeanu
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Hang Li | Lluís Màrquez
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Document-Level Automatic MT Evaluation based on Discourse Representations
Elisabet Comelles | Jesús Giménez | Lluís Màrquez | Irene Castellón | Victoria Arranz
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

On the Robustness of Syntactic and Semantic Features for Automatic MT Evaluation
Jesús Giménez | Lluís Màrquez
Proceedings of the Fourth Workshop on Statistical Machine Translation

Generalizing over Lexical Features: Selectional Preferences for Semantic Role Classification
Beñat Zapirain | Eneko Agirre | Lluís Màrquez
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Semantic Role Labeling: Past, Present and Future
Lluís Màrquez
Tutorial Abstracts of ACL-IJCNLP 2009

A Second-Order Joint Eisner Model for Syntactic and Semantic Dependency Parsing
Xavier Lluís | Stefan Bott | Lluís Màrquez
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

pdf bib

Proceedings of the 13th Annual Conference of the European Association for Machine Translation
Lluís Màrquez | Harold Somers
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

pdf bib

Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)
Eneko Agirre | Lluís Màrquez | Richard Wicentowski
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

pdf bib

SemEval-2010 Task 1: Coreference Resolution in Multiple Languages
Marta Recasens | Toni Martí | Mariona Taulé | Lluís Màrquez | Emili Sapena
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

2008

pdf bib abs

Towards Heterogeneous Automatic MT Error Analysis
Jesús Giménez | Lluís Màrquez
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This work studies the viability of performing heterogeneous automatic MT error analyses. Error analysis is, undoubtedly, one of the most crucial stages in the development cycle of an MT system. However, often not enough attention is paid to this process. The reason is that performing an accurate error analysis requires intensive human labor. In order to speed up the error analysis process, we suggest partially automatizing it by having automatic evaluation metrics play a more active role. For that purpose, we have compiled a large and heterogeneous set of features at different linguistic levels and at different levels of granularity. Through a practical case study, we show how these features provide an effective means of elaborating interpretable and detailed automatic reports of translation quality.

pdf bib

The CoNLL 2008 Shared Task on Joint Parsing of Syntactic and Semantic Dependencies
Mihai Surdeanu | Richard Johansson | Adam Meyers | Lluís Màrquez | Joakim Nivre
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf bib

Robustness and Generalization of Role Sets: PropBank vs. VerbNet
Beñat Zapirain | Eneko Agirre | Lluís Màrquez
Proceedings of ACL-08: HLT

pdf bib

Heterogeneous Automatic MT Evaluation Through Non-Parametric Metric Combinations
Jesús Giménez | Lluís Màrquez
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib

Special Issue Introduction: Semantic Role Labeling: An Introduction to the Special Issue
Lluís Màrquez | Xavier Carreras | Kenneth C. Litkowski | Suzanne Stevenson
Computational Linguistics, Volume 34, Number 2, June 2008 - Special Issue on Semantic Role Labeling

pdf bib

A Joint Model for Parsing Syntactic and Semantic Dependencies
Xavier Lluís | Lluís Màrquez
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf bib

A Smorgasbord of Features for Automatic MT Evaluation
Jesús Giménez | Lluís Màrquez
Proceedings of the Third Workshop on Statistical Machine Translation

2007

pdf bib

Linguistic Features for Automatic Evaluation of Heterogenous MT Systems
Jesús Giménez | Lluís Màrquez
Proceedings of the Second Workshop on Statistical Machine Translation

pdf bib

Context-aware Discriminative Phrase Selection for Statistical Machine Translation
Jesús Giménez | Lluís Màrquez
Proceedings of the Second Workshop on Statistical Machine Translation

pdf bib

UPC: Experiments with Joint Learning within SemEval Task 9
Lluís Màrquez | Lluís Padró | Mihai Surdeanu | Luis Villarejo
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib

Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)
Eneko Agirre | Lluís Màrquez | Richard Wicentowski
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib

SemEval-2007 Task 09: Multilevel Semantic Annotation of Catalan and Spanish
Lluís Màrquez | Luis Villarejo | M. A. Martí | Mariona Taulé
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib

UBC-UPC: Sequential SRL Using Selectional Preferences. An approach with Maximum Entropy Markov Models
Beñat Zapirain | Eneko Agirre | Lluís Màrquez
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

pdf bib

Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)
Lluís Màrquez | Dan Klein
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

pdf bib

The LDV-COMBO system for SMT
Jesús Giménez | Lluís Màrquez
Proceedings on the Workshop on Statistical Machine Translation

pdf bib

Projective Dependency Parsing with Perceptron
Xavier Carreras | Mihai Surdeanu | Lluís Màrquez
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

pdf bib abs

Generation of Language Resources for the Development of Speech Technologies in Catalan
A. Moreno | Albert Febrer | Lluis Márquez
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes a joint initiative of the Catalan and Spanish Governments to produce Language Resources for the Catalan language. A methodology similar to the Basic Language Resource Kit (BLARK) concept was applied to determine the priorities for the production of the Language Resources. The paper presents the LRs and tools currently available for the Catalan language, for both Language and Speech technologies. The production of large databases for Automatic Speech Recognition purposes has already started. All the resources generated in the project follow EU standards, will be validated by an external centre, and will be freely and publicly available through ELRA.

pdf bib

Low-Cost Enrichment of Spanish WordNet with Automatically Translated Glosses: Combining General and Specialized Models
Jesús Giménez | Lluís Màrquez
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib

MT Evaluation: Human-Like vs. Human Acceptable
Enrique Amigó | Jesús Giménez | Julio Gonzalo | Lluís Màrquez
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions