Héctor Martínez Alonso - ACL Anthology

Héctor Martínez Alonso

Also published as: Hector Martinez, Hector Martínez Alonso, Héctor Martinez Alonso, Héctor Martínez, Hector Martinez Alonso

2025

ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution
Alexandru Coca | Mark Gaynor | Zhenxing Zhang | Jianpeng Cheng | Bo-Hsiang Tseng | Peter Boothroyd | Hector Martinez Alonso | Diarmuid O Seaghdha | Anders Johannsen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This work evaluates the potential of large language models (LLMs) to power digital assistants capable of complex action execution. Such assistants rely on pre-trained programming knowledge to execute multi-step goals by composing objects and functions defined in assistant libraries into action execution programs. To achieve this, we develop ASPERA, a framework comprising an assistant library simulation and a human-assisted LLM data generation engine. Our engine allows developers to guide LLM generation of high-quality tasks consisting of complex user queries, simulation state and corresponding validation programs, tackling data availability and evaluation robustness challenges. Alongside the framework we release Asper-Bench, an evaluation dataset of 250 challenging tasks generated using ASPERA, which we use to show that program generation grounded in custom assistant libraries is a significant challenge to LLMs compared to dependency-free code generation.

PyTOD: Programmable Task-Oriented Dialogue with Execution Feedback
Alexandru Coca | Bo-Hsiang Tseng | Peter Boothroyd | Jianpeng Cheng | Zhenxing Zhang | Mark Gaynor | Joe Stacey | Tristan Guigue | Héctor Martínez Alonso | Diarmuid Ó Séaghdha | Anders Johannsen
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Programmable task-oriented dialogue (TOD) agents enable language models to follow structured dialogue policies, but their effectiveness hinges on accurate dialogue state tracking (DST). We present PyTOD, an agent that generates executable code to track dialogue state and uses policy and execution feedback for efficient error correction. To achieve this, PyTOD employs a simple constrained decoding approach, using a language model instead of grammar rules to follow API schemata. This leads to state-of-the-art DST performance on the challenging SGD benchmark. Our experiments show that PyTOD surpasses strong baselines in both accuracy and cross-turn consistency, demonstrating the effectiveness of execution-aware state tracking.

2020

We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical representations, we can incorporate semantic compositionality, cross-domain knowledge sharing and co-reference. We present TreeDST, a dataset of 27k conversations annotated with tree-structured dialog states and system acts. We describe an encoder-decoder framework for DST with hierarchical representations, which leads to ~20% improvement over state-of-the-art DST approaches that operate on a flat meaning space of slot-value pairs.

2018

Automatic Annotation of Semantic Term Types in the Complete ACL Anthology Reference Corpus
Anne-Kathrin Schumann | Héctor Martínez Alonso
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer
Djamé Seddah | Eric de la Clergerie | Benoît Sagot | Héctor Martínez Alonso | Marie Candito
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Grotoco@SLAM: Second Language Acquisition Modeling with Simple Features, Learners and Task-wise Models
Sigrid Klerke | Héctor Martínez Alonso | Barbara Plank
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present our submission to the 2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM). We focus on evaluating a range of features for the task, including user-derived measures, while examining how far we can get with a simple linear classifier. Our analysis reveals that errors differ per exercise format, which motivates our final and best-performing system: a task-wise (per exercise-format) model.

2017

When is multitask learning effective? Semantic sequence prediction under varying data conditions
Héctor Martínez Alonso | Barbara Plank
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Multitask learning (MTL) has been applied successfully to a range of tasks, mostly morphosyntactic. However, little is known about when MTL works and whether there are data characteristics that help to determine its success. In this paper we evaluate a range of semantic sequence labeling tasks in an MTL setup. We examine different auxiliary task configurations, among them a novel setup, and correlate their impact with data-dependent conditions. Our results show that MTL is not always effective: significant improvements are obtained for only 1 out of 5 tasks. When successful, auxiliary tasks with compact and more uniform label distributions are preferable.

Annotating omission in statement pairs
Héctor Martínez Alonso | Amaury Delamaire | Benoît Sagot
Proceedings of the 11th Linguistic Annotation Workshop

We focus on the identification of omission in statement pairs. We compare three annotation schemes, namely two different crowdsourcing schemes and manual expert annotation. We show that the simpler of the two crowdsourcing approaches yields better annotation quality than the more complex one. We use a dedicated classifier to assess whether the annotators’ behavior can be explained by straightforward linguistic features. The classifier benefits from modeling that uses lexical information beyond length and overlap measures. However, for our task, we argue that expert and not crowdsourcing-based annotation is the best compromise between annotation cost and quality.

Benchmarking Joint Lexical and Syntactic Analysis on Multiword-Rich Data
Matthieu Constant | Héctor Martinez Alonso
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This article evaluates the extension of a dependency parser that performs joint syntactic analysis and multiword expression identification. We show that, given sufficient training data, the parser benefits from explicit multiword information and improves overall labeled accuracy score in eight of the ten evaluation cases.

Parsing Universal Dependencies without training
Héctor Martínez Alonso | Željko Agić | Barbara Plank | Anders Søgaard
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We present UDP, the first training-free parser for Universal Dependencies (UD). Our algorithm is based on PageRank and a small set of specific dependency head rules. UDP features two-step decoding to guarantee that function words are attached as leaf nodes. The parser requires no training, and it is competitive with a delexicalized transfer system. UDP offers a linguistically sound unsupervised alternative to cross-lingual parsing for UD. The parser has very few parameters and is distinctly robust to domain change across languages.

Improving neural tagging with lexical information
Benoît Sagot | Héctor Martínez Alonso
Proceedings of the 15th International Conference on Parsing Technologies

Neural part-of-speech tagging has achieved competitive results with the incorporation of character-based and pre-trained word embeddings. In this paper, we show that a state-of-the-art bi-LSTM tagger can benefit from using information from morphosyntactic lexicons as additional input. The tagger, trained on several dozen languages, shows a consistent, average improvement when using lexical information, even when also using character-based embeddings, thus showing the complementarity of the different sources of lexical information. The improvements are particularly important for the smaller datasets.

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

2016

An empirically grounded expansion of the supersense inventory
Hector Martinez Alonso | Anders Johannsen | Sanni Nimb | Sussi Olsen | Bolette Pedersen
Proceedings of the 8th Global WordNet Conference (GWC)

In this article we present an expansion of the supersense inventory. All new supersenses are extensions of members of the current inventory, which we postulate by identifying semantically coherent groups of synsets. We cover the expansion of the already-established supersense inventory for nouns and verbs, the addition of coarse supersenses for adjectives in the absence of a canonical supersense inventory, and supersenses for verbal satellites. We evaluate the viability of the new senses by examining annotation agreement, frequency and co-occurrence patterns.

Supersense tagging with inter-annotator disagreement
Héctor Martínez Alonso | Anders Johannsen | Barbara Plank
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

CoastalCPH at SemEval-2016 Task 11: The importance of designing your Neural Networks right
Joachim Bingel | Natalie Schluter | Héctor Martínez Alonso
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

The SemDaX Corpus ― Sense Annotations with Scalable Sense Inventories
Bolette Pedersen | Anna Braasch | Anders Johannsen | Héctor Martínez Alonso | Sanni Nimb | Sussi Olsen | Anders Søgaard | Nicolai Hartvig Sørensen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We launch the SemDaX corpus, a recently completed Danish human-annotated corpus available through a CLARIN academic license. The corpus includes approx. 90,000 words, comprises six textual domains, and is annotated with sense inventories of different granularity. The aim of the developed corpus is twofold: i) to assess the reliability of the different sense annotation schemes for Danish measured by qualitative analyses and annotation agreement scores, and ii) to serve as training and test data for machine learning algorithms with the practical purpose of developing sense taggers for Danish. To these aims, we take a new approach to human-annotated corpus resources by double annotating a much larger part of the corpus than what is normally seen: for the all-words task we double annotated 60% of the material and for the lexical sample task 100%. We include in the corpus not only the adjudicated files, but also the diverging annotations. In other words, we do not consider all disagreement to be noise, but rather to contain valuable linguistic information that can help us improve our annotation schemes and our learning algorithms.

Multilingual Projection for Parsing Truly Low-Resource Languages
Željko Agić | Anders Johannsen | Barbara Plank | Héctor Martínez Alonso | Natalie Schluter | Anders Søgaard
Transactions of the Association for Computational Linguistics, Volume 4

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.

From Noisy Questions to Minecraft Texts: Annotation Challenges in Extreme Syntax Scenario
Héctor Martínez Alonso | Djamé Seddah | Benoît Sagot
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

User-generated content presents many challenges for automatic processing. While many of them come from out-of-vocabulary effects, others stem from different linguistic phenomena such as unusual syntax. In this work we present a French three-domain data set made up of question headlines from a cooking forum, game chat logs and associated forums from two popular online games (MINECRAFT & LEAGUE OF LEGENDS). We chose these domains because they encompass different degrees of lexical and syntactic compliance with canonical language. We conduct an automatic and manual evaluation of the difficulties of processing these domains for part-of-speech prediction, and introduce a pilot study to determine whether dependency analysis lends itself well to annotating these data. We also discuss the development cost of our data set.

Learning Paraphrasing for Multiword Expressions
Seid Muhie Yimam | Héctor Martínez Alonso | Martin Riedl | Chris Biemann
Proceedings of the 12th Workshop on Multiword Expressions

Approximate unsupervised summary optimisation for selections of ROUGE
Natalie Schluter | Héctor Martínez Alonso
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)

It is standard to measure automatic summariser performance using the ROUGE metric. Unfortunately, ROUGE is not appropriate for unsupervised summarisation approaches. On the other hand, we show that it is possible to optimise approximately for ROUGE-n by using a document-weighted ROUGE objective. Doing so results in state-of-the-art summariser performance for single and multiple document summaries for both English and French. This is despite a non-correlation of the document-weighted ROUGE metric with human judgments, unlike the original ROUGE metric. These findings suggest a theoretical approximation link between the two metrics.

MSejrKu at SemEval-2016 Task 14: Taxonomy Enrichment by Evidence Ranking
Michael Schlichtkrull | Héctor Martínez Alonso
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Looking hard: Eye tracking for detecting grammaticality of automatically compressed sentences
Sigrid Klerke | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Active learning for sense annotation
Héctor Martínez Alonso | Barbara Plank | Anders Johannsen | Anders Søgaard
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

CPH: Sentiment analysis of Figurative Language on Twitter #easypeasy #not
Sarah McGillion | Héctor Martínez Alonso | Barbara Plank
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Inverted indexing for cross-lingual NLP
Anders Søgaard | Željko Agić | Héctor Martínez Alonso | Barbara Plank | Bernd Bohnet | Anders Johannsen
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Do dependency parsing metrics correlate with human judgments?
Barbara Plank | Héctor Martínez Alonso | Željko Agić | Danijela Merkler | Anders Søgaard
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Any-language frame-semantic parsing
Anders Johannsen | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Learning to parse with IAA-weighted loss
Héctor Martínez Alonso | Barbara Plank | Arne Skjærholt | Anders Søgaard
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Coarse-grained sense annotation of Danish across textual domains
Sussi Olsen | Bolette S. Pedersen | Héctor Martínez Alonso | Anders Johannsen
Proceedings of the workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015

Supersense tagging for Danish
Héctor Martínez Alonso | Anders Johannsen | Sussi Olsen | Sanni Nimb | Nicolai Hartvig Sørensen | Anna Braasch | Anders Søgaard | Bolette Sandford Pedersen
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Mining for unambiguous instances to adapt part-of-speech taggers to new domains
Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Predicting word sense annotation agreement
Héctor Martínez Alonso | Anders Johannsen | Oier Lopez de Lacalle | Eneko Agirre
Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics

Non-canonical language is not harder to annotate than canonical language
Barbara Plank | Héctor Martínez Alonso | Anders Søgaard
Proceedings of the 9th Linguistic Annotation Workshop

2014

Copenhagen-Malmö: Tree Approximations of Semantic Parsing Problems
Natalie Schluter | Anders Søgaard | Jakob Elming | Dirk Hovy | Barbara Plank | Héctor Martínez Alonso | Anders Johanssen | Sigrid Klerke
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

What’s in a p-value in NLP?
Anders Søgaard | Anders Johannsen | Barbara Plank | Dirk Hovy | Hector Martínez Alonso
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

Crowdsourcing as a preprocessing for complex semantic annotation tasks
Héctor Martínez Alonso | Lauren Romeo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article outlines a methodology that uses crowdsourcing to reduce the workload of experts for complex semantic tasks. We split turker-annotated datasets into a high-agreement block, which is not modified, and a low-agreement block, which is re-annotated by experts. The resulting annotations have higher observed agreement. We identify different biases in the annotation for both turkers and experts.

More or less supervised supersense tagging of Twitter
Anders Johannsen | Dirk Hovy | Héctor Martínez Alonso | Barbara Plank | Anders Søgaard
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

2013

Annotation of regular polysemy and underspecification
Héctor Martínez Alonso | Bolette Sandford Pedersen | Núria Bel
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Finding Dependency Parsing Limits over a Large Spanish Corpus
Muntsa Padró | Miguel Ballesteros | Héctor Martínez | Bernd Bohnet
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Using Crowdsourcing to get Representations based on Regular Expressions
Anders Søgaard | Hector Martinez | Jakob Elming | Anders Johannsen
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Class-based Word Sense Induction for dot-type nominals
Lauren Romeo | Héctor Martínez Alonso | Núria Bel
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

Down-stream effects of tree-to-dependency conversions
Jakob Elming | Anders Johannsen | Sigrid Klerke | Emanuele Lapponi | Hector Martinez Alonso | Anders Søgaard
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

A voting scheme to detect semantic underspecification
Héctor Martínez Alonso | Núria Bel | Bolette Sandford Pedersen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The following work describes a voting system to automatically classify the sense selection of the complex types Location/Organization and Container/Content, which depend on regular polysemy, as described by the Generative Lexicon (Pustejovsky, 1995). This kind of sense alternation very often presents semantic underspecification between its two possible selected senses. This kind of underspecification is not traditionally contemplated in word sense disambiguation systems, as disambiguation systems are still coping with the need for a representation and recognition of underspecification (Pustejovsky, 2009). The data are characterized by the morphosyntactic and lexical environment of the headwords and provided as input for a classifier. The baseline decision tree classifier is compared against an eight-member voting scheme obtained from variants of the training data generated by modifications on the class representation and from two different classification algorithms, namely decision trees and k-nearest neighbors. The voting system improves the accuracy for the non-underspecified senses, but the underspecified sense remains difficult to identify.

EMNLP@CPH: Is frequency all there is to simplicity?
Anders Johannsen | Héctor Martínez | Sigrid Klerke | Anders Søgaard
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Shared Task System Description: Frustratingly Hard Compositionality Prediction
Anders Johannsen | Hector Martinez | Christian Rishøj | Anders Søgaard
Proceedings of the Workshop on Distributional Semantics and Compositionality

Identification of sense selection in regular polysemy using shallow features
Héctor Martínez Alonso | Núria Bel | Bolette Sandford Pedersen
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

Co-authors

Željko Agić 4

Benoît Sagot 4

Natalie Schluter 4

Jianpeng Cheng 3

Diarmuid Ó Séaghdha 3

Peter Boothroyd 2

Alexandru Coca 2

Djamé Seddah 2

Nicolai Hartvig Sørensen 2

Bo-Hsiang Tseng 2

Zhenxing Zhang 2

Devang Agrawal 1

Hector Fernandez Alcalde 1

Mohammed Attia 1

Elena Badmaeva 1

Miguel Ballesteros 1

Esha Banerjee 1

Shruti Bhargava 1

Chris Biemann 1

Joachim Bingel 1

Aljoscha Burchardt 1

Marie Candito 1

Silvie Cinková 1

Cagri Coltekin 1

Matthieu Constant 1

Amaury Delamaire 1

Joris Driesen 1

Kira Droganova 1

Federico Flego 1

Tristan Guigue 1

Memduh Gökırmak 1

Jan Hajič jr. 1

Jaroslava Hlaváčová 1

Hiroshi Kanayama 1

Jenna Kanerva 1

Dimitri Kartsaklis 1

Tolga Kayadelen 1

Václava Kettnerová 1

Jesse Kirchner 1

Sookyoung Kwak 1

Tatiana Lando 1

Emanuele Lapponi 1

Saran Lertpradit 1

Oier Lopez de Lacalle 1

Juhani Luotolahti 1

Vivien Macketanz 1

Michael Mandel 1

Christopher D. Manning 1

Ruli Manurung 1

Katrin Marheinecke 1

Sarah McGillion 1

Gustavo Mendonca 1

Danijela Merkler 1

Anna Missilä 1

Anna Nedoluzhko 1

Rattima Nitisaroj 1

Muntsa Padró 1

Dhivya Piraviperumal 1

Martin Potthast 1

Sampo Pyysalo 1

Christian Rishøj 1

Manuela Sanguinetti 1

Michael Schlichtkrull 1

Anne-Kathrin Schumann 1

Sebastian Schuster 1

Atsuko Shimada 1

Arne Skjærholt 1

Antonio Stella 1

Jana Strnadová 1

Umut Sulubacak 1

Francis Tyers 1

Zdenka Uresova 1

Hans Uszkoreit 1

Éric Villemonte de la Clergerie 1

Jason D. Williams 1

Seid Muhie Yimam 1

Marie-Catherine de Marneffe 1

Valeria de Paiva 1

Venues

JEP/TALN/RECITAL 1