John Judge


2023

pdf bib
FINDINGS OF THE IWSLT 2023 EVALUATION CAMPAIGN
Milind Agarwal | Sweta Agrawal | Antonios Anastasopoulos | Luisa Bentivogli | Ondřej Bojar | Claudia Borg | Marine Carpuat | Roldano Cattoni | Mauro Cettolo | Mingda Chen | William Chen | Khalid Choukri | Alexandra Chronopoulou | Anna Currey | Thierry Declerck | Qianqian Dong | Kevin Duh | Yannick Estève | Marcello Federico | Souhir Gahbiche | Barry Haddow | Benjamin Hsu | Phu Mon Htut | Hirofumi Inaguma | Dávid Javorský | John Judge | Yasumasa Kano | Tom Ko | Rishu Kumar | Pengwei Li | Xutai Ma | Prashant Mathur | Evgeny Matusov | Paul McNamee | John P. McCrae | Kenton Murray | Maria Nadejde | Satoshi Nakamura | Matteo Negri | Ha Nguyen | Jan Niehues | Xing Niu | Atul Kr. Ojha | John E. Ortega | Proyag Pal | Juan Pino | Lonneke van der Plas | Peter Polák | Elijah Rippeth | Elizabeth Salesky | Jiatong Shi | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Yun Tang | Brian Thompson | Kevin Tran | Marco Turchi | Alex Waibel | Mingxuan Wang | Shinji Watanabe | Rodolfo Zevallos
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper reports on the shared tasks organized by the 20th IWSLT Conference. The shared tasks address 9 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, multilingual, dialect and low-resource speech translation, and formality control. The shared tasks attracted a total of 38 submissions by 31 teams. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

pdf bib
The 2023 WebNLG Shared Task on Low Resource Languages. Overview and Evaluation Results (WebNLG 2023)
Liam Cripwell | Anya Belz | Claire Gardent | Albert Gatt | Claudia Borg | Marthese Borg | John Judge | Michela Lorandi | Anna Nikiforovskaya | William Soto Martinez
Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023)

The WebNLG task consists of mapping a knowledge graph to a text verbalising the con- tent of that graph. The 2017 WebNLG edi- tion required participating systems to gener- ate English text from a set of DBpedia triples, while the 2020 WebNLG+ challenge addition- ally included generation into Russian and se- mantic parsing of English and Russian texts. In contrast, WebNLG 2023 focuses on four under-resourced languages which are severely under-represented in research on text genera- tion, namely Breton, Irish, Maltese and Welsh. In addition, WebNLG 2023 once again includes Russian. In this paper, we present the organi- sation of the shared task (data, timeline, eval- uation), briefly describe the participating sys- tems and summarise results for participating systems.

2022

pdf bib
Mitigating Covertly Unsafe Text within Natural Language Systems
Alex Mei | Anisha Kabir | Sharon Levy | Melanie Subbiah | Emily Allaway | John Judge | Desmond Patton | Bruce Bimber | Kathleen McKeown | William Yang Wang
Findings of the Association for Computational Linguistics: EMNLP 2022

An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences. However, the degree of explicitness of a generated statement that can cause physical harm varies. In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text. Then, we further break down this category with respect to the system’s information and discuss solutions to mitigate the generation of text in each of these subcategories. Ultimately, our work defines the problem of covertly unsafe language that causes physical harm and argues that this subtle yet dangerous issue needs to be prioritized by stakeholders and regulators. We highlight mitigation strategies to inspire future researchers to tackle this challenging problem and help improve safety within smart systems.

pdf bib
Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization
Alex Mei | Anisha Kabir | Rukmini Bapat | John Judge | Tony Sun | William Yang Wang
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Neural text summarization has shown great potential in recent years. However, current state-of-the-art summarization models are limited by their maximum input length, posing a challenge to summarizing longer texts comprehensively. As part of a layered summarization architecture, we introduce PureText, a simple yet effective pre-processing layer that removes low- quality sentences in articles to improve existing summarization models. When evaluated on popular datasets like WikiHow and Reddit TIFU, we show up to 3.84 and 8.57 point ROUGE-1 absolute improvement on the full test set and the long article subset, respectively, for state-of-the-art summarization models such as BertSum and BART. Our approach provides downstream models with higher-quality sentences for summarization, improving overall model performance, especially on long text articles.

2015

pdf bib
Tapadóir
Eimear Maguire | John Judge | Teresa Lynn
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Tapadóir
Eimear Maguire | John Judge | Teresa Lynn
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

pdf bib
The Strategic Impact of META-NET on the Regional, National and International Level
Georg Rehm | Hans Uszkoreit | Sophia Ananiadou | Núria Bel | Audronė Bielevičienė | Lars Borin | António Branco | Gerhard Budin | Nicoletta Calzolari | Walter Daelemans | Radovan Garabík | Marko Grobelnik | Carmen García-Mateo | Josef van Genabith | Jan Hajič | Inma Hernáez | John Judge | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Joseph Mariani | John McNaught | Maite Melero | Monica Monachini | Asunción Moreno | Jan Odijk | Maciej Ogrodniczuk | Piotr Pęzik | Stelios Piperidis | Adam Przepiórkowski | Eiríkur Rögnvaldsson | Michael Rosner | Bolette Pedersen | Inguna Skadiņa | Koenraad De Smedt | Marko Tadić | Paul Thompson | Dan Tufiş | Tamás Váradi | Andrejs Vasiļjevs | Kadri Vider | Jolanta Zabarskaite
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.

pdf bib
Proceedings of the First Celtic Language Technology Workshop
John Judge | Teresa Lynn | Monica Ward | Brian Ó Raghallaigh
Proceedings of the First Celtic Language Technology Workshop

pdf bib
Active Learning for Post-Editing Based Incrementally Retrained MT
Aswarth Abhilash Dara | Josef van Genabith | Qun Liu | John Judge | Antonio Toral
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2008

pdf bib
Linguistically Light Lexical Extensions for Ontologies
Brian Davis | Siegfried Handschuh | Alexander Troussov | John Judge | Mikhail Sogrin
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The identification of class instances within unstructured text for either the purposes of Ontology population or semantic annotation are usually limited to term mentions of Proper Noun and Personal Noun or fixed Key Phrases within Text Analytics or Ontology based Information Extraction(OBIE) applications. These systems do not generalize to cope with compound nominal classes of multi word expressions. Computational Linguistics’ approaches involving deep analysis tend to suffer from idiomaticity and overgeneration problems while the shallower “words with spaces” approach frequently employed in Information Extraction(IE) and Industrial Text Analytics systems lacks flexibility and is prone to lexical proliferation. We outline a representation for encoding light linguistic features of Compound Nominal term mentions of Concepts within an Ontology as well as a lightweight semantic annotator which complies the above linguistic information into efficient Dictionary formats to drive large scale identification and semantic annotation of the aforementioned concepts.

2006

pdf bib
QuestionBank: Creating a Corpus of Parse-Annotated Questions
John Judge | Aoife Cahill | Josef van Genabith
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Search
Co-authors