João Graça

Also published as: Joao Graca, João V. Graça

2020

pdf bib abs
Project MAIA: Multilingual AI Agent Assistant
André F. T. Martins | Joao Graca | Paulo Dimas | Helena Moniz | Graham Neubig
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper presents the Multilingual Artificial Intelligence Agent Assistant (MAIA), a project led by Unbabel with the collaboration of CMU, INESC-ID and IT Lisbon. MAIA will employ cutting-edge machine learning and natural language processing technologies to build multilingual AI agent assistants, eliminating language barriers. MAIA’s translation layer will empower human agents to provide customer support in real-time, in any language, with human quality.

2018

pdf bib
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing
Ramón Astudillo | João Graça | André Martins
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

pdf bib
Unbabel: How to combine AI with the crowd to scale professional-quality translation
João Graça
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

2017

2012

With the CINTIL-International Corpus of Portuguese, an ongoing corpus annotated with fully fledged grammatical representation, sentences get not only a high level of lexical, morphological and syntactic annotation but also a semantic analysis that prepares the data for a manual specification step and thus opens the way for a number of tools and resources that are currently a major research focus. This paper reports on the construction of a propbank that builds on CINTIL-DeepGramBank, with nearly 10 thousand sentences, on the basis of a deep linguistic grammar, and on the process and the linguistic criteria guiding that construction, which make it possible to obtain a complete PropBank with both syntactic and semantic levels of linguistic annotation. Taking into account this and the promising inter-annotator agreement scores presented in this study, CINTIL-PropBank presents itself as a valuable resource for training a semantic role labeller, one of our goals with this project.

pdf bib
Entropy-based Pruning for Phrase-based Machine Translation
Wang Ling | João Graça | Isabel Trancoso | Alan Black
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Wiki-ly Supervised Part-of-Speech Tagging
Shen Li | João Graça | Ben Taskar
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
Trevor Cohn | Phil Blunsom | Joao Graca
Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

pdf bib
The PASCAL Challenge on Grammar Induction
Douwe Gelling | Trevor Cohn | Phil Blunsom | João Graça
Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

2011

pdf bib
Discriminative Phrase-based Lexicalized Reordering Models using Weighted Reordering Graphs
Wang Ling | João Graça | David Martins de Matos | Isabel Trancoso | Alan W Black
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Young-Bum Kim | João Graça | Benjamin Snyder
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Reordering Modeling using Weighted Alignment Matrices
Wang Ling | Tiago Luís | João Graça | Isabel Trancoso | Luísa Coheur
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Rich Prior Knowledge in Learning for Natural Language Processing
Gregory Druck | Kuzman Ganchev | João Graça
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2010

pdf bib
Learning Tractable Word Alignment Models with Complex Constraints
João V. Graça | Kuzman Ganchev | Ben Taskar
Computational Linguistics, Volume 36, Issue 3 - September 2010

pdf bib abs
The INESC-ID machine translation system for the IWSLT 2010
Wang Ling | Tiago Luís | João Graça | Luísa Coheur | Isabel Trancoso
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper we describe the Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento (INESC-ID) system that participated in the IWSLT 2010 evaluation campaign. Our main goal for this evaluation was to employ several state-of-the-art methods applied to phrase-based machine translation in order to improve the translation quality. Aside from the IBM M4 alignment model, two constrained alignment models were tested, which produced better overall results. These results were further improved by using weighted alignment matrices during phrase extraction, rather than the single best alignment. Finally, we tested several filters that ruled out phrase pairs based on punctuation. Our system was evaluated on the BTEC and DIALOG tasks, having achieved a better overall ranking in the DIALOG task.

pdf bib abs
Towards a general and extensible phrase-extraction algorithm
Wang Ling | Tiago Luís | João Graça | Luísa Coheur | Isabel Trancoso
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Phrase-based systems depend heavily on the quality of their phrase tables, and therefore the process of phrase extraction is always a fundamental step. In this paper we present a general and extensible phrase extraction algorithm, in which we highlight several control points. Instantiating these control points allows the simulation of previous approaches, since different strategies or heuristics can be tested at each point. We show how previous approaches fit in this algorithm, compare several of them and, in addition, propose alternative heuristics, showing their impact on the final translation results. Considering two different test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we have obtained an improvement in the results of 2.4 and 2.8 BLEU points, respectively.

Corpora of sentences annotated with grammatical information have been deployed by extending the basic lexical and morphological data with increasingly complex information, such as phrase constituency, syntactic functions, semantic roles, etc. As these corpora grow in size and the linguistic information to be encoded reaches higher levels of sophistication, the use of annotation tools and, above all, supporting computational grammars appears no longer as a matter of convenience but of necessity. In this paper, we report on the design features, the development conditions and the methodological options of a deep linguistic databank, the CINTIL DeepGramBank. In this corpus, sentences are annotated with fully fledged linguistically informed grammatical representations that are produced by a deep linguistic processing grammar, thus consistently integrating morphological, syntactic and semantic information. We also report on how such a corpus makes it straightforward to obtain a whole range of past generation annotated corpora (POS, NER and morphology), current generation treebanks (constituency treebanks, dependency banks, propbanks) and next generation databanks (logical form banks) simply by means of a very residual selection/extraction effort to get the appropriate "views" exposing the relevant layers of information.

pdf bib
Sparsity in Dependency Grammar Induction
Jennifer Gillenwater | Kuzman Ganchev | João Graça | Fernando Pereira | Ben Taskar
Proceedings of the ACL 2010 Conference Short Papers

2008

pdf bib
Better Alignments = Better Translations?
Kuzman Ganchev | João V. Graça | Ben Taskar
Proceedings of ACL-08: HLT

pdf bib abs
Building a Golden Collection of Parallel Multi-Language Word Alignment
João Graça | Joana Paulo Pardal | Luísa Coheur | Diamantino Caseiro
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper reports on the experience of producing manual word alignments over six different language pairs (all combinations between Portuguese, English, French and Spanish) (Graça et al., 2008). Word alignment of each language pair is made over the first 100 sentences of the common test set from the Europarl corpora (Koehn, 2005), corresponding to 600 new annotated sentences. This collection is publicly available at http://www.l2f.inesc-id.pt/resources/translation/. It contains, to our knowledge, the first word alignment gold set for the Portuguese language, with three other languages. Besides, it is, to our knowledge, the first multi-language manual word aligned parallel corpus, where the same sentences are annotated for each language pair. We started by using the guidelines presented in (Mariño, 2005) and performed several refinements: some due to under-specifications in the original guidelines, others because of disagreement on some choices. This led to the development of an extensive new set of guidelines for multi-lingual word alignment annotation that, we believe, makes the alignment process less ambiguous. We evaluate the inter-annotator agreement, obtaining an average of 91.6% agreement between the different language pairs.

2007

pdf bib abs
The INESC-ID IWSLT07 SMT system
João V. Graça | Diamantino Caseiro | Luísa Coheur
Proceedings of the Fourth International Workshop on Spoken Language Translation

We present the machine translation system used by L2F from INESC-ID in the evaluation campaign of the International Workshop on Spoken Language Translation (2007), in the task of translating spontaneous conversations in the travel domain from Italian to English.

pdf bib
Frustratingly Hard Domain Adaptation for Dependency Parsing
Mark Dredze | John Blitzer | Partha Pratim Talukdar | Kuzman Ganchev | João Graça | Fernando Pereira
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)