Gabriel Lopes

Also published as: Gabriel P. Lopes, Jose Gabriel P. Lopes, José Gabriel Pereira Lopes, Gabriel Pereira Lopes, Jose Gabriel Lopes

2016

First Steps Towards Coverage-Based Sentence Alignment
Luís Gomes | Gabriel Pereira Lopes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we introduce a coverage-based scoring function that discriminates between parallel and non-parallel sentences. When plugged into Bleualign, a state-of-the-art sentence aligner, our function improves both precision and recall of alignments over the originally proposed BLEU score. Furthermore, since our scoring function uses Moses phrase tables directly, we avoid the need to translate the texts to be aligned, which is time-consuming and a potential source of alignment errors.

English-Portuguese Biomedical Translation Task Using a Genuine Phrase-Based Statistical Machine Translation Approach
José Aires | Gabriel Lopes | Luís Gomes
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

First Steps Towards Coverage-Based Document Alignment
Luís Gomes | Gabriel Pereira Lopes
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Using Bilingual Segments in Generating Word-to-word Translations
Kavitha Mahesh | Gabriel Pereira Lopes | Luís Gomes
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

We argue that bilingual lexicons automatically extracted from parallel corpora, whose entries have since been validated by linguists and classified as correct or incorrect, should constitute a specific parallel corpus. In this paper, we propose to use word-to-word translations to learn morph-units (comprising bilingual stems and suffixes) from those bilingual lexicons for two language pairs, L1-L2 and L1-L3, in order to induce a bilingual lexicon for the language pair L2-L3, while also learning morph-units for this third language pair. The applicability of bilingual morph-units in L1-L2 and L1-L3 is examined from the perspective of pivot-based lexicon induction for the language pair L2-L3 with L1 as bridge. While the lexicon is derived by transitivity, the correspondences are identified based on previously learnt bilingual stems and suffixes rather than surface translation forms. The induced pairs are validated using a binary classifier trained on morphological and similarity-based features using an existing, automatically acquired, manually validated bilingual translation lexicon for the language pair L2-L3. We discuss the use of English (EN)-French (FR) and English (EN)-Portuguese (PT) lexicons of word-to-word translations in generating word-to-word translations for the language pair FR-PT with EN as pivot language. Generated translations are first filtered using an SVM-based FR-PT classifier and then manually validated.

2014

Context Sense Clustering for Translation
João Casteleiro | Gabriel Lopes | Joaquim Silva
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

2010

Towards Automatic Building of Document Keywords
Joaquim Silva | Gabriel Lopes
Coling 2010: Posters

2005

Clustering Syntactic Positions with Similar Semantic Requirements
Pablo Gamallo | Alexandre Agustini | Gabriel P. Lopes
Computational Linguistics, Volume 31, Number 1, March 2005

2004

Disambiguation and Optional Co-Composition
Pablo Gamallo | Gabriel P. Lopes | Alexandre Agustini
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

This paper describes a specific semantic property underlying binary dependencies: co-composition. We propose a more general definition than that given by Pustejovsky, which we call “optional co-composition”. The aim of the paper is to explore the benefits of optional co-composition in two disambiguation tasks: word sense disambiguation and structural disambiguation. Concerning the second task, some experiments were performed on large corpora.

Extracting Named Entities. A Statistical Approach
Joaquim Silva | Zornitsa Kozareva | Veska Noncheva | Gabriel Lopes
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Named entities and more generally Multiword Lexical Units (MWUs) are important for various applications. However, language independent methods for automatically extracting MWUs do not provide us with clean data. So, in this paper we propose a method for selecting possible named entities from automatically extracted MWUs, and later, a statistics-based language independent unsupervised approach is applied to possible named entities in order to cluster them according to their type. Statistical features used by our clustering process are described and motivated. The Model-Based Clustering Analysis (MBCA) software enabled us to obtain different clusters for proposed named entities. The method was applied to Bulgarian and English. For some clusters, precision is very high; other clusters still need further refinement. Based on the obtained clusters, it is also possible to classify new possible named entities.

Cluster Analysis and Classification of Named Entities
Joaquim F. Ferreira da Silva | Zornitsa Kozareva | José Gabriel Pereira Lopes
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Some authors (Simard et al.; Melamed; Danielsson & Mühlenbock) have suggested measures of similarity of words in different languages so as to find extra clues for alignment of parallel texts. Cognate words, like ‘Parliament’ and ‘Parlement’, in English and French respectively, provide extra anchors that help to improve the quality of the alignment. In this paper, we extend an alignment algorithm proposed by Ribeiro et al. using typical contiguous and non-contiguous sequences of characters extracted using a statistically sound method (Dias et al.). With these typical sequences, we are able to find more reliable correspondence points and improve the alignment quality without resorting to heuristics to identify cognates.

2000

A self-learning method of parallel texts alignment
António Ribeiro | Gabriel Lopes | João Mexia
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes a language independent method for alignment of parallel texts that re-uses acquired knowledge. The system extracts word translation equivalents and re-uses them as correspondence points in order to enhance the alignment of parallel texts. Points that may cause misalignment are filtered using confidence bands of linear regression analysis instead of heuristics, which are not theoretically reliable. Homographs bootstrap the alignment process so as to build the primary word translation lexicon. At each step, the previously acquired lexicon is re-used so as to repeatedly make finer-grained alignments and produce more reliable translation lexicons.

Extracting Textual Associations in Part-of-Speech Tagged Corpora
Gaël Dias | Sylvie Guilloré | José Gabriel Pereira Lopes
5th EAMT Workshop: Harvesting Existing Resources

Using Confidence Bands for Parallel Texts Alignment
António Ribeiro | Gabriel Lopes | João Mexia
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics