2016
English-Portuguese Biomedical Translation Task Using a Genuine Phrase-Based Statistical Machine Translation Approach
José Aires | Gabriel Lopes | Luís Gomes
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
First Steps Towards Coverage-Based Document Alignment
Luís Gomes | Gabriel Pereira Lopes
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Using Bilingual Segments in Generating Word-to-word Translations
Kavitha Mahesh | Gabriel Pereira Lopes | Luís Gomes
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)
We argue that bilingual lexicons automatically extracted from parallel corpora, whose entries have since been validated by linguists and classified as correct or incorrect, should constitute a specific kind of parallel corpus. In this paper, we propose using word-to-word translations to learn morph-units (bilingual stems and suffixes) from the bilingual lexicons of two language pairs, L1-L2 and L1-L3, in order to induce a bilingual lexicon for the pair L2-L3, while also learning morph-units for that pair. The applicability of bilingual morph-units in L1-L2 and L1-L3 is examined from the perspective of pivot-based lexicon induction for L2-L3, with L1 as the bridge. Although the lexicon is derived by transitivity, correspondences are identified on the basis of previously learnt bilingual stems and suffixes rather than surface translation forms. The induced pairs are validated using a binary classifier trained on morphological and similarity-based features, using an existing, automatically acquired and manually validated bilingual translation lexicon for L2-L3. We discuss the use of English (EN)-French (FR) and English (EN)-Portuguese (PT) word-to-word translation lexicons to generate word-to-word translations for the pair FR-PT, with EN as the pivot language. Generated translations are first filtered using an SVM-based FR-PT classifier and then manually validated.
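The transitive step mentioned in this abstract can be illustrated with a minimal sketch: candidate L2-L3 pairs are produced by joining L1-L2 and L1-L3 entries on the shared pivot word. This shows plain surface-form composition only; the morph-unit matching and SVM-based filtering described in the abstract are not shown, and the function name `pivot_candidates` and the toy word lists are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def pivot_candidates(l1_l2: Iterable[Pair], l1_l3: Iterable[Pair]) -> Set[Pair]:
    """Compose (L1, L2) and (L1, L3) entries into candidate (L2, L3) pairs
    that share the same L1 (pivot) word. Surface forms only (hypothetical helper)."""
    # Index the L1-L3 lexicon by its pivot (L1) word.
    by_pivot = defaultdict(set)
    for l1, l3 in l1_l3:
        by_pivot[l1].add(l3)
    # Every L2 translation of a pivot word is paired with every L3 translation of it.
    candidates = set()
    for l1, l2 in l1_l2:
        for l3 in by_pivot.get(l1, ()):
            candidates.add((l2, l3))
    return candidates


# Toy EN-FR and EN-PT entries (illustrative only).
en_fr = [("house", "maison"), ("bank", "banque"), ("bank", "rive")]
en_pt = [("house", "casa"), ("bank", "banco"), ("bank", "margem")]
print(sorted(pivot_candidates(en_fr, en_pt)))
# prints [('banque', 'banco'), ('banque', 'margem'), ('maison', 'casa'), ('rive', 'banco'), ('rive', 'margem')]
```

The ambiguous pivot "bank" yields spurious pairs such as ("banque", "margem"), which is why transitively induced pairs need a validation step before they can be trusted.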
First Steps Towards Coverage-Based Sentence Alignment
Luís Gomes | Gabriel Pereira Lopes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In this paper, we introduce a coverage-based scoring function that discriminates between parallel and non-parallel sentences. When plugged into Bleualign, a state-of-the-art sentence aligner, our function improves both precision and recall of alignments over the originally proposed BLEU score. Furthermore, since our scoring function uses Moses phrase tables directly we avoid the need to translate the texts to be aligned, which is time-consuming and a potential source of alignment errors.
2014
Context Sense Clustering for Translation
João Casteleiro | Gabriel Lopes | Joaquim Silva
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
2010
Towards Automatic Building of Document Keywords
Joaquim Silva | Gabriel Lopes
Coling 2010: Posters
2005
Clustering Syntactic Positions with Similar Semantic Requirements
Pablo Gamallo | Alexandre Agustini | Gabriel P. Lopes
Computational Linguistics, Volume 31, Number 1, March 2005
2004
Extracting Named Entities. A Statistical Approach
Joaquim Silva | Zornitsa Kozareva | Veska Noncheva | Gabriel Lopes
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters
Named entities and more generally Multiword Lexical Units (MWUs) are important for various applications. However, language independent methods for automatically extracting MWUs do not provide us with clean data. So, in this paper we propose a method for selecting possible named entities from automatically extracted MWUs, and later, a statistics-based language independent unsupervised approach is applied to possible named entities in order to cluster them according to their type. Statistical features used by our clustering process are described and motivated. The Model-Based Clustering Analysis (MBCA) software enabled us to obtain different clusters for proposed named entities. The method was applied to Bulgarian and English. For some clusters, precision is very high; other clusters still need further refinement. Based on the obtained clusters, it is also possible to classify new possible named entities.
Cluster Analysis and Classification of Named Entities
Joaquim F. Ferreira da Silva | Zornitsa Kozareva | José Gabriel Pereira Lopes
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2003
Automatic Acquisition of Word Interaction Patterns from Corpora
Veska Noncheva | Joaquim Ferreira da Silva | Gabriel Lopes
Proceedings of the 2003 EACL Workshop on Language Modeling for Text Entry Methods
2002
Using Co-Composition for Acquiring Syntactic and Semantic Subcategorisation
Pablo Gamallo | Alexandre Agustini | Gabriel P. Lopes
Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition
2001
Cognates alignment
António Ribeiro | Gaël Dias | Gabriel Lopes | João Mexia
Proceedings of Machine Translation Summit VIII
Several authors (Simard et al.; Melamed; Danielsson & Mühlenbock) have proposed measures of cross-lingual word similarity as a source of extra clues for aligning parallel texts. Cognate words, such as ‘Parliament’ and ‘Parlement’ in English and French respectively, provide extra anchors that help improve alignment quality. In this paper, we extend an alignment algorithm proposed by Ribeiro et al. with typical contiguous and non-contiguous character sequences extracted using a statistically sound method (Dias et al.). With these typical sequences, we are able to find more reliable correspondence points and improve alignment quality without resorting to heuristics to identify cognates.
2000
A self-learning method of parallel texts alignment
António Ribeiro | Gabriel Lopes | João Mexia
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers
This paper describes a language independent method for alignment of parallel texts that re-uses acquired knowledge. The system extracts word translation equivalents and re-uses them as correspondence points in order to enhance the alignment of parallel texts. Points that may cause misalignment are filtered using confidence bands of linear regression analysis instead of heuristics, which are not theoretically reliable. Homographs bootstrap the alignment process so as to build the primary word translation lexicon. At each step, the previously acquired lexicon is re-used so as to repeatedly make finer-grained alignments and produce more reliable translation lexicons.
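The filtering step described in this abstract can be sketched roughly as follows, assuming candidate correspondence points are (source position, target position) pairs: fit a least-squares line, drop points lying outside a band around it, and refit until no more points are removed. The band half-width here uses the prediction-interval form with a fixed normal multiplier `z=1.96` instead of a Student t quantile; that choice, the fixed-point iteration, and the names `fit_line` and `filter_points` are assumptions for illustration, not the paper's exact procedure.

```python
import math
from typing import List, Tuple

# A candidate correspondence point: (position in source text, position in target text).
Point = Tuple[float, float]


def fit_line(points: List[Point]) -> Tuple[float, float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, x_mean, sxx), where sxx is the sum of squared
    deviations of x from its mean (used for the band width)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean, x_mean, sxx


def filter_points(points: List[Point], z: float = 1.96) -> List[Point]:
    """Repeatedly drop points outside the band around the fitted line.

    Assumed half-width at x: z * s * sqrt(1 + 1/n + (x - x_mean)^2 / sxx),
    a prediction-interval band with a normal approximation for the quantile."""
    kept = list(points)
    while len(kept) > 2:
        if all(x == kept[0][0] for x, _ in kept):
            break  # all x identical: no line can be fitted
        slope, intercept, x_mean, sxx = fit_line(kept)
        n = len(kept)
        residuals = [y - (slope * x + intercept) for x, y in kept]
        s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual standard error
        inside = [
            p for p, r in zip(kept, residuals)
            if abs(r) <= z * s * math.sqrt(1 + 1 / n + (p[0] - x_mean) ** 2 / sxx)
        ]
        if len(inside) == len(kept):
            break  # nothing removed: filtering has converged
        kept = inside
    return kept


if __name__ == "__main__":
    # Points close to y = 2x, plus one outlier at (10, 60).
    pts = [(i, 2 * i + (0.5 if i % 2 else -0.5)) for i in range(20)] + [(10, 60)]
    print(len(filter_points(pts)))  # prints 20: the outlier is removed
```

Refitting after each removal lets the line move toward the well-behaved points once a gross outlier no longer inflates the residual error, which is one simple way to read the abstract's point about avoiding ad hoc heuristics.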
Using Confidence Bands for Parallel Texts Alignment
António Ribeiro | Gabriel Lopes | João Mexia
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics
1997
Statistical methods for retrieving most significant paragraphs in newspaper articles
Jose Abracos | Gabriel Pereira Lopes
Intelligent Scalable Text Summarization
1994
Extending DRT With a Focusing Mechanism for Pronominal Anaphora and Ellipsis Resolution
Jose Abracos | Jose Gabriel Lopes
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics
1992
Temporal Structure of Discourse
Irene Pimenta Rodrigues | Jose Gabriel P. Lopes
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics
1984
Transforming English Interfaces to Other Natural Languages: An Experiment With Portuguese
Gabriel Pereira Lopes
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics