José G. C. de Souza

Also published as: Jose G.C. de Souza, José G. C. de Souza, José G. Camargo de Souza, José G.C. de Souza, José Guilherme C. de Souza, José Guilherme Camargo de Souza


2018

Quality Estimation for Automatically Generated Titles of eCommerce Browse Pages
Nicola Ueffing | José G. C. de Souza | Gregor Leusch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

At eBay, we are automatically generating a large number of natural language titles for eCommerce browse pages using machine translation (MT) technology. While automatic approaches can generate millions of titles very fast, they are prone to errors. We therefore develop quality estimation (QE) methods which can automatically detect titles with low quality in order to prevent them from going live. In this paper, we present different approaches: The first one is a Random Forest (RF) model that explores hand-crafted, robust features, which are a mix of established features commonly used in Machine Translation Quality Estimation (MTQE) and new features developed specifically for our task. The second model is based on Siamese Networks (SNs) which embed the metadata input sequence and the generated title in the same space and do not require hand-crafted features at all. We thoroughly evaluate and compare these approaches on in-house data. While the RF models are competitive for scenarios with smaller amounts of training data and somewhat more robust, they are clearly outperformed by the SN models when the amount of training data is larger.

Generating E-Commerce Product Titles and Predicting their Quality
José G. Camargo de Souza | Michael Kozielski | Prashant Mathur | Ernie Chang | Marco Guerini | Matteo Negri | Marco Turchi | Evgeny Matusov
Proceedings of the 11th International Conference on Natural Language Generation

E-commerce platforms present products using titles that summarize product information. These titles cannot be created by hand; therefore, an algorithmic solution is required. The task of automatically generating these titles given noisy user-provided titles is one way to achieve the goal. The setting requires the generation process to be fast and the generated title to be both human-readable and concise. Furthermore, we need to understand if such generated titles are usable. As such, we propose approaches that (i) automatically generate product titles and (ii) predict their quality. Our approach scales to millions of products and both automatic and human evaluations performed on real-world data indicate our approaches are effective and applicable to existing e-commerce scenarios.

2016

TranscRater: a Tool for Automatic Speech Recognition Quality Estimation
Shahab Jalalvand | Matteo Negri | Marco Turchi | José G. C. de Souza | Daniele Falavigna | Mohammed R. H. Qwaider
Proceedings of ACL-2016 System Demonstrations

TMop: a Tool for Unsupervised Translation Memory Cleaning
Masoud Jalili Sabet | Matteo Negri | Marco Turchi | José G. C. de Souza | Marcello Federico
Proceedings of ACL-2016 System Demonstrations

The FBK Participation in the WMT 2016 Automatic Post-editing Shared Task
Rajen Chatterjee | José G. C. de Souza | Matteo Negri | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual Semantic Similarity Measurement Using Quality Estimation Features and Compositional Bilingual Word Embeddings
Duygu Ataman | José G. C. de Souza | Marco Turchi | Matteo Negri
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Online Multitask Learning for Machine Translation Quality Estimation
José G. C. de Souza | Matteo Negri | Elisa Ricci | Marco Turchi
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Multitask Learning for Adaptive Quality Estimation of Automatically Transcribed Utterances
José G. C. de Souza | Hamed Zamani | Matteo Negri | Marco Turchi | Daniele Falavigna
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

MT quality estimation for e-commerce data
José G. C. de Souza | Marcello Federico | Hassan Sawaf
Proceedings of Machine Translation Summit XV: User Track

2014

Online multi-user adaptive statistical machine translation
Prashant Mathur | Mauro Cettolo | Marcello Federico | José G.C. de Souza
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

In this paper we investigate the problem of adapting a machine translation system to the feedback provided by multiple post-editors. It is well known that translators might have very different post-editing styles and that this variability hinders the application of online learning methods, which assume a homogeneous source of adaptation data. We hence propose multi-task learning to leverage bias information from each individual post-editor in order to constrain the evolution of the SMT system. A new framework for significance testing with sentence-level metrics is described, which shows that multi-task learning approaches outperform existing online learning approaches, with significant gains of 1.24 and 1.88 TER points over a strong online adaptive baseline, on a test set of post-edits produced by four translators and on a popular benchmark with multiple references, respectively.

Towards a combination of online and multitask learning for MT quality estimation: a preliminary study
José G.C. de Souza | Marco Turchi | Matteo Negri
Workshop on interactive and adaptive machine translation

Quality estimation (QE) for machine translation has emerged as a promising way to provide real-world applications with methods to estimate at run-time the reliability of automatic translations. Real-world applications, however, pose challenges that go beyond those of current QE evaluation settings. For instance, the heterogeneity and the scarce availability of training data might contribute to significantly raise the bar. To address these issues we compare two alternative machine learning paradigms, namely online and multi-task learning, measuring their capability to overcome the limitations of current batch methods. The results of our experiments, which are carried out in the same experimental setting, demonstrate the effectiveness of the two methods and suggest their complementarity. This indicates, as a promising research avenue, the possibility to combine their strengths into an online multi-task approach to the problem.

FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task
José Guilherme Camargo de Souza | Jesús González-Rubio | Christian Buck | Marco Turchi | Matteo Negri
Proceedings of the Ninth Workshop on Statistical Machine Translation

Machine Translation Quality Estimation Across Domains
José G. C. de Souza | Marco Turchi | Matteo Negri
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Quality Estimation for Automatic Speech Recognition
Matteo Negri | Marco Turchi | José G. C. de Souza | Daniele Falavigna
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Adaptive Quality Estimation for Machine Translation
Marco Turchi | Antonios Anastasopoulos | José G. C. de Souza | Matteo Negri
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

FBK-UEdin Participation to the WMT13 Quality Estimation Shared Task
José Guilherme Camargo de Souza | Christian Buck | Marco Turchi | Matteo Negri
Proceedings of the Eighth Workshop on Statistical Machine Translation

Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
José G.C. de Souza | Miquel Esplà-Gomis | Marco Turchi | Matteo Negri
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

QuEst - A translation quality estimation framework
Lucia Specia | Kashif Shah | Jose G.C. de Souza | Trevor Cohn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

FBK: Machine Translation Evaluation and Word Similarity metrics for Semantic Textual Similarity
José Guilherme Camargo de Souza | Matteo Negri | Yashar Mehdad
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

FBK: Cross-Lingual Textual Entailment Without Translation
Yashar Mehdad | Matteo Negri | José Guilherme C. de Souza
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)