Fabienne Braune


KnowMAN: Weakly Supervised Multinomial Adversarial Networks
Luisa März | Ehsaneddin Asgari | Fabienne Braune | Franziska Zimmermann | Benjamin Roth
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The absence of labeled data for training neural models is often addressed by leveraging knowledge about the specific task, resulting in heuristic but noisy labels. The knowledge is captured in labeling functions, which detect certain regularities or patterns in the training samples and annotate corresponding labels for training. This process of weakly supervised training may result in an over-reliance on the signals captured by the labeling functions and hinder models from exploiting other signals or generalizing well. We propose KnowMAN, an adversarial scheme that makes it possible to control the influence of signals associated with specific labeling functions. KnowMAN forces the network to learn representations that are invariant to those signals and to pick up other signals that are more generally associated with an output label. KnowMAN strongly improves results compared to direct weakly supervised learning with a pre-trained transformer language model and a feature-based baseline.


UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages
Ehsaneddin Asgari | Fabienne Braune | Benjamin Roth | Christoph Ringlstetter | Mohammad Mofrad
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we introduce UniSent, universal sentiment lexica for 1000+ languages. Sentiment lexica are vital for sentiment analysis in the absence of document-level annotations, a very common scenario for low-resource languages. To the best of our knowledge, UniSent is the largest sentiment resource to date in terms of the number of covered languages, including many low-resource ones. In this work, we use a massively parallel Bible corpus to project sentiment information from English to other languages for sentiment analysis on Twitter data. We introduce a method called DomDrift to mitigate the huge domain mismatch between Bible and Twitter by a confidence weighting scheme that uses domain-specific embeddings to compare the nearest neighbors for a candidate sentiment word in the source (Bible) and target (Twitter) domain. We evaluate the quality of UniSent on a subset of languages for which manually created ground truth was available: Macedonian, Czech, German, Spanish, and French. We show that the quality of UniSent is comparable to manually created sentiment resources when it is used as the sentiment seed for the task of word sentiment prediction on top of embedding representations. In addition, we show that emoticon sentiments could be reliably predicted in the Twitter domain using only UniSent and monolingual embeddings in German, Spanish, French, and Italian. With the publication of this paper, we release the UniSent sentiment lexica at http://language-lab.info/unisent.


Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable
Viktor Hangya | Fabienne Braune | Alexander Fraser | Hinrich Schütze
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Bilingual tasks, such as bilingual lexicon induction and cross-lingual classification, are crucial for overcoming data sparsity in the target language. Resources required for such tasks are often out-of-domain, so domain adaptation is an important problem here. We make two contributions. First, we test a delightfully simple method for domain adaptation of bilingual word embeddings. We evaluate these embeddings on two bilingual tasks involving different domains: cross-lingual Twitter sentiment classification and medical bilingual lexicon induction. Second, we tailor a broadly applicable semi-supervised classification method from computer vision to these tasks. We show that this method also helps in low-resource setups. Using both methods together, we achieve large improvements over our baselines using only additional unlabeled data.

Unsupervised Parallel Sentence Extraction from Comparable Corpora
Viktor Hangya | Fabienne Braune | Yuliya Kalasouskaya | Alexander Fraser
Proceedings of the 15th International Conference on Spoken Language Translation

Mining parallel sentences from comparable corpora is of great interest for many downstream tasks. In the BUCC 2017 shared task, systems performed well by training on gold standard parallel sentences. However, we often want to mine parallel sentences without bilingual supervision. We present a simple approach relying on bilingual word embeddings trained in an unsupervised fashion. We incorporate orthographic similarity in order to handle words with similar surface forms. In addition, we propose a dynamic threshold method to decide whether a candidate sentence pair is parallel, which eliminates the need to fine-tune a static value for different datasets. Since we do not employ any language-specific engineering, our approach is highly generic. We show that our approach is effective on three language pairs without the use of any bilingual signal, which is important because parallel sentence mining is most useful in low-resource scenarios.

Evaluating bilingual word embeddings on the long tail
Fabienne Braune | Viktor Hangya | Tobias Eder | Alexander Fraser
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Bilingual word embeddings are useful for bilingual lexicon induction, the task of mining translations of given words. Many studies have shown that bilingual word embeddings perform well for bilingual lexicon induction, but they have focused on frequent words in general domains. For many applications, bilingual lexicon induction of rare and domain-specific words is of critical importance. Therefore, we design a new task to evaluate bilingual word embeddings on rare words in different domains. We show that state-of-the-art approaches fail on this task and present simple new techniques to improve bilingual word embeddings for mining rare words. We release new gold standard datasets and code to stimulate research on this task.


LMU Munich’s Neural Machine Translation Systems for News Articles and Health Information Texts
Matthias Huck | Fabienne Braune | Alexander Fraser
Proceedings of the Second Conference on Machine Translation


A Framework for Discriminative Rule Selection in Hierarchical Moses
Fabienne Braune | Alexander Fraser | Hal Daumé III | Aleš Tamchyna
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

The QT21/HimL Combined Machine Translation System
Jan-Thorsten Peter | Tamer Alkhouli | Hermann Ney | Matthias Huck | Fabienne Braune | Alexander Fraser | Aleš Tamchyna | Ondřej Bojar | Barry Haddow | Rico Sennrich | Frédéric Blain | Lucia Specia | Jan Niehues | Alex Waibel | Alexandre Allauzen | Lauriane Aufrant | Franck Burlot | Elena Knyazeva | Thomas Lavergne | François Yvon | Mārcis Pinnis | Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers


Rule Selection with Soft Syntactic Features for String-to-Tree Statistical Machine Translation
Fabienne Braune | Nina Seemann | Alexander Fraser
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

String-to-Tree Multi Bottom-up Tree Transducers
Nina Seemann | Fabienne Braune | Andreas Maletti
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A systematic evaluation of MBOT in statistical machine translation
Nina Seemann | Fabienne Braune | Andreas Maletti
Proceedings of Machine Translation Summit XV: Papers


Mapping Between English Strings and Reentrant Semantic Graphs
Fabienne Braune | Daniel Bauer | Kevin Knight
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We investigate formalisms for capturing the relation between semantic graphs and English strings. Semantic graph corpora have spurred recent interest in graph transduction formalisms, but it is not yet clear whether such formalisms are a good fit for natural language data, in particular for describing how semantic reentrancies correspond to English pronouns, zero pronouns, reflexives, passives, nominalizations, etc. We introduce a data set that focuses on these problems, build grammars to capture the graph/string relation in this data, and evaluate those grammars for conciseness and accuracy.


Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation
Fabienne Braune | Nina Seemann | Daniel Quernheim | Andreas Maletti
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


Composing extended top-down tree transducers
Aurélie Lagoutte | Fabienne Braune | Daniel Quernheim | Andreas Maletti
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Preservation of Recognizability for Weighted Linear Extended Top-Down Tree Transducers
Nina Seemann | Daniel Quernheim | Fabienne Braune | Andreas Maletti
Proceedings of the Workshop on Applications of Tree Automata Techniques in Natural Language Processing

Long-distance reordering during search for hierarchical phrase-based SMT
Fabienne Braune | Anita Gojun | Alexander Fraser
Proceedings of the 16th Annual conference of the European Association for Machine Translation


Improved Unsupervised Sentence Alignment for Symmetrical and Asymmetrical Parallel Corpora
Fabienne Braune | Alexander Fraser
Coling 2010: Posters