2022
Adaptive Differential Privacy for Language Model Training
Xinwei Wu | Li Gong | Deyi Xiong
Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022)
Although differential privacy (DP) can protect language models from leaking private information, its indiscriminate protection of all data points reduces its practical utility. Previous work improves DP training by distinguishing private from non-private data, but it relies on datasets with prior privacy information, which is not available in real-world scenarios. In this paper, we propose an Adaptive Differential Privacy (ADP) framework for language modeling that does not require prior privacy information. We use a language model to estimate the probability that a linguistic item contains private information. We further propose a new Adam algorithm that adjusts the degree of differential privacy noise injected into the language model according to the estimated privacy probabilities. Experiments demonstrate that ADP improves differentially private language modeling and achieves good protection against canary attackers.
2019
Enhanced Transformer Model for Data-to-Text Generation
Li Gong | Josep Crego | Jean Senellart
Proceedings of the 3rd Workshop on Neural Generation and Translation
Neural models have recently shown significant progress on data-to-text generation tasks, in which descriptive texts are generated conditioned on database records. In this work, we present a new Transformer-based data-to-text generation model that learns content selection and summary generation in an end-to-end fashion. We introduce two extensions to the baseline Transformer model: first, we modify the latent representation of the input, which significantly improves the content correctness of the output summary; second, we include an additional learning objective that accounts for content selection modelling. In addition, we propose two data augmentation methods that further improve the performance of the resulting generation models. Evaluation experiments show that our final model outperforms current state-of-the-art systems as measured by different metrics: BLEU, content selection precision and content ordering. We have made the Transformer extension presented in this paper publicly available.
SYSTRAN @ WNGT 2019: DGT Task
Li Gong | Josep Crego | Jean Senellart
Proceedings of the 3rd Workshop on Neural Generation and Translation
This paper describes SYSTRAN's participation in the Document-level Generation and Translation (DGT) Shared Task of the 3rd Workshop on Neural Generation and Translation (WNGT 2019). We participate for the first time, using a Transformer network enhanced with modified input embeddings and optimising an additional objective function that considers content selection. The network takes structured data of basketball games as input and outputs a natural-language summary of the game.
2018
Tencent Neural Machine Translation Systems for WMT18
Mingxuan Wang | Li Gong | Wenhuan Zhu | Jun Xie | Chao Bian
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
We participated in the WMT 2018 shared news translation task on the English↔Chinese language pair. Our systems are based on attentional sequence-to-sequence models with some form of recursion and self-attention. Some data augmentation methods are also introduced to improve translation performance. The best translation results are obtained with ensemble and reranking techniques. Our Chinese→English system achieved the highest cased BLEU score among all 16 submitted systems, and our English→Chinese system ranked third out of 18 submitted systems.
2015
LIMSI: Translations as Source of Indirect Supervision for Multilingual All-Words Sense Disambiguation and Entity Linking
Marianna Apidianaki | Li Gong
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
2014
Incremental development of statistical machine translation systems
Li Gong | Aurélien Max | François Yvon
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers
Statistical Machine Translation produces results that make it a competitive option in most machine-assisted translation scenarios. However, these good results often come at a very high computational cost and correspond to training regimes that are unsuited to many practical contexts, where the ability to adapt to users and domains and to continuously integrate new data (e.g. in post-editing contexts) is of primary importance. In this article, we show how these requirements can be met using a strategy for on-demand word alignment and model estimation. Most remarkably, our incremental system development framework is shown to deliver top-quality translation performance even in the absence of tuning, and to surpass a strong baseline when performing online tuning. All these results are obtained with great computational savings compared to conventional systems.
Towards a More Efficient Development of Statistical Machine Translation Systems (Vers un développement plus efficace des systèmes de traduction statistique : un peu de vert dans un monde de BLEU) [in French]
Li Gong | Aurélien Max | François Yvon
Proceedings of TALN 2014 (Volume 2: Short Papers)
(Much) Faster Construction of SMT Phrase Tables from Large-scale Parallel Corpora (Construction (très) rapide de tables de traduction à partir de grands bi-textes) [in French]
Li Gong | Aurélien Max | François Yvon
Proceedings of TALN 2014 (Volume 3: System Demonstrations)
LIMSI @ WMT’14 Medical Translation Task
Nicolas Pécheux | Li Gong | Quoc Khanh Do | Benjamin Marie | Yulia Ivanishcheva | Alexandre Allauzen | Thomas Lavergne | Jan Niehues | Aurélien Max | François Yvon
Proceedings of the Ninth Workshop on Statistical Machine Translation
2013
Improving bilingual sub-sentential alignment by sampling-based transpotting
Li Gong | Aurélien Max | François Yvon
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers
In this article, we present a sampling-based approach to improve bilingual sub-sentential alignment in parallel corpora. This approach can be used to align parallel sentences on an as-needed basis, and is able to accurately align newly available sentences. We evaluate the resulting alignments on several Machine Translation tasks. Results show that, for the tasks considered here, our approach performs on par with the state-of-the-art statistical alignment pipeline GIZA++/Moses, and obtains superior results in a number of configurations, notably when aligning additional parallel sentence pairs carefully selected to match the test input.
2012
Towards contextual adaptation for any-text translation
Li Gong | Aurélien Max | François Yvon
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Adaptation for Machine Translation has been studied in a variety of ways, using an ideal scenario where the training data can be split into “out-of-domain” and “in-domain” corpora, on which the adaptation is based. In this paper, we consider a more realistic setting which does not assume the availability of any kind of “in-domain” data, hence the name “any-text translation”. In this context, we present a new approach to contextually adapt a translation model on-the-fly, and report several experimental results where this approach outperforms conventionally trained baselines. We also present a document-level contrastive evaluation whose results can be easily interpreted, even by non-specialists.
LIMSI @ WMT12
Hai-Son Le | Thomas Lavergne | Alexandre Allauzen | Marianna Apidianaki | Li Gong | Aurélien Max | Artem Sokolov | Guillaume Wisniewski | François Yvon
Proceedings of the Seventh Workshop on Statistical Machine Translation