Lufei Liu


2023

Projet Gender Equality Monitor (GEM)
Gilles Adda | François Buet | Sahar Ghannay | Cyril Grouin | Camille Guinaudeau | Lufei Liu | Aurélie Névéol | Albert Rilliard | Rémi Uro
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 6 : projets

The ANR project Gender Equality Monitor (GEM) is coordinated by the Institut National de l’Audiovisuel (INA) and aims to study the place of women in the media (radio and television). In this submission, we present the work carried out at LISN: (i) a diachronic study of the acoustic characteristics of the voice as a function of gender and age, (ii) an acoustic comparison of the voices of female and male politicians, showing an inconsistency between vocal performance and comments on the voice, (iii) the development of an automatic system for estimating perceived femininity from vocal characteristics, (iv) a comparison of topic segmentation systems for automatic transcripts of audiovisual data, (v) the measurement of societal biases in language models in a multilingual and multicultural context, and (vi) initial attempts at identifying advertising according to the speaker's gender.

Annotating Discursive Roles of Sentences in Patent Descriptions
Lufei Liu | Xu Sun | François Veltz | Kim Gerdes
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

Patent descriptions are a crucial component of patent applications, as they are key to understanding the invention and play a significant role in securing patent grants. While discursive analyses have been undertaken for scientific articles, they have not been as thoroughly explored for patent descriptions, despite the increasing importance of Intellectual Property and the constant rise in the number of patent applications. In this study, we propose an annotation scheme containing 16 classes that allows each sentence in a patent description to be categorized according to its discursive role. We publish an experimental human-annotated corpus of 16 patent descriptions and analyze challenges that may be encountered in such work. This work can serve as a basis for automated annotation and thus contribute to enriching linguistic resources in the patent domain.

2020

Building an English-Chinese Parallel Corpus Annotated with Sub-sentential Translation Techniques
Yuming Zhai | Lufei Liu | Xinyi Zhong | Gabriel Illouz | Anne Vilnat
Proceedings of the Twelfth Language Resources and Evaluation Conference

Human translators often resort to non-literal translation techniques in addition to literal translation, such as idiom equivalence, generalization, particularization, and semantic modulation, especially when the source and target languages have different and distant origins. Translation techniques constitute an important subject in translation studies, helping researchers to understand and analyse translated texts. However, they have received less attention in the development of Natural Language Processing (NLP) applications. To fill this gap, one of our long-term objectives is to gain better semantic control over the extraction of paraphrases from bilingual parallel corpora. With this goal in mind, we propose the hypothesis that different sub-sentential translation techniques can be recognized automatically. For this original task, since there is no dedicated data set for English-Chinese, we manually annotated a parallel corpus of eleven genres. Fifty sentence pairs for each genre have been annotated in order to consolidate our annotation guidelines. Based on this data set, we conducted an experiment to classify translations as literal or non-literal. The preliminary results confirm our hypothesis. The corpus and code are available. We hope that this annotated corpus will be useful for linguistic contrastive studies and for fine-grained evaluation of NLP tasks, such as automatic word alignment and machine translation.