Tiago Timponi Torrent
Also published as: Tiago T. Torrent, Tiago Torrent
2026
Evaluating FrameNet-Based Semantic Modeling for Gender-Based Violence Detection in Clinical Records
Lívia Dutra | Arthur Lorenzi | Frederico Belcavello | Ely Matos | Marcelo Viridiano | Lorena Larré | Olívia Guaranha | Erick Santos | Sofia Reinach | Pedro de Paula | Tiago Torrent
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
Gender-based violence (GBV) is a major public health issue, with the World Health Organization estimating that one in three women experiences physical or sexual violence by an intimate partner during her lifetime. In Brazil, although healthcare professionals are legally required to report such cases, underreporting remains significant due to difficulties in identifying abuse and limited integration between public information systems. This study investigates whether FrameNet-based semantic annotation of open-text fields in electronic medical records can support the identification of patterns of GBV. We compare the performance of an SVM classifier for GBV cases trained on (1) frame-annotated text, (2) annotated text combined with parameterized data, and (3) parameterized data alone. Quantitative and qualitative analyses show that models incorporating semantic annotation outperform categorical models, achieving over 0.3 improvement in F1 score and demonstrating that domain-specific semantic representations provide meaningful signals beyond structured demographic data. The findings support the hypothesis that semantic analysis of clinical narratives can enhance early identification strategies and support more informed public health interventions.
2025
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation
Emilio Villa-Cueva | Sholpan Bolatzhanova | Diana Turmakhan | Kareem Elzeky | Henok Biadglign Ademtew | Alham Fikri Aji | Vladimir Araujo | Israel Abebe Azime | Jinheon Baek | Frederico Belcavello | Fermin Cristobal | Jan Christian Blaise Cruz | Mary Dabre | Raj Dabre | Toqeer Ehsan | Naome A Etori | Fauzan Farooqui | Jiahui Geng | Guido Ivetta | Thanmay Jayakumar | Soyeong Jeong | Zheng Wei Lim | Aishik Mandal | Sofía Martinelli | Mihail Minkov Mihaylov | Daniil Orel | Aniket Pramanick | Sukannya Purkayastha | Israfel Salazar | Haiyue Song | Tiago Timponi Torrent | Debela Desalegn Yadeta | Injy Hamed | Atnafu Lambebo Tonja | Thamar Solorio
Findings of the Association for Computational Linguistics: EMNLP 2025
Translating cultural content poses challenges for machine translation systems due to the differences in conceptualizations between cultures, where language alone may fail to convey sufficient context to capture region-specific meanings. In this work, we investigate whether images can act as cultural context in multimodal translation. We introduce CaMMT, a human-curated benchmark of over 5,800 triples of images along with parallel captions in English and regional languages. Using this dataset, we evaluate five Vision Language Models (VLMs) in text-only and text+image settings. Through automatic and human evaluations, we find that visual context generally improves translation quality, especially in handling Culturally-Specific Items (CSIs), disambiguation, and correct gender marking. By releasing CaMMT, our objective is to support broader efforts to build and evaluate multimodal translation systems that are better aligned with cultural nuance and regional variations.
Audition: A Frame-Annotated Multimodal Dataset for Accessible Audiovisual Content
Maucha Andrade Gamonal | Tiago Timponi Torrent | Ely Edison Matos | Adriana S. Pagano | Frederico Belcavello | Flávia Affonso Mayer | Arthur Lorenzi | Natalia S. Sigiliano | Helen de Andrade Abreu | Lívia Vicente Dutra | Marcelo Viridiano | André Coneglian | Victor A. S. Herbst | Franciany O. Campos | Kenneth Brown | Lívia Padua Ruiz | Lisandra Carvalho Bonoto | Luiz Fernando Pereira | Yulla Liquer Navarro
Proceedings of the 21st Joint ACL - ISO Workshop on Interoperable Semantic Annotation (ISA-21)
This paper presents a multimodal semantic analysis of accessible Brazilian short films using a frame-based annotation approach. We introduce a subset of the Audition dataset, comprising six short films from the animation and documentary genres. We analysed three communicative modes: original audio, audio description, and visual content. Trained annotators semantically annotated each mode following the FrameNet Brazil multimodal methodology. To compare meaning across modalities, we used cosine similarity over frame-semantic representations. Results show that audio description aligns more closely with video content than original audio, reflecting its role in translating visual meaning into language. Our findings demonstrate the effectiveness of frame semantics in modelling meaning across modalities and provide quantitative evidence of audio description as a bridge between visual and verbal communication. The dataset and annotation strategies are a valuable resource for research on multimodal representation, semantic similarity, and accessible media.
SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell | Giuseppe Attanasio | Ioana Baldini | Miruna Clinciu | Jordan Clive | Pieter Delobelle | Manan Dey | Sil Hamilton | Timm Dill | Jad Doughman | Ritam Dutt | Avijit Ghosh | Jessica Zosa Forde | Carolin Holtermann | Lucie-Aimée Kaffee | Tanmay Laud | Anne Lauscher | Roberto L Lopez-Davila | Maraim Masoud | Nikita Nangia | Anaelia Ovalle | Giada Pistilli | Dragomir Radev | Beatrice Savoldi | Vipul Raheja | Jeremy Qin | Esther Ploeger | Arjun Subramonian | Kaustubh Dhole | Kaiser Sun | Amirbek Djanibekov | Jonibek Mansurov | Kayo Yin | Emilio Villa Cueva | Sagnik Mukherjee | Jerry Huang | Xudong Shen | Jay Gala | Hamdan Al-Ali | Tair Djanibekov | Nurdaulet Mukhituly | Shangrui Nie | Shanya Sharma | Karolina Stanczak | Eliza Szczechla | Tiago Timponi Torrent | Deepak Tunuguntla | Marcelo Viridiano | Oskar Van Der Wal | Adina Yakefu | Aurélie Névéol | Mike Zhang | Sydney Zink | Zeerak Talat
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large Language Models (LLMs) reproduce and exacerbate the social biases present in their training data, and resources to quantify this issue are limited. While research has attempted to identify and mitigate such biases, most efforts have been concentrated around English, lagging the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual parallel dataset SHADES to help address this issue, designed for examining culturally-specific stereotypes that may be learned by LLMs. The dataset includes stereotypes from 20 regions around the world and 16 languages, spanning multiple identity categories subject to discrimination worldwide. We demonstrate its utility in a series of exploratory evaluations for both “base” and “instruction-tuned” language models. Our results suggest that stereotypes are consistently reflected across models and languages, with some languages and models indicating much stronger stereotype biases than others.
2024
Modelagem baseada em frames para identificação do léxico da Violência de Gênero (Frame-Based Modeling for Identifying the Lexicon of Gender-Based Violence)[In Portuguese]
Lorena Tasca Larré | Tiago Timponi Torrent
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology
Framed Multi30K: A Frame-Based Multimodal-Multilingual Dataset
Marcelo Viridiano | Arthur Lorenzi | Tiago Timponi Torrent | Ely E. Matos | Adriana S. Pagano | Natália Sathler Sigiliano | Maucha Gamonal | Helen de Andrade Abreu | Lívia Vicente Dutra | Mairon Samagaio | Mariane Carvalho | Franciany Campos | Gabrielly Azalim | Bruna Mazzei | Mateus Fonseca de Oliveira | Ana Carolina Luz | Livia Padua Ruiz | Júlia Bellei | Amanda Pestana | Josiane Costa | Iasmin Rabelo | Anna Beatriz Silva | Raquel Roza | Mariana Souza Mota | Igor Oliveira | Márcio Henrique Pelegrino de Freitas
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper presents Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset which (i) extends the Multi30K dataset (Elliot et al., 2016) with 158,915 original Brazilian Portuguese descriptions and 30,104 Brazilian Portuguese translations from original English descriptions; (ii) adds 2,677,613 frame evocation labels to the 158,915 English descriptions and to the ones created for Brazilian Portuguese; and (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 190,608 frames and Frame Elements correlations with the existing phrase-to-region correlations.
Frame2: A FrameNet-based Multimodal Dataset for Tackling Text-image Interactions in Video
Frederico Belcavello | Tiago Timponi Torrent | Ely E. Matos | Adriana S. Pagano | Maucha Gamonal | Natalia Sigiliano | Lívia Vicente Dutra | Helen de Andrade Abreu | Mairon Samagaio | Mariane Carvalho | Franciany Campos | Gabrielly Azalim | Bruna Mazzei | Mateus Fonseca de Oliveira | Ana Carolina Loçasso Luz | Lívia Pádua Ruiz | Júlia Bellei | Amanda Pestana | Josiane Costa | Iasmin Rabelo | Anna Beatriz Silva | Raquel Roza | Mariana Souza | Igor Oliveira
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper presents the Frame2 dataset, a multimodal dataset built from a corpus of a Brazilian travel TV show annotated for FrameNet categories for both the text and image communicative modes. Frame2 comprises 230 minutes of video, which are correlated with 2,915 sentences either transcribing the audio spoken during the episodes or the subtitling segments of the show where the host conducts interviews in English. For this first release of the dataset, a total of 11,796 annotation sets for the sentences and 6,841 for the video are included. Each of the former includes a target lexical unit evoking a frame or one or more frame elements. For each video annotation, a bounding box in the image is correlated with a frame, a frame element, and a lexical unit evoking a frame in FrameNet.
MoCCA: A Model of Comparative Concepts for Aligning Constructicons
Arthur Lorenzi | Peter Ljunglöf | Ben Lyngfelt | Tiago Timponi Torrent | William Croft | Alexander Ziem | Nina Böbel | Linnéa Bäckström | Peter Uhrig | Ely E. Matos
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024
This paper presents MoCCA, a Model of Comparative Concepts for Aligning Constructicons under development by a consortium of research groups building Constructicons of different languages including Brazilian Portuguese, English, German and Swedish. The Constructicons will be aligned by using comparative concepts (CCs) providing language-neutral definitions of linguistic properties. The CCs are drawn from typological research on grammatical categories and constructions, and from FrameNet frames, organized in a conceptual network. Language-specific constructions are linked to the CCs in accordance with general principles. MoCCA is organized into files of two types: a largely static CC Database file and multiple Linking files containing relations between constructions in a Constructicon and the CCs. Tools are planned to facilitate visualization of the CC network and linking of constructions to the CCs. All files and guidelines will be versioned, and a mechanism is set up to report cases where a language-specific construction cannot be easily linked to existing CCs.
Semantic Permanence in Audiovisual Translation: a FrameNet approach to subtitling
Mairon Samagaio | Tiago Torrent | Ely Matos | Arthur Almeida
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
2023
Modeling Construction Grammar’s Way into NLP: Insights from negative results in automatically identifying schematic clausal constructions in Brazilian Portuguese
Arthur Lorenzi | Vânia Gomes de Almeida | Ely Edison Matos | Tiago Timponi Torrent
Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023)
This paper reports on negative results in a task of automatic identification of schematic clausal constructions and their elements in Brazilian Portuguese. The experiment was set up so as to test whether form and meaning properties of constructions, modeled in terms of Universal Dependencies and FrameNet Frames in a Constructicon, would improve the performance of transformer models in the task. Qualitative analysis of the results indicates that alternatives to the linearization of those properties, dataset size and a post-processing module should be explored in the future as a means to make use of information in Constructicons for NLP tasks.
Anotação do Dataset Multimodal da ReINVenTA (Annotation of the ReINVenTA Multimodal Dataset)[In Portuguese]
Ana Carolina Loçasso Luz | Gabrielly Braz | Lívia Pádua Ruiz | Mariane de Carvalho Pinto | Frederico Belcavello | Natália Sathler Sigiliano | Tiago Torrent
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology
Building a Frame-Semantic Model of the Healthcare Domain: Towards the identification of gender-based violence in public health data
Livia Dutra | Arthur Lorenzi | Lorena Larre | Frederico Belcavello | Ely Matos | Amanda Pestana | Kenneth Brown | Mariana Gonçalves | Victor Herbst | Sofia Reinach | Renato Teixeira | Pedro de Paula | Alessandra Pellini | Cibele Sequeira | Ester Sabino | Fabio Leal | Mônica Conde | Regina Grespan | Tiago Torrent
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology
2022
The Case for Perspective in Multimodal Datasets
Marcelo Viridiano | Tiago Timponi Torrent | Oliver Czulo | Arthur Lorenzi | Ely Matos | Frederico Belcavello
Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022
This paper argues in favor of the adoption of annotation practices for multimodal datasets that recognize and represent the inherently perspectivized nature of multimodal communication. To support our claim, we present a set of annotation experiments in which FrameNet annotation is applied to the Multi30k and the Flickr 30k Entities datasets. We assess the cosine similarity between the semantic representations derived from the annotation of both pictures and captions for frames. Our findings indicate that: (i) frame semantic similarity between captions of the same picture produced in different languages is sensitive to whether the caption is a translation of another caption or not, and (ii) picture annotation for semantic frames is sensitive to whether the image is annotated in presence of a caption or not.
Lutma: A Frame-Making Tool for Collaborative FrameNet Development
Tiago Timponi Torrent | Arthur Lorenzi | Ely Edison Matos | Frederico Belcavello | Marcelo Viridiano | Maucha Andrade Gamonal
Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022
This paper presents Lutma, a collaborative, semi-constrained, tutorial-based tool for contributing frames and lexical units to the Global FrameNet initiative. The tool parameterizes the process of frame creation, avoiding consistency violations and promoting the integration of frames contributed by the community with existing frames. Lutma is structured in a wizard-like fashion so as to provide users with text and video tutorials relevant for each step in the frame creation process. We argue that this tool will allow for a sensible expansion of FrameNet coverage in terms of both languages and cultural perspectives encoded by them, positioning frames as a viable alternative for representing perspective in language models.
Frame Shift Prediction
Zheng Xin Yong | Patrick D. Watson | Tiago Timponi Torrent | Oliver Czulo | Collin Baker
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Frame shift is a cross-linguistic phenomenon in translation which results in corresponding pairs of linguistic material evoking different frames. The ability to predict frame shifts would enable (semi-)automatic creation of multilingual frame annotations and thus speed up FrameNet creation through annotation projection. Here, we first characterize how frame shifts result from other linguistic divergences such as translational divergences and construal differences. Our analysis also shows that many pairs of frames in frame shifts are multi-hop away from each other in Berkeley FrameNet’s net-like configuration. Then, we propose the Frame Shift Prediction task and demonstrate that our graph attention networks, combined with auxiliary training, can learn cross-linguistic frame-to-frame correspondence and predict frame shifts.
Charon: A FrameNet Annotation Tool for Multimodal Corpora
Frederico Belcavello | Marcelo Viridiano | Ely Matos | Tiago Timponi Torrent
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
This paper presents Charon, a web tool for annotating multimodal corpora with FrameNet categories. Annotation can be made for corpora containing both static images and video sequences paired – or not – with text sequences. The pipeline features, besides the annotation interface, corpus import and pre-processing tools.
Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet
Alexandre Diniz da Costa | Mateus Coutinho Marim | Ely Matos | Tiago Timponi Torrent
Proceedings of the Thirteenth Language Resources and Evaluation Conference
In this paper we present Scylla, a methodology for domain adaptation of Neural Machine Translation (NMT) systems that makes use of a multilingual FrameNet enriched with qualia relations as an external knowledge base. Domain adaptation techniques used in NMT usually require fine-tuning and in-domain training data, which may pose difficulties for those working with lesser-resourced languages and may also lead to performance decay of the NMT system for out-of-domain sentences. Scylla does not require fine-tuning of the NMT model, avoiding the risk of model over-fitting and consequent decrease in performance for out-of-domain translations. Two versions of Scylla are presented: one using the source sentence as input, and another one using the target sentence. We evaluate Scylla in comparison to a state-of-the-art commercial NMT system in an experiment in which 50 sentences from the Sports domain are translated from Brazilian Portuguese to English. The two versions of Scylla significantly outperform the baseline commercial system in HTER.
2021
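Construções de Estrutura Argumental com Argumento Preposicionado: uma modelagem linguistico-computacional na FrameNet Brasil (Argument Structure Constructions with a Prepositional Argument: A Linguistic-Computational Modeling in FrameNet Brasil)[In Portuguese]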
Vânia Almeida | Tiago Torrent
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology
Evandro Eduardo Seron Ruiz | Tiago Timponi Torrent
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology
Modelagem de Construções Interrogativas QU- no Constructicon da FrameNet Brasil (Modeling of QU- Interrogative Constructions in the FrameNet Brasil Constructicon)[In Portuguese]
Natalia Marção | Tiago Torrent
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology
2020
(Re)construing Meaning in NLP
Sean Trott | Tiago Timponi Torrent | Nancy Chang | Nathan Schneider
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Human speakers have an extensive toolkit of ways to express themselves. In this paper, we engage with an idea largely absent from discussions of meaning in natural language understanding—namely, that the way something is expressed reflects different ways of conceptualizing or construing the information being conveyed. We first define this phenomenon more precisely, drawing on considerable prior work in theoretical cognitive semantics and psycholinguistics. We then survey some dimensions of construed meaning and show how insights from construal could inform theoretical and practical work in NLP.
Semi-supervised Deep Embedded Clustering with Anomaly Detection for Semantic Frame Induction
Zheng Xin Yong | Tiago Timponi Torrent
Proceedings of the Twelfth Language Resources and Evaluation Conference
Although FrameNet is recognized as one of the most fine-grained lexical databases, its coverage of lexical units is still limited. To tackle this issue, we propose a two-step frame induction process: for a set of lexical units not yet present in Berkeley FrameNet data release 1.7, first remove those that cannot fit into any existing semantic frame in FrameNet; then, assign the remaining lexical units to their correct frames. We also present the Semi-supervised Deep Embedded Clustering with Anomaly Detection (SDEC-AD) model—an algorithm that maps high-dimensional contextualized vector representations of lexical units to a low-dimensional latent space for better frame prediction and uses reconstruction error to identify lexical units that cannot evoke frames in FrameNet. SDEC-AD outperforms the state-of-the-art methods in both steps of the frame induction process. Empirical results also show that definitions provide contextual information for representing and characterizing the frame membership of lexical units.
Frame-Based Annotation of Multimodal Corpora: Tracking (A)Synchronies in Meaning Construction
Frederico Belcavello | Marcelo Viridiano | Alexandre Diniz da Costa | Ely Edison da Silva Matos | Tiago Timponi Torrent
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet
Multimodal aspects of human communication are key in several applications of Natural Language Processing, such as Machine Translation and Natural Language Generation. Despite recent advances in integrating multimodality into Computational Linguistics, the merge between NLP and Computer Vision techniques is still timid, especially when it comes to providing fine-grained accounts for meaning construction. This paper reports on research aiming to determine appropriate methodology and develop a computational tool to annotate multimodal corpora according to a principled structured semantic representation of events, relations and entities: FrameNet. Taking a Brazilian television travel show as corpus, a pilot study was conducted to annotate the frames that are evoked by the audio and the ones that are evoked by visual elements. We also implemented a Multimodal Annotation tool which allows annotators to choose frames and locate frame elements both in the text and in the images, while keeping track of the time span in which those elements are active in each modality. Results suggest that adding a multimodal domain to the linguistic layer of annotation and analysis contributes both to enrich the kind of information that can be tagged in a corpus, and to enhance FrameNet as a model of linguistic cognition.
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet
Tiago T. Torrent | Collin F. Baker | Oliver Czulo | Kyoko Ohara | Miriam R. L. Petruck
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet
Beyond lexical semantics: notes on pragmatic frames
Oliver Czulo | Alexander Ziem | Tiago Timponi Torrent
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet
Framenets as an incarnation of frame semantics have been set up to deal with lexicographic issues (cf. Fillmore and Baker 2010, among others). They are thus concerned with lexical units (LUs) and the conceptual structure which categorizes these together. These lexically-evoked frames, however, do not reflect pragmatic properties of constructions (LUs and other types of constructions), such as expressing illocutions or being considered polite or very informal. From the viewpoint of a multilingual annotation effort, the Global FrameNet Shared Annotation Task, we discuss two phenomena, greetings and tag questions, which highlight the necessity both to investigate the relation between construction and frame annotation and to develop pragmatic frames describing social interactions which are not explicitly lexicalized.
2019
Designing a Frame-Semantic Machine Translation Evaluation Metric
Oliver Czulo | Tiago Timponi Torrent | Ely Edison da Silva Matos | Alexandre Diniz da Costa | Debanjana Kar
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)
We propose a metric for machine translation evaluation based on frame semantics which does not require the use of reference translations or human corrections, but is aimed at comparing original and translated output directly. The metric is described on the basis of an existing manual frame-semantic annotation of a parallel corpus with an English original and a Brazilian Portuguese and a German translation. We discuss implications of our metric design, including the potential of scaling it for multiple languages.
2017
Constituição de Um Dicionário Eletrônico Trilíngue Fundado em Frames a partir da Extração Automática de Candidatos a Termos do Domínio do Turismo (The Constitution of a Trilingual Electronic Dictionary Based on Frames from the Automatic Extraction of Candidate Terms of the Tourism Domain) [In Portuguese]
Simone Rodrigues Peron-Corrêa | Tiago Timponi Torrent
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology
A Modelagem Computacional do Domínio dos Esportes na FrameNet Brasil (The Computational Modeling of the Sports Domain in FrameNet Brasil)[In Portuguese]
Alexandre Diniz Costa | Tiago Timponi Torrent
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology
Descrição e modelagem de construções interrogativas QU- em Português Brasileiro para o desenvolvimento de um chatbot (Description and modeling of QU- interrogative constructions in Brazilian Portuguese for the development of a chatbot) [In Portuguese]
Natália Duarte Marção | Tiago Timponi Torrent | Ely Edison da Silva Matos
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology
Uma Proposta Metodológica para a Categorização Automatizada de Atrações Turísticas a partir de Comentários de Usuários em Plataformas Online (A Methodological Proposition for the Automatic Categorization of Touristic Attractions from User Comments in Online Platforms)[In Portuguese]
Vanessa Maria Ramos Lopes Paiva | Tiago Timponi Torrent
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology
Construções de Estrutura Argumental no âmbito do Constructicon da FrameNet Brasil: proposta de uma modelagem linguístico-computacional (Argument Structure Constructions in the FrameNet Brasil Constructicon: a proposal for a computational-linguistic modeling) [In Portuguese]
Vânia Gomes Almeida | Tiago Timponi Torrent
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology
2014
Copa 2014 FrameNet Brasil: a frame-based trilingual electronic dictionary for the Football World Cup
Tiago T. Torrent | Maria Margarida M. Salomão | Fernanda C. A. Campos | Regina M. M. Braga | Ely E. S. Matos | Maucha A. Gamonal | Julia A. Gonçalves | Bruno C. P. Souza | Daniela S. Gomes | Simone R. Peron
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations
Co-authors
- Frederico Belcavello 10
- Ely Edison da Silva Matos 10
- Arthur Lorenzi 8
- Marcelo Viridiano 8
- Oliver Czulo 5
- Lívia Pádua Ruiz 4
- Helen de Andrade Abreu 3
- Alexandre Diniz da Costa 3
- Lívia Vicente Dutra 3
- Ely E. Matos 3
- Ely Edison Matos 3
- Adriana S. Pagano 3
- Amanda Pestana 3
- Mairon Samagaio 3
- Gabrielly Azalim 2
- Collin F. Baker 2
- Júlia Bellei 2
- Kenneth Brown 2
- Franciany Campos 2
- Mariane Carvalho 2
- Josiane Costa 2
- Lívia Dutra 2
- Maucha Gamonal 2
- Maucha Andrade Gamonal 2
- Lorena Larré 2
- Ana Carolina Loçasso Luz 2
- Bruna Mazzei 2
- Igor Oliveira 2
- Pedro de Paula 2
- Iasmin Rabelo 2
- Sofia Reinach 2
- Raquel Roza 2
- Natália Sathler Sigiliano 2
- Anna Beatriz Silva 2
- Emilio Villa-Cueva 2
- Zheng Xin Yong 2
- Alexander Ziem 2
- Mateus Fonseca de Oliveira 2
- Henok Biadglign Ademtew 1
- Alham Fikri Aji 1
- Hamdan Al-Ali 1
- Vânia Almeida 1
- Arthur Almeida 1
- Vânia Gomes Almeida 1
- Vladimir Araujo 1
- Giuseppe Attanasio 1
- Israel Abebe Azime 1
- Jinheon Baek 1
- Ioana Baldini 1
- Sholpan Bolatzhanova 1
- Lisandra Carvalho Bonoto 1
- Regina M. M. Braga 1
- Gabrielly Braz 1
- Linnéa Bäckström 1
- Nina Böbel 1
- Franciany O. Campos 1
- Fernanda C. A. Campos 1
- Nancy Chang 1
- Miruna Clinciu 1
- Jordan Clive 1
- Mônica Conde 1
- André Coneglian 1
- Alexandre Diniz Costa 1
- Mateus Coutinho Marim 1
- Fermin Cristobal 1
- William Croft 1
- Jan Christian Blaise Cruz 1
- Mary Dabre 1
- Raj Dabre 1
- Pieter Delobelle 1
- Manan Dey 1
- Kaustubh Dhole 1
- Timm Dill 1
- Amirbek Djanibekov 1
- Jad Doughman 1
- Ritam Dutt 1
- Toqeer Ehsan 1
- Kareem Elzeky 1
- Naome A. Etori 1
- Fauzan Farooqui 1
- Jessica Zosa Forde 1
- Jay Gala 1
- Maucha A. Gamonal 1
- Jiahui Geng 1
- Avijit Ghosh 1
- Daniela S. Gomes 1
- Vânia Gomes de Almeida 1
- Mariana Gonalves 1
- Julia A. Gonçalves 1
- Regina Grespan 1
- Olívia Guaranha 1
- Injy Hamed 1
- Sil Hamilton 1
- Victor A. S. Herbst 1
- Victor Herbst 1
- Carolin Holtermann 1
- Jerry Huang 1
- Guido Ivetta 1
- Thanmay Jayakumar 1
- Soyeong Jeong 1
- Lucie-Aimée Kaffee 1
- Debanjana Kar 1
- Lorena Tasca Larré 1
- Tanmay Laud 1
- Anne Lauscher 1
- Fabio Leal 1
- Zheng Wei Lim 1
- Peter Ljunglöf 1
- Roberto L Lopez-Davila 1
- Ana Carolina Luz 1
- Ben Lyngfelt 1
- Aishik Mandal 1
- Jonibek Mansurov 1
- Sofía Martinelli 1
- Natalia Marão 1
- Natália Duarte Marção 1
- Maraim Masoud 1
- Flávia Affonso Mayer 1
- Mihail Minkov Mihaylov 1
- Margaret Mitchell 1
- Sagnik Mukherjee 1
- Nurdaulet Mukhituly 1
- Nikita Nangia 1
- Yulla Liquer Navarro 1
- Aurelie Neveol 1
- Shangrui Nie 1
- Kyoko Ohara 1
- Daniil Orel 1
- Anaelia Ovalle 1
- Vanessa Maria Ramos Lopes Paiva 1
- Márcio Henrique Pelegrino de Freitas 1
- Alessandra Pellini 1
- Luiz Fernando Pereira 1
- Simone R. Peron 1
- Simone Rodrigues Peron-Corrêa 1
- Miriam R. L. Petruck 1
- Mariane de Carvalho Pinto 1
- Giada Pistilli 1
- Esther Ploeger 1
- Aniket Pramanick 1
- Sukannya Purkayastha 1
- Jeremy Qin 1
- Dragomir Radev 1
- Vipul Raheja 1
- Evandro Eduardo Seron Ruiz 1
- Ester Sabino 1
- Israfel Salazar 1
- Maria Margarida M. Salomão 1
- Erick Santos 1
- Beatrice Savoldi 1
- Nathan Schneider 1
- Cibele Sequeira 1
- Shanya Sharma 1
- Xudong Shen 1
- Natalia S. Sigiliano 1
- Natalia Sigiliano 1
- Thamar Solorio 1
- Haiyue Song 1
- Bruno C. P. Souza 1
- Mariana Souza 1
- Mariana Souza Mota 1
- Karolina Stanczak 1
- Arjun Subramonian 1
- Kaiser Sun 1
- Eliza Szczechla 1
- Tair Djanibekov 1
- Zeerak Talat 1
- Renato Teixeira 1
- Atnafu Lambebo Tonja 1
- Sean Trott 1
- Deepak Tunuguntla 1
- Diana Turmakhan 1
- Peter Uhrig 1
- Oskar Van Der Wal 1
- Patrick D. Watson 1
- Debela Desalegn Yadeta 1
- Adina Yakefu 1
- Kayo Yin 1
- Mike Zhang 1
- Sydney Zink 1