Cláudia Freitas

Also published as: Claudia Freitas

2024

PropBank e anotação de papéis semânticos para a língua portuguesa: O que há de novo?
Cláudia Freitas | Thiago Pardo
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology

2023

Um pronome com muitas funções: Descrição e resultados da anotação do pronome -se em um treebank segundo o esquema Universal Dependencies (UD) para Português
Elvis de Souza | Claudia Freitas
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology

Proposta e Avaliação Linguística de Técnicas de Aumento de Dados
Arthur Scalercio | Claudia Freitas
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology

Explorando variações no tagset e na anotação Universal Dependencies (UD) para Português: Possibilidades e resultados com base no treebank PetroGold
Elvis de Souza | Cláudia Freitas
Proceedings of the 14th Brazilian Symposium in Information and Human Language Technology

Annotation of fixed Multiword Expressions (MWEs) in a Portuguese Universal Dependencies (UD) treebank: Gathering candidates from three different sources
Elvis Souza | Claudia Freitas
Proceedings of the 2nd Edition of the Universal Dependencies Brazilian Festival

2022

Still on arguments and adjuncts: the status of the indirect object and the adverbial adjunct relations in Universal Dependencies for Portuguese
Elvis Souza | Claudia Freitas
Proceedings of the Universal Dependencies Brazilian Festival

Polishing the gold – how much revision do we need in treebanks?
Elvis Souza | Claudia Freitas
Proceedings of the Universal Dependencies Brazilian Festival

2021

PetroGold – Corpus padrão ouro para o domínio do petróleo
Elvis Souza | Aline Silveira | Tatiana Cavalcanti | Maria Castro | Claudia Freitas
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

ET: A Workstation for Querying, Editing and Evaluating Annotated Corpora
Elvis de Souza | Cláudia Freitas
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In this paper we explore the functionalities of ET, a suite designed to support linguistic research and natural language processing tasks using corpora annotated in the CoNLL-U format. These goals are achieved by two integrated environments – Interrogatório, an environment for querying and editing annotated corpora, and Julgamento, an environment for assessing their quality. ET is open-source, built on different Python Web technologies and has Web demonstrations available on-line. ET has been intensively used in our research group for over two years, being the chosen framework for several linguistic and NLP-related studies conducted by its researchers.

This paper presents some work on direct and indirect speech in Portuguese using corpus-based methods: we report on a study whose aim was to identify (i) Portuguese verbs used to introduce reported speech and (ii) syntactic patterns used to convey reported speech, in order to enhance the performance of a quotation extraction system, dubbed QUEMDISSE?. In addition, (iii) we present a Portuguese corpus annotated with reported speech, using the lexicon and rules provided by (i) and (ii), and discuss the process of their annotation and what was learned.

Semantic relations between words are key to building systems that aim to understand and manipulate language. For English, the “de facto” standard for representing this kind of knowledge is Princeton’s WordNet. Here, we describe the wordnet-like resources currently available for Portuguese: their origins, methods of creation, sizes, and usage restrictions. We start tackling the problem of comparing them, but only in quantitative terms. Finally, we sketch ideas for potential collaboration between some of the projects that produce Portuguese wordnets.

2015

Seeing is Correcting: curating lexical resources using social interfaces
Livy Real | Fabricio Chalub | Valeria de Paiva | Claudia Freitas | Alexandre Rademaker
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications

Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology
Claudia Freitas | Alexandre Rademaker
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology

Anotação de corpus com a OpenWordNet-PT: um exercício de desambiguação (Sense annotation with OpenWordNet-PT: an exercise of word sense disambiguation)
Cláudia Freitas | Livy Real | Alexandre Rademaker
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology

2012

Págico: Evaluating Wikipedia-based information retrieval in Portuguese
Cristina Mota | Alberto Simões | Cláudia Freitas | Luís Costa | Diana Santos
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

How do people behave in their everyday information seeking tasks, which often involve Wikipedia? Are there systems which can help them, or do a similar job? In this paper we describe Págico, an evaluation contest with the main purpose of fostering research on these topics. We describe its motivation, the collection of documents created, the evaluation setup, the topics chosen and how they were selected, the participation, as well as the measures used for evaluation and the gathered resources. The task, which lies between information retrieval and question answering, can be further described as answering questions related to Portuguese-speaking culture in the Portuguese Wikipedia, across a number of different themes and geographic and temporal angles. This initiative allowed us to create interesting datasets and perform some assessment of Wikipedia, while also improving a public-domain open-source system for further Wikipedia-based evaluations. In the paper, we provide examples of questions, report the results obtained by the participants, and provide some discussion on complex issues.

2010

Second HAREM: Advancing the State of the Art of Named Entity Recognition in Portuguese
Cláudia Freitas | Cristina Mota | Diana Santos | Hugo Gonçalo Oliveira | Paula Carvalho
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present Second HAREM, the second edition of an evaluation campaign for Portuguese, addressing named entity recognition (NER). This second edition also included two new tracks: the recognition and normalization of temporal entities (proposed by a group of participants, and hence not covered in this paper) and ReRelEM, the detection of semantic relations between named entities. We summarize the setup of Second HAREM by showing the preserved distinctive features and discussing the changes compared to the first edition. Furthermore, we present the main results achieved and describe the available resources and tools developed under this evaluation, namely, (i) the golden collections, i.e. a set of documents whose named entities and semantic relations between those entities were manually annotated, (ii) the Second HAREM collection (which contains the unannotated version of the golden collection), as well as the participating systems' results on it, (iii) the scoring tools, and (iv) SAHARA, a Web application that allows interactive evaluation. We end the paper by offering some remarks about what was learned.