Oto Araújo Vale

Also published as: Oto Araujo Vale


2026

Recorded interviews can capture their subjects’ memories, perceptions, and emotions. When conducted with notable figures, they can also serve as a resource for interdisciplinary research across various branches of science. This work marks the beginning of a significant project analyzing interviews from the Roda Viva program, the longest-running interview show on Brazilian television. In this initial study, we examined six memorable interviews with six Brazilian Formula One drivers to compare the performance of two named entity recognition methods: a statistical-neural method and large language models (LLMs), both evaluated against manual annotations. The comparison highlighted relevant qualitative distinctions: the statistical method showed a rigid dependence on capitalisation and lexical familiarity, leading to mechanical false positives and missed non-capitalised entities, while the LLM exhibited greater linguistic sensitivity, retrieving contextual entities and remaining robust to transcription errors, though it still produced false positives. The LLM-based approach appears more promising because of its flexibility and the potential to refine it with instructions that filter out ambiguous cases, which favours automating social network extraction from the corpus.
This paper presents a syntactic lexicon of Brazilian Portuguese predicative adjectives that are not regularly derived from verbs. From the 7,000 most frequent adjectives in a large web corpus, 3,161 lexical items were selected and annotated with 36 syntactic properties. These properties were established through introspection and corpus evidence, covering argument structure, copular verbs, prepositions, transformations (e.g., raising, nominalization), semantic roles, and others. The resulting resource constitutes a machine-readable lexicon of predicative adjectives for Brazilian Portuguese.

2024

2023