Sergio José Rodríguez Méndez
NYCU
Also published as: Sergio J. Rodriguez Mendez, Sergio Rodríguez Méndez
2024
Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association
Tim Baldwin | Sergio José Rodríguez Méndez | Nicholas Kuo
Tiny But Mighty: A Crowdsourced Benchmark Dataset for Triple Extraction from Unstructured Text
Muhammad Salman | Armin Haller | Sergio J. Rodriguez Mendez | Usman Naseem
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024
In the context of Natural Language Processing (NLP) and Semantic Web applications, constructing Knowledge Graphs (KGs) from unstructured text plays a vital role. Several techniques have been developed for KG construction from text, but the lack of standardized datasets hinders the evaluation of triple extraction methods. The evaluation of existing KG construction approaches is based on structured data or manual investigations. To overcome this limitation, this work introduces a novel dataset specifically designed to evaluate KG construction techniques from unstructured text. Our dataset consists of a diverse collection of compound and complex sentences meticulously annotated by human annotators with potential triples (subject, verb, object). The annotations underwent further scrutiny by expert ontologists to ensure accuracy and consistency. For evaluation purposes, the proposed F-measure criterion offers a robust approach to quantify the relatedness and assess the alignment between extracted triples and the ground-truth triples, providing a valuable tool for evaluating the performance of triple extraction systems. By providing a diverse collection of high-quality triples, our proposed benchmark dataset offers a comprehensive training and evaluation set for refining the performance of state-of-the-art language models on a triple extraction task. Furthermore, this dataset encompasses various KG-related tasks, such as named entity recognition, relation extraction, and entity linking.
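The abstract's F-measure criterion scores the alignment between extracted triples and ground-truth triples. As a minimal illustration (assuming exact string matching; the paper's actual criterion also quantifies partial relatedness, which this sketch does not model):

```python
def triple_f1(predicted, gold):
    """Exact-match precision/recall/F1 between two collections of
    (subject, verb, object) triples.

    Illustrative sketch only: the benchmark's proposed criterion
    additionally measures semantic relatedness of near-miss triples,
    which a strict set intersection cannot capture.
    """
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)          # triples found in both sets
    precision = tp / len(predicted)     # fraction of extractions that are correct
    recall = tp / len(gold)             # fraction of gold triples recovered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, extracting two triples of which one matches a single gold triple yields precision 0.5, recall 1.0, and F1 of 2/3.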
Zero- and Few-Shots Knowledge Graph Triplet Extraction with Large Language Models
Andrea Papaluca | Daniel Krefl | Sergio Rodríguez Méndez | Artem Lensky | Hanna Suominen
Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024)
In this work, we tested the Triplet Extraction (TE) capabilities of a variety of Large Language Models (LLMs) of different sizes in the Zero- and Few-Shots settings. In detail, we proposed a pipeline that dynamically gathers contextual information from a Knowledge Base (KB), both in the form of context triplets and of (sentence, triplets) pairs as examples, and provides it to the LLM through a prompt. The additional context allowed the LLMs to be competitive with all the older fully trained baselines based on the Bidirectional Long Short-Term Memory (BiLSTM) Network architecture. We further conducted a detailed analysis of the quality of the gathered KB context, finding it to be strongly correlated with the final TE performance of the model. In contrast, the size of the model appeared to only logarithmically improve the TE capabilities of the LLMs. We release the code on GitHub for reproducibility.
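The pipeline described above provides the LLM with retrieved context triplets and (sentence, triplets) example pairs via a prompt. A hypothetical sketch of such prompt assembly (the function name, prompt wording, and formatting are assumptions; the paper's actual pipeline retrieves this context dynamically from a Knowledge Base):

```python
def build_te_prompt(sentence, examples, context_triplets):
    """Assemble a Zero-/Few-Shot triplet-extraction prompt.

    Hypothetical illustration: `examples` is a list of
    (sentence, [(s, r, o), ...]) pairs and `context_triplets` a list of
    (s, r, o) facts, both assumed to come from a KB retrieval step.
    """
    fmt = lambda ts: "; ".join(f"({s}, {r}, {o})" for s, r, o in ts)
    lines = ["Extract (subject, relation, object) triplets."]
    if context_triplets:
        # Context triplets ground the model in known KB facts.
        lines.append("Known facts: " + fmt(context_triplets))
    for ex_sentence, ex_triplets in examples:
        # Few-shot demonstrations; with no examples this is Zero-Shot.
        lines.append(f"Sentence: {ex_sentence}")
        lines.append("Triplets: " + fmt(ex_triplets))
    # The target sentence, left for the LLM to complete.
    lines.append(f"Sentence: {sentence}")
    lines.append("Triplets:")
    return "\n".join(lines)
```

The abstract's finding that retrieved-context quality correlates strongly with extraction performance suggests the retrieval step feeding `examples` and `context_triplets` matters more than model scale.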
2023
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Tuan Dung Nguyen | Yuan-Sen Ting | Ioana Ciuca | Charles O’Neill | Ze-Chang Sun | Maja Jabłońska | Sandor Kruk | Ernest Perkowski | Jack Miller | Jason Jingshi Li | Josh Peek | Kartheik Iyer | Tomasz Rozanski | Pranav Khetarpal | Sharaf Zaman | David Brodrick | Sergio J. Rodriguez Mendez | Thang Bui | Alyssa Goodman | Alberto Accomazzi | Jill Naiman | Jesse Cranney | Kevin Schawinski | Roberta Raileanu
Proceedings of the Second Workshop on Information Extraction from Scientific Publications
Co-authors
- Alberto Accomazzi 1
- Timothy Baldwin 1
- David Brodrick 1
- Thang Bui 1
- Ioana Ciuca 1
- Jesse Cranney 1
- Alyssa Goodman 1
- Armin Haller 1
- Kartheik Iyer 1
- Maja Jabłońska 1
- Pranav Khetarpal 1
- Daniel Krefl 1
- Sandor Kruk 1
- Nicholas Kuo 1
- Artem Lensky 1
- Jason Jingshi Li 1
- Jack Miller 1
- Jill Naiman 1
- Usman Naseem 1
- Tuan Dung Nguyen 1
- Charles O’Neill 1
- Andrea Papaluca 1
- Josh Peek 1
- Ernest Perkowski 1
- Roberta Raileanu 1
- Tomasz Rozanski 1
- Muhammad Salman 1
- Kevin Schawinski 1
- Ze-Chang Sun 1
- Hanna Suominen 1
- Yuan-Sen Ting 1
- Sharaf Zaman 1