Mohammed Sabry


2024

Filling Gaps in Wikipedia: Leveraging Data-to-Text Generation to Improve Encyclopedic Coverage of Underrepresented Groups
Simon Mille | Massimiliano Pronesti | Craig Thomson | Michela Lorandi | Sophie Fitzpatrick | Rudali Huidrom | Mohammed Sabry | Amy O’Riordan | Anya Belz
Proceedings of the 17th International Natural Language Generation Conference: System Demonstrations

Wikipedia is known to have systematic gaps in its coverage that correspond to under-resourced languages as well as underrepresented groups. This paper presents a new tool that supports efforts to fill these gaps by automatically generating draft articles and facilitating their post-editing and upload to Wikipedia. A rule-based generator and an input-constrained LLM are used to generate two alternative articles, enabling the often more fluent, but error-prone, LLM-generated article to be content-checked against the more reliable, but less fluent, rule-generated article.

DCU-NLG-Small at the GEM’24 Data-to-Text Task: Rule-based generation and post-processing with T5-Base
Simon Mille | Mohammed Sabry | Anya Belz
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges

Our submission to the GEM data-to-text shared task aims to assess the quality of texts produced by combining a rule-based system with a small language model. A rule-based generator first converts the input triples into semantically correct English text, and a language model then paraphrases these texts to make them more fluent. The texts are translated into languages other than English with the NLLB machine translation system.