Mohammed Sabry
2024
Filling Gaps in Wikipedia: Leveraging Data-to-Text Generation to Improve Encyclopedic Coverage of Underrepresented Groups
Simon Mille | Massimiliano Pronesti | Craig Thomson | Michela Lorandi | Sophie Fitzpatrick | Rudali Huidrom | Mohammed Sabry | Amy O’Riordan | Anya Belz
Proceedings of the 17th International Natural Language Generation Conference: System Demonstrations
Wikipedia is known to have systematic gaps in its coverage that correspond to under-resourced languages as well as underrepresented groups. This paper presents a new tool to support efforts to fill in these gaps by automatically generating draft articles and facilitating post-editing and uploading to Wikipedia. A rule-based generator and an input-constrained LLM are used to generate two alternative articles, enabling the often more fluent, but error-prone, LLM-generated article to be content-checked against the more reliable, but less fluent, rule-generated article.
DCU-NLG-Small at the GEM’24 Data-to-Text Task: Rule-based generation and post-processing with T5-Base
Simon Mille | Mohammed Sabry | Anya Belz
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
Our submission to the GEM data-to-text shared task aims to assess the quality of texts produced by the combination of a rule-based system with a language model of reduced size, by first using a rule-based generator to convert input triples into semantically correct English text, and then a language model to paraphrase these texts to make them more fluent. The texts are translated to languages other than English with the NLLB machine translation system.
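The two-stage idea in this abstract (rule-based verbalisation of triples, followed by a paraphrasing pass) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the templates, the property names, and the `paraphrase` hook are hypothetical and are not taken from the submitted system. The paraphrasing step is left as an identity function standing in for a language model, and the translation step is omitted.

```python
# Minimal sketch of a rule-based triple-to-text step followed by a
# pluggable paraphrasing step. Templates and property names are
# hypothetical; they are not the rules used in the actual submission.

TEMPLATES = {
    "birthPlace": "{subj} was born in {obj}.",
    "occupation": "{subj} worked as {obj}.",
}
# Fallback pattern for properties without a dedicated template.
DEFAULT_TEMPLATE = "The {prop} of {subj} is {obj}."


def verbalise(triple):
    """Turn one (subject, property, object) triple into an English sentence."""
    subj, prop, obj = triple
    template = TEMPLATES.get(prop, DEFAULT_TEMPLATE)
    return template.format(
        subj=subj.replace("_", " "),
        prop=prop,
        obj=obj.replace("_", " "),
    )


def generate(triples, paraphrase=lambda text: text):
    """Verbalise all triples, then pass the draft to a paraphrasing function.

    The default `paraphrase` is an identity placeholder; in a real pipeline
    it would be replaced by a call to a language model.
    """
    draft = " ".join(verbalise(t) for t in triples)
    return paraphrase(draft)


if __name__ == "__main__":
    triples = [
        ("Ada_Lovelace", "birthPlace", "London"),
        ("Ada_Lovelace", "occupation", "mathematician"),
    ]
    print(generate(triples))
```

Keeping the paraphraser as a separate function means the rule-based draft is always available on its own, so the more fluent output can be compared against it.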