Paul Tarau


2022

Textstar: a Fast and Lightweight Graph-Based Algorithm for Extractive Summarization and Keyphrase Extraction
David Brock | Ali Khan | Tam Doan | Alicia Lin | Yifan Guo | Paul Tarau
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association

We introduce Textstar, a graph-based summarization and keyphrase extraction system that builds a document graph using only lemmatization and POS tagging. The document graph aggregates connections between lemma nodes and sentence identifier nodes. Consecutive lemmas in each sentence, as well as consecutive sentences themselves, are connected in rings, forming a ring of rings that represents the document. We iteratively apply a centrality algorithm of our choice to the document graph and trim the lowest-ranked nodes at each step. Once the desired number of sentences and lemmas remains, we extract the sentences as the summary and aggregate the remaining lemmas into keyphrases using their context. Our algorithm is efficient enough to process large document graphs in one shot, without any training, and empirical evaluation on several benchmarks indicates that it outperforms most other graph-based algorithms.
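The ring-of-rings construction and the iterative trimming described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the input is already lemmatized and POS-filtered, that each sentence ring runs from the sentence identifier through its lemmas and back, that edges are undirected, and that a plain power-iteration PageRank serves as the centrality measure. The names `build_graph`, `pagerank`, `remove_node`, `textstar_sketch`, and the `trim_fraction` parameter are hypothetical. The final step of aggregating lemmas into keyphrases is omitted; the sketch simply returns the surviving lemmas.

```python
from collections import defaultdict


def build_graph(sentences):
    """Build an undirected ring-of-rings graph.

    sentences: list of lists of lemmas (assumed already lemmatized
    and POS-filtered by some upstream tool).
    Nodes are ("s", index) for sentences and ("w", lemma) for lemmas.
    """
    graph = defaultdict(set)

    def link(a, b):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)

    def ring(nodes):
        # Connect each node to its successor, closing the cycle.
        for i, node in enumerate(nodes):
            link(node, nodes[(i + 1) % len(nodes)])

    sent_ids = [("s", i) for i in range(len(sentences))]
    for sid, lemmas in zip(sent_ids, sentences):
        # Assumption: the sentence node is part of its own lemma ring.
        ring([sid] + [("w", lemma) for lemma in lemmas])
    ring(sent_ids)  # consecutive sentences form the outer ring
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank


def remove_node(graph, node):
    for neighbor in graph.pop(node):
        graph[neighbor].discard(node)


def textstar_sketch(sentences, n_sentences=2, n_lemmas=5, trim_fraction=0.1):
    """Rank, trim the lowest-ranked nodes, and repeat until targets are met."""
    graph = build_graph(sentences)
    targets = {"s": n_sentences, "w": n_lemmas}

    def count(kind):
        return sum(1 for n in graph if n[0] == kind)

    while count("s") > n_sentences or count("w") > n_lemmas:
        rank = pagerank(graph)
        budget = max(1, int(len(graph) * trim_fraction))
        remaining = {"s": count("s"), "w": count("w")}
        for node in sorted(rank, key=rank.get):  # lowest rank first
            if budget == 0:
                break
            kind = node[0]
            if remaining[kind] > targets[kind]:
                remove_node(graph, node)
                remaining[kind] -= 1
                budget -= 1

    rank = pagerank(graph)
    summary = sorted(i for kind, i in graph if kind == "s")
    lemmas = sorted(
        (lemma for kind, lemma in graph if kind == "w"),
        key=lambda lemma: -rank[("w", lemma)],
    )
    return summary, lemmas


if __name__ == "__main__":
    doc = [
        ["graph", "algorithm", "rank", "sentence"],
        ["sentence", "form", "ring"],
        ["ring", "graph", "represent", "document"],
        ["algorithm", "trim", "node", "graph"],
    ]
    sentence_ids, key_lemmas = textstar_sketch(doc, n_sentences=2, n_lemmas=3)
    print("summary sentence indices:", sentence_ids)
    print("surviving lemmas:", key_lemmas)
```

Re-ranking after each trimming step, rather than ranking once and cutting, lets the scores of the surviving nodes adjust as their low-ranked neighbors disappear. Any other centrality measure could be substituted for `pagerank` without changing the rest of the sketch.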

2016

Infusing NLU into Automatic Question Generation
Karen Mazidi | Paul Tarau
Proceedings of the 9th International Natural Language Generation conference

2005

A Language Independent Algorithm for Single and Multiple Document Summarization
Rada Mihalcea | Paul Tarau
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

2004

TextRank: Bringing Order into Text
Rada Mihalcea | Paul Tarau
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

PageRank on Semantic Networks, with Application to Word Sense Disambiguation
Rada Mihalcea | Paul Tarau | Elizabeth Figa
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics