Benjamin Weitz


Combining OCR Outputs for Logical Document Structure Markup. Technical Background to the ACL 2012 Contributed Task
Ulrich Schäfer | Benjamin Weitz
Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

A Graphical Citation Browser for the ACL Anthology
Benjamin Weitz | Ulrich Schäfer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Navigation in large scholarly paper collections is tedious and not well supported in most scientific digital libraries. We describe a novel browser-based graphical tool implemented using HTML5 Canvas. It displays citation information extracted from the paper text to support useful navigation. The tool is implemented using a client/server architecture. A citation graph of the digital library is built in the memory of the server. On the client side, edges of the displayed citation (sub)graph surrounding a document are labeled with keywords signifying the kind of citation made from one document to another. These keywords were extracted using NLP tools such as tokenization, sentence boundary detection, and part-of-speech tagging applied to the text extracted from the original PDF papers (currently 22,500). By clicking on an edge, the user can inspect the corresponding citation sentence in context, in most cases also highlighted in the original PDF layout. The system is publicly accessible as part of the ACL Anthology Searchbench.
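
The server-side citation graph described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Citation`, `CitationGraph`, and `neighborhood`, the document IDs, the keywords, and the sentences are all hypothetical. It shows one plausible way to hold keyword-labeled citation edges in memory and to return the subgraph surrounding a single document, which a client could then draw.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One directed edge: `source` cites `target` (hypothetical structure)."""
    source: str
    target: str
    keywords: tuple[str, ...]  # labels signifying the kind of citation
    sentence: str              # citation sentence a client could show on click


class CitationGraph:
    """In-memory citation graph indexed in both directions."""

    def __init__(self) -> None:
        self._outgoing: dict[str, list[Citation]] = defaultdict(list)
        self._incoming: dict[str, list[Citation]] = defaultdict(list)

    def add(self, citation: Citation) -> None:
        # Index the edge under both endpoints so lookups need no full scan.
        self._outgoing[citation.source].append(citation)
        self._incoming[citation.target].append(citation)

    def neighborhood(self, doc_id: str) -> list[Citation]:
        """Return all edges touching doc_id: papers it cites and papers citing it."""
        return self._outgoing.get(doc_id, []) + self._incoming.get(doc_id, [])


# Hypothetical sample data; IDs and sentences are made up for illustration.
graph = CitationGraph()
graph.add(Citation("paper-A", "paper-B", ("extends",), "We extend the approach of B."))
graph.add(Citation("paper-C", "paper-A", ("compares",), "We compare against A."))
graph.add(Citation("paper-C", "paper-B", ("uses",), "We use the toolkit of B."))

edges = graph.neighborhood("paper-A")
```

Indexing each edge under both its source and its target keeps the lookup for a document's surrounding subgraph independent of the total number of edges, which matters for a collection of the size mentioned in the abstract.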