PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles

Daniel Ferrés, Horacio Saggion, Francesco Ronzano, Àlex Bravo


Anthology ID:
L18-1298
Volume:
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Month:
May
Year:
2018
Address:
Miyazaki, Japan
Editors:
Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
https://aclanthology.org/L18-1298
Cite (ACL):
Daniel Ferrés, Horacio Saggion, Francesco Ronzano, and Àlex Bravo. 2018. PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
Cite (Informal):
PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles (Ferrés et al., LREC 2018)
PDF:
https://aclanthology.org/L18-1298.pdf