Eros Zanchetta


2008

Introducing and evaluating ukWaC, a very large web-derived corpus of English
Adriano Ferraresi | Eros Zanchetta | Marco Baroni | Silvia Bernardini
Proceedings of the 4th Web as Corpus Workshop

In this paper we introduce ukWaC, a large corpus of English constructed by crawling the .uk Internet domain. The corpus contains more than 2 billion tokens and is one of the largest freely available linguistic resources for English. The paper describes the tools and methodology used in the construction of the corpus and provides a qualitative evaluation of its contents, carried out through a vocabulary-based comparison with the BNC. We conclude by giving practical information about the availability and format of the corpus.

2004

XTERM: A Flexible Standard-Compliant XML-Based Termbase Management System
Lorenzo Piccioni | Eros Zanchetta
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)