Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources

Adrien Barbaresi


Anthology ID: W14-0401
Volume: Proceedings of the 9th Web as Corpus Workshop (WaC-9)
Month: April
Year: 2014
Address: Gothenburg, Sweden
Editors: Felix Bildhauer, Roland Schäfer
Venue: WAC
SIG: SIGWAC
Publisher: Association for Computational Linguistics
Pages: 1–8
URL: https://aclanthology.org/W14-0401
DOI: 10.3115/v1/W14-0401
Cite (ACL): Adrien Barbaresi. 2014. Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources. In Proceedings of the 9th Web as Corpus Workshop (WaC-9), pages 1–8, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal): Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources (Barbaresi, WAC 2014)
PDF: https://aclanthology.org/W14-0401.pdf
Code: adbar/flux-toolchain