UnixMan Corpus: A Resource for Language Learning in the Unix Domain

Kyle Richardson, Jonas Kuhn


Abstract
We present a new resource, the UnixMan Corpus, for studying language learning in the domain of Unix utility manuals. The corpus is built by mining Unix (and other Unix-related) man pages for parallel example entries, consisting of English textual descriptions with corresponding command examples. The commands provide a grounded and ambiguous semantics for the textual descriptions, making the corpus of interest to work on Semantic Parsing and Grounded Language Learning. In contrast to standard resources for Semantic Parsing, which tend to be restricted to a small number of concepts and relations, the UnixMan Corpus spans a wide variety of utility genres and topics, and consists of hundreds of command and domain entity types. The semi-structured nature of the manuals also makes it easy to exploit other types of relevant information for Grounded Language Learning. We describe the details of the corpus and provide preliminary classification results.
Anthology ID:
L14-1635
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2985–2989
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/823_Paper.pdf
Cite (ACL):
Kyle Richardson and Jonas Kuhn. 2014. UnixMan Corpus: A Resource for Language Learning in the Unix Domain. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2985–2989, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
UnixMan Corpus: A Resource for Language Learning in the Unix Domain (Richardson & Kuhn, LREC 2014)