2014
Thomas Aquinas in the TüNDRA: Integrating the Index Thomisticus Treebank into CLARIN-D
Scott Martens | Marco Passarotti
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes the integration of the Index Thomisticus Treebank (IT-TB) into the web-based treebank search and visualization application TueNDRA (Tuebingen aNnotated Data Retrieval & Analysis). TueNDRA was originally designed to provide access via the Internet to constituency treebanks and to tools for searching and visualizing them, as well as tabulating statistics about their contents. TueNDRA has now been extended to also provide full support for dependency treebanks with non-projective dependencies, in order to integrate the IT-TB and future treebanks with similar properties. These treebanks are queried using an adapted form of the TIGERSearch query language, which can search both hierarchical and sequential information in treebanks in a single query. Because TueNDRA is a web application, making the IT-TB accessible through it makes the treebank and the tools for using it available to a large community without the need to distribute software and show users how to install it.
2012
Large aligned treebanks for syntax-based machine translation
Gideon Kotzé | Vincent Vandeghinste | Scott Martens | Jörg Tiedemann
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we present evaluation scores of both the nonterminal constituent alignments and the MT system itself, and in the latter case, compare them with those of Moses, a current state-of-the-art statistical MT system, when trained on the same data.
2010
An Efficient, Generic Approach to Extracting Multi-Word Expressions from Dependency Trees
Scott Martens | Vincent Vandeghinste
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications
Bottom-up Transfer in Example-based Machine Translation
Vincent Vandeghinste | Scott Martens
Proceedings of the 14th Annual Conference of the European Association for Machine Translation
Varro: An Algorithm and Toolkit for Regular Structure Discovery in Treebanks
Scott Martens
Coling 2010: Posters
2009
Quantitative analysis of treebanks using frequent subtree mining methods
Scott Martens
Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4)