Atro Voutilainen


Analysing Finnish with word lists: the DDI approach to morphology revisited
Atro Voutilainen | Maria Palolahti
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages


Improving corpus annotation productivity: a method and experiment with interactive tagging
Atro Voutilainen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Corpus linguistic and language technology research needs large volumes of empirical corpus data with nearly correct annotation to enable advances in language modelling and theorising. Recent work on improving corpus annotation accuracy presents semiautomatic methods that correct some of the analysis errors in available annotated corpora, while leaving the remaining errors undetected. We review recent advances in linguistics-based partial tagging and parsing, and regard the achieved analysis performance as sufficient for reconsidering a previously proposed method: combining nearly correct but partial automatic analysis with a minimal amount of human postediting (disambiguation) to achieve nearly correct corpus annotation at a competitive annotation speed. We report a pilot experiment with morphological (part-of-speech) annotation using a partial linguistic tagger of a kind previously reported to have a very attractive precision-recall ratio, and observe that a desired level of annotation accuracy can be reached with human disambiguation of less than 10% of the words in the corpus.
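The evaluation idea behind partial tagging plus human postediting can be illustrated with a minimal sketch. It assumes a reductionistic tagger that leaves a set of candidate tags on each word, with recall taken as the share of words whose correct tag survives and precision as the share of surviving tags that are correct. The share of words still ambiguous approximates how much human disambiguation is needed. The names `Token`, `evaluate`, `simulate_postedit` and `accuracy`, the toy sentence, and the annotator model (the annotator always picks the correct tag, but only from the remaining candidates) are hypothetical illustrations, not the paper's experimental setup or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word as left by a hypothetical partial (reductionistic) tagger."""
    word: str
    candidates: frozenset[str]  # tags the tagger did not eliminate
    gold: str                   # reference tag, used only for evaluation


def evaluate(tokens: list[Token]) -> dict[str, float]:
    """Recall, precision and postediting share of partial tagger output."""
    n = len(tokens)
    # Words whose correct tag survived disambiguation.
    retained = sum(1 for t in tokens if t.gold in t.candidates)
    # All tags still attached to words after disambiguation.
    readings = sum(len(t.candidates) for t in tokens)
    # Words left ambiguous, i.e. the ones a human must disambiguate.
    ambiguous = sum(1 for t in tokens if len(t.candidates) > 1)
    return {
        "recall": retained / n,
        "precision": retained / readings,
        "postedit_share": ambiguous / n,
    }


def simulate_postedit(tokens: list[Token]) -> list[str]:
    """Assumed annotator model: picks the gold tag among remaining candidates."""
    final = []
    for t in tokens:
        if len(t.candidates) == 1:
            # Unambiguous words are not shown to the annotator.
            (tag,) = t.candidates
        elif t.gold in t.candidates:
            tag = t.gold
        else:
            # Gold tag was wrongly discarded; this model cannot restore it.
            tag = min(t.candidates)
        final.append(tag)
    return final


def accuracy(tags: list[str], tokens: list[Token]) -> float:
    """Share of final tags that match the reference tags."""
    return sum(tag == t.gold for tag, t in zip(tags, tokens)) / len(tokens)


# Toy garden-path sentence "The old man the boats ." (hypothetical tagger output).
tokens = [
    Token("The", frozenset({"DET"}), "DET"),
    Token("old", frozenset({"ADJ", "N"}), "N"),
    Token("man", frozenset({"N", "V"}), "V"),
    Token("the", frozenset({"DET"}), "DET"),
    Token("boats", frozenset({"N"}), "N"),
    Token(".", frozenset({"PUNCT"}), "PUNCT"),
]

print(evaluate(tokens))
print(accuracy(simulate_postedit(tokens), tokens))
```

On this toy input the tagger keeps every correct tag (recall 1.0) while leaving 8 tags on 6 words (precision 0.75), so a human would disambiguate 2 of 6 words. Under the assumed annotator model, the postedited accuracy equals the tagger's recall, which is why a partial tagger with very high recall and only moderate residual ambiguity is attractive for this workflow.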

Specifying Treebanks, Outsourcing Parsebanks: FinnTreeBank 3
Atro Voutilainen | Kristiina Muhonen | Tanja Purtonen | Krister Lindén
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Corpus-based treebank annotation is known to result in incomplete coverage of mid- and low-frequency linguistic constructions: the linguistic representation and corpus annotation quality are sometimes suboptimal. Large descriptive grammars also cover many mid- and low-frequency constructions. We argue for the use of large descriptive grammars and their sample sentences as a basis for specifying higher-coverage grammatical representations. We present a sample case from an ongoing project (FIN-CLARIN FinnTreeBank) where a grammatical representation is documented as an annotator's manual alongside manual annotation of sample sentences extracted from a large descriptive grammar of Finnish. We outline the linguistic representation (morphology and dependency syntax) for Finnish, and show how the resulting 'Grammar Definition Corpus' and its documentation are used as a task specification for an external subcontractor building a parser engine for morphological and dependency syntactic analysis of large volumes of Finnish for parsebanking purposes. The resulting corpus, FinnTreeBank 3, is due for release in June 2012 and will contain tens of millions of words from publicly available corpora of Finnish with automatic morphological and dependency syntactic analysis, for use in corpus linguistics and language engineering research.

Refining the Design of a Contracting Finite-State Dependency Parser
Anssi Yli-Jyrä | Jussi Piitulainen | Atro Voutilainen
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing


A double-blind experiment on interannotator agreement: the case of dependency syntax and Finnish
Atro Voutilainen | Tanja Purtonen
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)


Parsing Swedish
Atro Voutilainen
Proceedings of the 13th Nordic Conference of Computational Linguistics (NODALIDA 2001)


An experiment on the upper bound of interjudge agreement: the case of tagging
Atro Voutilainen
Ninth Conference of the European Chapter of the Association for Computational Linguistics


Towards a single proposal in spelling correction
Eneko Agirre | Koldo Gojenola | Kepa Sarasola | Atro Voutilainen
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Towards a Single Proposal in Spelling Correction
Eneko Agirre | Koldo Gojenola | Kepa Sarasola | Atro Voutilainen
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Does tagging help parsing? A case study on finite state parsing
Atro Voutilainen
Finite State Methods in Natural Language Processing

Proceedings of the Third Conference on Empirical Methods for Natural Language Processing
Nancy Ide | Atro Voutilainen
Proceedings of the Third Conference on Empirical Methods for Natural Language Processing


Developing a hybrid NP parser
Atro Voutilainen | Lluis Padro
Fifth Conference on Applied Natural Language Processing

Comparing a Linguistic and a Stochastic Tagger
Christer Samuelsson | Atro Voutilainen
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics


A syntax-based part-of-speech analyser
Atro Voutilainen
Seventh Conference of the European Chapter of the Association for Computational Linguistics

Specifying a shallow grammatical representation for parsing purposes
Atro Voutilainen | Timo Jarvinen
Seventh Conference of the European Chapter of the Association for Computational Linguistics


Tagging accurately - Don’t guess if you know
Pasi Tapanainen | Atro Voutilainen
Fourth Conference on Applied Natural Language Processing

A Noun Phrase Parser of English
Atro Voutilainen
Proceedings of the 9th Nordic Conference of Computational Linguistics (NODALIDA 1993)


Ambiguity resolution in a reductionistic parser
Atro Voutilainen | Pasi Tapanainen
Sixth Conference of the European Chapter of the Association for Computational Linguistics

Ambiguity resolution in a reductionistic parser
Pasi Tapanainen | Atro Voutilainen
Sixth Conference of the European Chapter of the Association for Computational Linguistics

NPtool, a Detector of English Noun Phrases
Atro Voutilainen
Very Large Corpora: Academic and Industrial Perspectives


Compiling and Using Finite-State Syntactic Rules
Kimmo Koskenniemi | Pasi Tapanainen | Atro Voutilainen
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics