Joel Nothman


2018

Stop Word Lists in Free Open-source Software Packages
Joel Nothman | Hanmin Qin | Roman Yurchak
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

Open-source software packages for language processing often include stop word lists. Users may apply them without awareness of their surprising omissions (e.g. “hasn’t” but not “hadn’t”) and inclusions (“computer”), or their incompatibility with a particular tokenizer. Motivated by issues raised about the Scikit-learn stop list, we investigate variation among and consistency within 52 popular English-language stop lists, and propose strategies for mitigating these issues.
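The tokenizer incompatibility mentioned in this abstract can be illustrated with a minimal sketch. It assumes a regex tokenizer that keeps only runs of two or more word characters (similar to the default token pattern in Scikit-learn's vectorizers) and a toy stop list invented for illustration; neither the helper names nor the list come from the paper.

```python
import re

# Assumed token pattern: runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it with the assumed token pattern."""
    return TOKEN_PATTERN.findall(text.lower())


def inconsistent_stop_words(stop_words):
    """Return {stop word: tokens it produces that are not in the stop list}."""
    stop_set = set(stop_words)
    problems = {}
    for word in sorted(stop_set):
        missing = [tok for tok in tokenize(word) if tok not in stop_set]
        if missing:
            problems[word] = missing
    return problems


# Hypothetical toy stop list, not taken from any real package.
toy_stop_list = ["has", "hasn't", "the", "we've", "we"]
print(inconsistent_stop_words(toy_stop_list))
```

Under these assumptions, an entry such as "hasn't" is tokenized to "hasn", which is not itself in the list, so the entry can never match a token produced from real text.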

2017

Cross-lingual Name Tagging and Linking for 282 Languages
Xiaoman Pan | Boliang Zhang | Jonathan May | Joel Nothman | Kevin Knight | Heng Ji
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by performing a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and non-Wikipedia data.

English Event Detection With Translated Language Features
Sam Wei | Igor Korostil | Joel Nothman | Ben Hachey
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose novel radical features from automatic translation for event extraction. Event detection is a complex language processing task for which it is expensive to collect training data, making generalisation challenging. We derive meaningful subword features from automatic translations into a target language. Results suggest this method is particularly useful when using languages with writing systems that facilitate easy decomposition into subword features, e.g., logograms and Cangjie. The best result combines logogram features from Chinese and Japanese with syllable features from Korean, providing an additional 3.0 points of F-score when added to state-of-the-art generalisation features on the TAC KBP 2015 Event Nugget task.

2016

Using mention accessibility to improve coreference resolution
Kellie Webster | Joel Nothman
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Understanding engagement with insurgents through retweet rhetoric
Joel Nothman | Atif Ahmad | Christoph Breidbach | David Malet | Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2015

2014

Trading accuracy for faster named entity linking
Kristy Hughes | Joel Nothman | James R. Curran
Proceedings of the Australasian Language Technology Association Workshop 2014

Unsupervised Biographical Event Extraction Using Wikipedia Traffic
Alexander Hogue | Joel Nothman | James R. Curran
Proceedings of the Australasian Language Technology Association Workshop 2014

Analysing recall loss in named entity slot filling
Glen Pink | Joel Nothman | James R. Curran
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Command-line utilities for managing and exploring annotated corpora
Joel Nothman | Tim Dawborn | James R. Curran
Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT

Cheap and easy entity evaluation
Ben Hachey | Joel Nothman | Will Radford
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Event Linking: Grounding Event Reference in a News Archive
Joel Nothman | Matthew Honnibal | Ben Hachey | James R. Curran
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2009

Analysing Wikipedia and Gold-Standard Corpora for NER Training
Joel Nothman | Tara Murphy | James R. Curran
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Named Entity Recognition in Wikipedia
Dominic Balasuriya | Nicky Ringland | Joel Nothman | Tara Murphy | James R. Curran
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)

Evaluating a Statistical CCG Parser on Wikipedia
Matthew Honnibal | Joel Nothman | James R. Curran
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)

Classifying articles in English and German Wikipedia
Nicky Ringland | Joel Nothman | Tara Murphy | James R. Curran
Proceedings of the Australasian Language Technology Association Workshop 2009

2008

Transforming Wikipedia into Named Entity Training Data
Joel Nothman | James R. Curran | Tara Murphy
Proceedings of the Australasian Language Technology Association Workshop 2008

2005

A Distributed Architecture for Interactive Parse Annotation
Baden Hughes | James Haggerty | Joel Nothman | Saritha Manickam | James R. Curran
Proceedings of the Australasian Language Technology Workshop 2005