Steven Bird


2024

Must NLP be Extractive?
Steven Bird
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

How do we roll out language technologies across a world with 7,000 languages? In one story, we scale the successes of NLP further into ‘low-resource’ languages, doing ever more with less. However, this approach does not recognise the fact that, beyond the 500 institutional languages, the remaining languages are oral vernaculars spoken by communities who use a language of wider communication to interact with the outside world. I argue that such ‘contact languages’ are the appropriate target for technologies like machine translation, and that the 6,500 oral languages must be approached differently. I share a story from an Indigenous community, where local people reshaped an extractive agenda to align with their relational agenda. I describe the emerging paradigm of relational NLP and explain how it opens the way to non-extractive methods and to solutions that enhance human agency.

Envisioning NLP for intercultural climate communication
Steven Bird | Angelina Aquino | Ian Gumbula
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)

Climate communication is often seen by the NLP community as an opportunity for machine translation, applied to ever smaller languages. However, over 90% of the world’s linguistic diversity comes from languages with ‘primary orality’, mostly spoken in non-Western oral societies. A case in point is the Aboriginal communities of Northern Australia, where we have been conducting workshops on climate communication, revealing shortcomings in existing communication practices along with new opportunities for improving intercultural communication. We present a case study of climate communication in an oral society, including the voices of many local people, and draw several lessons for the research program of NLP in the climate space.

Centering the Speech Community
Steven Bird | Dean Yibarbuk
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

How can NLP/AI practitioners engage with oral societies and develop locally appropriate language technologies? We report on our experience of working together over five years in a remote community in the far north of Australia, and how we prototyped simple language technologies to support our collaboration. We navigated different understandings of language, the functional differentiation of oral vs institutional languages, and the distinct technology opportunities for each. Our collaboration unsettled the first author’s western framing of language as data for exploitation by machines, and we devised a design pattern that seems better aligned with local interests and aspirations. We call for new collaborations on the design of locally appropriate technologies for languages with primary orality.

2022

Learning From Failure: Data Capture in an Australian Aboriginal Community
Eric Le Ferrand | Steven Bird | Laurent Besacier
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most low resource language technology development is premised on the need to collect data for training statistical models. When we follow the typical process of recording and transcribing text for small Indigenous languages, we hit up against the so-called “transcription bottleneck.” Therefore it is worth exploring new ways of engaging with speakers which generate data while avoiding the transcription bottleneck. We have deployed a prototype app for speakers to use for confirming system guesses in an approach to transcription based on word spotting. However, in the process of testing the app we encountered many new problems for engagement with speakers. This paper presents a close-up study of the process of deploying data capture technology on the ground in an Australian Aboriginal community. We reflect on our interactions with participants and draw lessons that apply to anyone seeking to develop methods for language data collection in an Indigenous community.

Local Languages, Third Spaces, and other High-Resource Scenarios
Steven Bird
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

How can language technology address the diverse situations of the world’s languages? In one view, languages exist on a resource continuum and the challenge is to scale existing solutions, bringing under-resourced languages into the high-resource world. In another view, presented here, the world’s language ecology includes standardised languages, local languages, and contact languages. These are often subsumed under the label of “under-resourced languages” even though they have distinct functions and prospects. I explore this position and propose some ecologically-aware language technology agendas.

Fashioning Local Designs from Generic Speech Technologies in an Australian Aboriginal Community
Éric Le Ferrand | Steven Bird | Laurent Besacier
Proceedings of the 29th International Conference on Computational Linguistics

An increasing number of papers have been addressing issues related to low-resource languages and the transcription bottleneck paradigm. After several years spent in Northern Australia, where some of the strongest Aboriginal languages are spoken, we observed a gap between the motivations depicted in research contributions in this space and the Northern Australian context. In this paper, we address this gap in research by exploring the potential of speech recognition in an Aboriginal community. We describe our work from training a spoken term detection system to its implementation in an activity with Aboriginal participants. We report, on the one hand, how speech recognition technologies can find their place in an Aboriginal context and, on the other, the methodological paths that allowed us to achieve better comprehension and engagement from Aboriginal participants.

Learning Through Transcription
Mat Bettinson | Steven Bird
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages

Transcribing speech for primarily oral, local languages is often a joint effort involving speakers and outsiders. It is commonly motivated by externally-defined scientific goals, alongside local motivations such as language acquisition and access to heritage materials. We explore the task of ‘learning through transcription’ through the design of a system for collaborative speech annotation. We have developed a prototype to support local and remote learner-speaker interactions in remote Aboriginal communities in northern Australia. We show that situated systems design for inclusive non-expert practice is a promising new direction for working with speakers of local languages.

A Finite State Approach to Interactive Transcription
William Lane | Steven Bird
Proceedings of the First Workshop on NLP Applications to Field Linguistics

We describe a novel approach to transcribing morphologically complex, local, oral languages. The approach connects with local motivations for participating in language work which center on language learning, accessing the content of audio collections, and applying this knowledge in language revitalization and maintenance. We develop a constraint-based approach to interactive word completion, expressed using Optimality Theoretic constraints, implemented in a finite state transducer, and applied to an Indigenous language. We show that this approach suggests correct full word predictions on 57.9% of the test utterances, and correct partial word predictions on 67.5% of the test utterances. In total, 87% of the test utterances receive full or partial word suggestions which serve to guide the interactive transcription process.

Multiword Expressions and the Low-Resource Scenario from the Perspective of a Local Oral Culture
Steven Bird
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022

Research on multiword expressions and on under-resourced languages often begins with problematisation. The existence of non-compositional meaning, or the paucity of conventional language resources, is treated as a problem to be solved. This perspective is associated with the view of language as a lexico-grammatical code, and of NLP as a conventional sequence of computational tasks. In this talk, I share from my experience in an Australian Aboriginal community, where people tend to see language as an expression of identity and of ‘connection to country’. Here, my early attempts to collect language data were thwarted. There was no obvious role for tasks like speech recognition, parsing, or translation. Instead, working under the authority of local elders, I pivoted to language processing tasks that were more in keeping with local interests and aspirations. I describe these tasks and suggest some new ways of framing the work of NLP, and I explore implications for work on multiword expressions and on under-resourced languages.

2021

Phone Based Keyword Spotting for Transcribing Very Low Resource Languages
Eric Le Ferrand | Steven Bird | Laurent Besacier
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust speech recognition system. This work is grounded in a very low-resource language documentation scenario where only a few minutes of recording have been transcribed for a given language so far. Experiments on two oral languages show that a pretrained universal phone recognizer, fine-tuned with only a few minutes of target language speech, can be used for spoken term detection through searches in phone confusion networks with a lexicon expressed as a finite state automaton. Experimental results show that a phone recognition based approach provides better overall performance than Dynamic Time Warping when working with clean data, and highlight the benefits of each method for two types of speech corpus.

A Computational Model for Interactive Transcription
William Lane | Mat Bettinson | Steven Bird
Proceedings of the Second Workshop on Data Science with Human in the Loop: Language Advances

Transcribing low resource languages can be challenging in the absence of a good lexicon and trained transcribers. Accordingly, we seek a way to enable interactive transcription whereby the machine amplifies human efforts. This paper presents a data model and a system architecture for interactive transcription, supporting multiple modes of interactivity, increasing the likelihood of finding tasks that engage local participation in language work. The approach also supports other applications which are useful in our context, including spoken document retrieval and language learning.

Local Word Discovery for Interactive Transcription
William Lane | Steven Bird
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Human expertise and the participation of speech communities are essential factors in the success of technologies for low-resource languages. Accordingly, we propose a new computational task which is tuned to the available knowledge and interests in an Indigenous community, and which supports the construction of high quality texts and lexicons. The task is illustrated for Kunwinjku, a morphologically-complex Australian language. We combine a finite state implementation of a published grammar with a partial lexicon, and apply this to a noisy phone representation of the signal. We locate known lexemes in the signal and use the morphological transducer to build these out into hypothetical, morphologically-complex words for human validation. We show that applying a single iteration of this method results in a relative transcription density gain of 17%. Further, we find that 75% of breath groups in the test set receive at least one correct partial or full-word suggestion.

2020

Bootstrapping Techniques for Polysynthetic Morphological Analysis
William Lane | Steven Bird
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Polysynthetic languages have exceptionally large and sparse vocabularies, thanks to the number of morpheme slots and combinations in a word. This complexity, together with a general scarcity of written data, poses a challenge to the development of natural language technologies. To address this challenge, we offer linguistically-informed approaches for bootstrapping a neural morphological analyzer, and demonstrate its application to Kunwinjku, a polysynthetic Australian language. We generate data from a finite state transducer to train an encoder-decoder model. We improve the model by “hallucinating” missing linguistic structure into the training data, and by resampling from a Zipf distribution to simulate a more natural distribution of morphemes. The best model accounts for all instances of reduplication in the test set and achieves an accuracy of 94.7% overall, a 10 percentage point improvement over the FST baseline. This process demonstrates the feasibility of bootstrapping a neural morph analyzer from minimal resources.

Sparse Transcription
Steven Bird
Computational Linguistics, Volume 46, Issue 4 - December 2020

The transcription bottleneck is often cited as a major obstacle for efforts to document the world’s endangered languages and supply them with language technologies. One solution is to extend methods from automatic speech recognition and machine translation, and recruit linguists to provide narrow phonetic transcriptions and sentence-aligned translations. However, I believe that these approaches are not a good fit with the available data and skills, or with long-established practices that are essentially word-based. In seeking a more effective approach, I consider a century of transcription practice and a wide range of computational approaches, before proposing a computational model based on spoken term detection that I call “sparse transcription.” This represents a shift away from current assumptions that we transcribe phones, transcribe fully, and transcribe first. Instead, sparse transcription combines the older practice of word-level transcription with interpretive, iterative, and interactive processes that are amenable to wider participation and that open the way to new methods for processing oral languages.

Enabling Interactive Transcription in an Indigenous Community
Eric Le Ferrand | Steven Bird | Laurent Besacier
Proceedings of the 28th International Conference on Computational Linguistics

We propose a novel transcription workflow which combines spoken term detection with a human in the loop, and we report on a pilot experiment. This work is grounded in an almost zero-resource scenario where only a few terms have so far been identified, involving two endangered languages. We show that in the early stages of transcription, when the available data is insufficient to train a robust ASR system, it is possible to take advantage of the transcription of a small number of isolated words in order to bootstrap the transcription of a speech collection.

Decolonising Speech and Language Technology
Steven Bird
Proceedings of the 28th International Conference on Computational Linguistics

After generations of exploitation, Indigenous people often respond negatively to the idea that their languages are data ready for the taking. By treating Indigenous knowledge as a commodity, speech and language technologists risk disenfranchising local knowledge authorities, reenacting the causes of language endangerment. Scholars in related fields have responded to calls for decolonisation, and we in the speech and language technology community need to follow suit, and explore what this means for our practices that involve Indigenous languages and the communities who own them. This paper reviews colonising discourses in speech and language technology, and suggests new ways of working with Indigenous communities, and seeks to open a discussion of a postcolonial approach to computational methods for supporting language vitality.

Interactive Word Completion for Morphologically Complex Languages
William Lane | Steven Bird
Proceedings of the 28th International Conference on Computational Linguistics

Text input technologies for low-resource languages support literacy, content authoring, and language learning. However, tasks such as word completion pose a challenge for morphologically complex languages thanks to the combinatorial explosion of possible words. We have developed a method for morphologically-aware text input in Kunwinjku, a polysynthetic language of northern Australia. We modify an existing finite state recognizer to map input morph prefixes to morph completions, respecting the morphosyntax and morphophonology of the language. We demonstrate the portability of the method by applying it to Turkish. We show that the space of proximal morph completions is many orders of magnitude smaller than the space of full word completions for Kunwinjku. We provide a visualization of the morph completion space to enable the text completion parameters to be fine-tuned. Finally, we report on a web services deployment, along with a web interface which helps users enter morphologically complex words and which retrieves corresponding entries from the lexicon.

2019

Towards A Robust Morphological Analyzer for Kunwinjku
William Lane | Steven Bird
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Kunwinjku is an indigenous Australian language spoken in northern Australia which exhibits agglutinative and polysynthetic properties. Members of the community have expressed interest in co-developing language applications that promote their values and priorities. Modeling the morphology of the Kunwinjku language is an important step towards accomplishing the community’s goals. Finite State Transducers have long been the go-to method for modeling morphologically rich languages, and in this paper we discuss some of the distinct modeling challenges present in the morphosyntax of verbs in Kunwinjku. We show that a fairly straightforward implementation using standard features of the foma toolkit can account for much of the verb structure. Continuing challenges include robustness in the face of variation and unseen vocabulary, as well as how to handle complex reduplicative processes. Our future work will build off the baseline and challenges presented here.

2018

Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation
Oliver Adams | Trevor Cohn | Graham Neubig | Hilaria Cruz | Steven Bird | Alexis Michaud
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Multilingual Training of Crosslingual Word Embeddings
Long Duong | Hiroshi Kanayama | Tengfei Ma | Steven Bird | Trevor Cohn
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Crosslingual word embeddings represent lexical items from different languages using the same vector space, enabling crosslingual transfer. Most prior work constructs embeddings for a pair of languages, with English on one side. We investigate methods for building high quality crosslingual word embeddings for many languages in a unified vector space. In this way, we can exploit and combine the strengths of many languages. We obtained high performance on bilingual lexicon induction, monolingual similarity and crosslingual document classification tasks.

Cross-Lingual Word Embeddings for Low-Resource Language Modeling
Oliver Adams | Adam Makarucha | Graham Neubig | Steven Bird | Trevor Cohn
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Most languages have no established writing system and minimal written records. However, textual data is essential for natural language processing, and particularly important for training language models to support speech recognition. Even in cases where text data is missing, there are some languages for which bilingual lexicons are available, since creating lexicons is a fundamental task of documentary linguistics. We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences. The method involves learning cross-lingual word embeddings as a preliminary step in training monolingual language models. Results across a number of languages show that language models are improved by this pre-training. Application to Yongning Na, a threatened language, highlights challenges in deploying the approach in real low-resource environments.

Developing a Suite of Mobile Applications for Collaborative Language Documentation
Mat Bettinson | Steven Bird
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

2016

Learning Crosslingual Word Embeddings without Bilingual Corpora
Long Duong | Hiroshi Kanayama | Tengfei Ma | Steven Bird | Trevor Cohn
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Learning a Lexicon and Translation Model from Phoneme Lattices
Oliver Adams | Graham Neubig | Trevor Cohn | Steven Bird | Quoc Truong Do | Satoshi Nakamura
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

An Attentional Model for Speech Translation Without Transcription
Long Duong | Antonios Anastasopoulos | David Chiang | Steven Bird | Trevor Cohn
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

Inducing bilingual lexicons from small quantities of sentence-aligned phonemic transcriptions
Oliver Adams | Graham Neubig | Trevor Cohn | Steven Bird
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

A Neural Network Model for Low-Resource Universal Dependency Parsing
Long Duong | Trevor Cohn | Steven Bird | Paul Cook
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Cross-lingual Transfer for Unsupervised Dependency Parsing Without Parallel Data
Long Duong | Trevor Cohn | Steven Bird | Paul Cook
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser
Long Duong | Trevor Cohn | Steven Bird | Paul Cook
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Collective Document Classification with Implicit Inter-document Semantic Relationships
Clint Burford | Steven Bird | Timothy Baldwin
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

2014

Collecting Bilingual Audio in Remote Indigenous Communities
Steven Bird | Lauren Gawne | Katie Gelbart | Isaac McAlister
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages
Long Duong | Trevor Cohn | Karin Verspoor | Steven Bird | Paul Cook
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Aikuma: A Mobile App for Collaborative Language Documentation
Steven Bird | Florian R. Hanke | Oliver Adams | Haejoong Lee
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages

2013

Large-Scale Text Collection for Unwritten Languages
Florian R. Hanke | Steven Bird
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Increasing the Quality and Quantity of Source Language Data for Unsupervised Cross-Lingual POS Tagging
Long Duong | Paul Cook | Steven Bird | Pavel Pecina
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Simpler unsupervised POS tagging with bilingual projections
Long Duong | Paul Cook | Steven Bird | Pavel Pecina
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Machine Translation for Language Preservation
Steven Bird | David Chiang
Proceedings of COLING 2012: Posters

Fangorn: A System for Querying very large Treebanks
Sumukh Ghodke | Steven Bird
Proceedings of COLING 2012: Demonstration Papers

2011

Normalising Audio Transcriptions for Unwritten Languages
Adel Foda | Steven Bird
Proceedings of 5th International Joint Conference on Natural Language Processing

A Breadth-First Representation for Tree Matching in Large Scale Forest-Based Translation
Sumukh Ghodke | Steven Bird | Rui Zhang
Proceedings of 5th International Joint Conference on Natural Language Processing

Collective Classification of Congressional Floor-Debate Transcripts
Clinton Burfoot | Steven Bird | Timothy Baldwin
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Towards a Data Model for the Universal Corpus
Steven Abney | Steven Bird
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

2010

Fast Query for Large Treebanks
Sumukh Ghodke | Steven Bird
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

The Human Language Project: Building a Universal Corpus of the World’s Languages
Steven Abney | Steven Bird
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Last Words: Natural Language Processing and Linguistic Fieldwork
Steven Bird
Computational Linguistics, Volume 35, Number 3, September 2009

2008

The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics
Steven Bird | Robert Dale | Bonnie Dorr | Bryan Gibson | Mark Joseph | Min-Yen Kan | Dongwon Lee | Brett Powley | Dragomir Radev | Yee Fan Tan
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.

Defining a Core Body of Knowledge for the Introductory Computational Linguistics Curriculum
Steven Bird
Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics

Multidisciplinary Instruction with the Natural Language Toolkit
Steven Bird | Ewan Klein | Edward Loper | Jason Baldridge
Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics

Toward a Global Infrastructure for the Sustainability of Language Resources
Gary Simons | Steven Bird
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

2007

Dynamic Path Prediction and Recommendation in a Museum Environment
Karl Grieser | Timothy Baldwin | Steven Bird
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)

2006

Reconsidering Language Identification for Written Language Resources
Baden Hughes | Timothy Baldwin | Steven Bird | Jeremy Nicholson | Andrew MacKinlay
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches to written language identification are used widely throughout research and industrial contexts, over both oral and written source materials. Despite this widespread acceptance, a review of previous research in written language identification reveals a number of questions which remain open and ripe for further investigation.

NLTK: The Natural Language Toolkit
Steven Bird
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

Analysis and Prediction of User Behaviour in a Museum Environment
Karl Grieser | Timothy Baldwin | Steven Bird
Proceedings of the Australasian Language Technology Workshop 2006

2005

Structuring Documents Efficiently
Robert Marshall | Steven Bird | Peter Stuckey
Proceedings of the Australasian Language Technology Workshop 2005

LPath+: A First-Order Complete Language for Linguistic Tree Query
Catherine Lai | Steven Bird
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation

2004

Securing Interpretability: The Case of Ega Language Documentation
Dafydd Gibbon | Catherine Bow | Steven Bird | Baden Hughes
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Functional Requirements for an Interlinear Text Editor
Baden Hughes | Catherine Bow | Steven Bird
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Management of Metadata in Linguistic Fieldwork: Experience from the ACLA Project
Baden Hughes | David Penton | Steven Bird | Catherine Bow | Gillian Wigglesworth | Patrick McConvell | Jane Simpson
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Talkbank: Building an Open Unified Multimodal Database of Communicative Interaction
Brian MacWhinney | Steven Bird | Christopher Cieri | Craig Martell
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

NLTK: The Natural Language Toolkit
Steven Bird | Edward Loper
Proceedings of the ACL Interactive Poster and Demonstration Sessions

Representing and Rendering Linguistic Paradigms
David Penton | Steven Bird
Proceedings of the Australasian Language Technology Workshop 2004

Querying and Updating Treebanks: A Critical Survey and Requirements Analysis
Catherine Lai | Steven Bird
Proceedings of the Australasian Language Technology Workshop 2004

2003

Encoding and presenting interlinear text using XML technologies
Baden Hughes | Steven Bird | Catherine Bow
Proceedings of the Australasian Language Technology Workshop 2003

Grid-Enabling Natural Language Engineering By Stealth
Baden Hughes | Steven Bird
Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS)

2002

The Open Language Archives Community
Steven Bird | Hans Uszkoreit | Gary Simons
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Models and Tools for Collaborative Annotation
Xiaoyi Ma | Haejoong Lee | Steven Bird | Kazuaki Maeda
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

TableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on the Annotation Graph Toolkit
Steven Bird | Kazuaki Maeda | Xiaoyi Ma | Haejoong Lee | Beth Randall | Salim Zayat
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Creating Annotation Tools with the Annotation Graph Toolkit
Kazuaki Maeda | Steven Bird | Xiaoyi Ma | Haejoong Lee
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

An integrated framework for treebanks and multilayer annotations
Scott Cotton | Steven Bird
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

NLTK: The Natural Language Toolkit
Edward Loper | Steven Bird
Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics

2001

The Annotation Graph Toolkit: Software Components for Building Linguistic Annotation Tools
Kazuaki Maeda | Steven Bird | Xiaoyi Ma | Haejoong Lee
Proceedings of the First International Conference on Human Language Technology Research

The OLAC Metadata Set and Controlled Vocabularies
Steven Bird | Gary Simons
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources

Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Christopher Cieri | Steven Bird
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources

Annotation Tools Based on the Annotation Graph API
Steven Bird | Kazuaki Maeda | Xiaoyi Ma | Haejoong Lee
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources

2000

ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation
Steven Bird | David Day | John Garofolo | John Henderson | Christophe Laprun | Mark Liberman
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Transcribing with Annotation Graphs
Edouard Geoffrois | Claude Barras | Steven Bird | Zhibiao Wu
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Towards a Query Language for Annotation Graphs
Steven Bird | Peter Buneman | Wang-Chiew Tan
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies
David Graff | Steven Bird
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

Annotation Graphs as a Framework for Multidimensional Linguistic Data Analysis
Steven Bird | Mark Liberman
Towards Standards and Tools for Discourse Tagging

1997

A Lexical Database Tool for Quantitative Phonological Research
Steven Bird
Computational Phonology: Third Meeting of the ACL Special Interest Group in Computational Phonology

1994

One-Level Phonology: Autosegmental Representations and Rules as Finite Automata
Steven Bird | T. Mark Ellison
Computational Linguistics, Volume 20, Number 1, March 1994

Phonological Analysis in Typed Feature Systems
Steven Bird | Ewan Klein
Computational Linguistics, Volume 20, Number 3, September 1994

Automated Tone Transcription
Steven Bird
Computational Phonology

1992

Finite-State Phonology in HPSG
Steven Bird
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

1991

A Logical Approach to Arabic Phonology
Steven Bird | Patrick Blackburn
Fifth Conference of the European Chapter of the Association for Computational Linguistics