Kenneth Church

Also published as: Ken Church, Kenneth W. Church, Kenneth Ward Church


2021

On Attention Redundancy: A Comprehensive Study
Yuchen Bian | Jiaji Huang | Xingyu Cai | Jiahong Yuan | Kenneth Church
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The multi-layer, multi-head self-attention mechanism is widely applied in modern neural language models. Attention redundancy has been observed among attention heads but has not been deeply studied in the literature. Using the BERT-base model as an example, this paper provides a comprehensive study of attention redundancy, which is helpful for model interpretation and model compression. We analyze attention redundancy with the Five Ws and How. (What) We define and focus the study on redundancy matrices generated from pre-trained and fine-tuned BERT-base models on the GLUE datasets. (How) We use both token-based and sentence-based distance functions to measure the redundancy. (Where) Clear and similar redundancy patterns (cluster structure) are observed among attention heads. (When) Redundancy patterns are similar in both the pre-training and fine-tuning phases. (Who) We discover that redundancy patterns are task-agnostic; similar redundancy patterns even exist for randomly generated token sequences. (Why) We also evaluate the influence of pre-training dropout ratios on attention redundancy. Based on the phase-independent and task-agnostic attention redundancy patterns, we propose a simple zero-shot pruning method as a case study. Experiments on fine-tuning GLUE tasks verify its effectiveness. These comprehensive analyses of attention redundancy make model understanding and zero-shot model pruning promising.
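The paper's exact token-based and sentence-based distance functions are not reproduced here; as a rough illustration (an assumption, not the paper's measure), redundancy between two heads can be scored by the mean Jensen-Shannon distance between their attention distributions, giving a head-by-head matrix whose near-zero entries flag redundant pairs:

```python
import numpy as np

def head_redundancy(attn, eps=1e-12):
    """Pairwise redundancy among attention heads.
    attn: (H, T, T) array, one attention map per head for a single
    T-token input (each row is a probability distribution).
    Returns an (H, H) matrix of mean Jensen-Shannon distances;
    small values mean two heads attend almost identically."""
    H = attn.shape[0]
    def js(p, q):
        m = 0.5 * (p + q)
        kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)), axis=-1)
        return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
    D = np.zeros((H, H))
    for i in range(H):
        for j in range(i + 1, H):
            D[i, j] = D[j, i] = js(attn[i], attn[j]).mean()
    return D

# toy example: 4 heads over 5 tokens; heads 0 and 1 are duplicates
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5, 5))
attn = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
attn[1] = attn[0]
D = head_redundancy(attn)
print(D[0, 1] < D[0, 2])  # True: the duplicated pair is most redundant
```

Clustering the rows of such a matrix is one simple way to pick which heads a zero-shot pruner could drop.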

Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future
Kenneth Church | Mark Liberman | Valia Kordoni
Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future

Benchmarking: Past, Present and Future
Kenneth Church | Mark Liberman | Valia Kordoni
Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future

Where have we been, and where are we going? It is easier to talk about the past than the future. These days, benchmarks evolve more bottom-up (e.g., Papers with Code). There used to be more top-down leadership from government (and from industry, in the case of systems, with benchmarks such as SPEC). Going forward, there may be more top-down leadership from organizations like MLPerf and/or influencers like David Ferrucci, who was responsible for IBM’s success with Jeopardy! and has recently written a paper suggesting how the community should think about benchmarking for machine comprehension. Tasks such as reading comprehension become even more interesting as we move beyond English. Multilinguality introduces many challenges, and even more opportunities.

2020

Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework
Mingbo Ma | Baigong Zheng | Kaibo Liu | Renjie Zheng | Hairong Liu | Kainan Peng | Kenneth Church | Liang Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, with neural methods becoming capable of producing audio with high naturalness. However, these efforts still suffer from two types of latency: (a) computational latency (synthesis time), which grows linearly with sentence length, and (b) input latency in scenarios where the input text becomes available incrementally (such as simultaneous translation, dialog generation, and assistive technologies). To reduce these latencies, we propose a neural incremental TTS approach using the prefix-to-prefix framework from simultaneous translation. We synthesize speech in an online fashion, playing one segment of audio while generating the next, resulting in O(1) rather than O(n) latency. Experiments on English and Chinese TTS show that our approach achieves speech naturalness similar to full-sentence TTS, but with only a constant (1-2 word) latency.
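The scheduling side of the prefix-to-prefix idea can be sketched as follows (a loose illustration only; `synthesize` is a hypothetical stand-in for the neural synthesis call, and the fixed lookahead is an assumption):

```python
from typing import Callable, Iterable, Iterator

def incremental_tts(words: Iterable[str],
                    synthesize: Callable[[list, int], str],
                    lookahead: int = 1) -> Iterator[str]:
    """Prefix-to-prefix scheduling: emit the audio for word i as soon
    as the input prefix w_1 .. w_(i + lookahead) is available, rather
    than waiting for the whole sentence. The per-word delay is thus a
    constant (the lookahead), not the sentence length."""
    buffer, emitted = [], 0
    for w in words:                      # input text arrives incrementally
        buffer.append(w)
        while emitted + lookahead < len(buffer):
            yield synthesize(buffer, emitted)
            emitted += 1
    while emitted < len(buffer):         # flush the tail at end of input
        yield synthesize(buffer, emitted)
        emitted += 1

words = "text arrives one word at a time".split()
chunks = list(incremental_tts(words, lambda buf, i: f"audio[{buf[i]}]"))
print(chunks[0])  # audio[text]
```

Because each chunk is emitted after only `lookahead` further words, the audio for the first word can start playing while the rest of the sentence is still being typed or translated.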

Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training
Renjie Zheng | Mingbo Ma | Baigong Zheng | Kaibo Liu | Jiahong Yuan | Kenneth Church | Liang Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

Simultaneous speech-to-speech translation is an extremely challenging but widely useful scenario that aims to generate target-language speech only a few seconds behind the source-language speech. In addition, we have to continuously translate speech consisting of multiple sentences, but recent solutions focus only on the single-sentence scenario. As a result, current approaches accumulate more and more latency in later sentences when the speaker talks faster, and introduce unnatural pauses into the translated speech when the speaker talks slower. To overcome these issues, we propose Self-Adaptive Translation, which flexibly adjusts the length of translations to accommodate different source speech rates. At similar levels of translation quality (as measured by BLEU), our method generates more fluent target speech with lower latency than the baseline, in both Zh<->En directions.

Improving Bilingual Lexicon Induction for Low Frequency Words
Jiaji Huang | Xingyu Cai | Kenneth Church
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

This paper designs a Monolingual Lexicon Induction task and observes that two factors accompany the degraded accuracy of bilingual lexicon induction for rare words: first, a diminishing margin between similarities in the low-frequency regime, and second, exacerbated hubness at low frequency. Based on these observations, we propose two methods that address the two factors, respectively. The larger issue is hubness; addressing it improves induction accuracy significantly, especially for low-frequency words.

2019

Hubless Nearest Neighbor Search for Bilingual Lexicon Induction
Jiaji Huang | Qiang Qiu | Kenneth Church
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Bilingual Lexicon Induction (BLI) is the task of translating words from corpora in two languages. Recent advances in BLI work by aligning the two word embedding spaces. Following that, a key step is to retrieve the nearest neighbor (NN) in the target space given a source word. However, a phenomenon called hubness often degrades the accuracy of NN retrieval. Hubness appears when some data points, called hubs, are extraordinarily close to many of the other data points. Reducing hubness is necessary for retrieval tasks. One successful example is the Inverted SoFtmax (ISF), recently proposed to improve NN. This work proposes a new method, Hubless Nearest Neighbor (HNN), to mitigate hubness. HNN differs from NN by imposing an additional equal-preference assumption. Moreover, the HNN formulation explains why ISF works as well as it does. Empirical results demonstrate that HNN outperforms NN, ISF, and other state-of-the-art methods. For reproducibility and follow-ups, we have published all code.
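The retrieval step and the hub effect can be sketched in a few lines (the similarity matrix and `beta` value are toy assumptions, and HNN itself is not reproduced; the inverted softmax shown is the ISF-style correction):

```python
import numpy as np

def nn_retrieve(S):
    """Plain nearest neighbor: each source word picks the target
    with the highest similarity. S[i, j] = sim(source i, target j)."""
    return S.argmax(axis=1)

def isf_retrieve(S, beta=10.0):
    """Inverted softmax: renormalize each target's similarities over
    the *source* words, so a hub (a target close to everything) is
    downweighted before retrieval. beta is an inverse temperature."""
    P = np.exp(beta * S)
    P /= P.sum(axis=0, keepdims=True)    # normalize over sources
    return P.argmax(axis=1)

# toy case: target 2 is a hub, uniformly close to every source
S = np.array([[0.70, 0.10, 0.75],
              [0.10, 0.70, 0.75],
              [0.05, 0.10, 0.90]])
print(nn_retrieve(S))   # [2 2 2] -- every source falls into the hub
print(isf_retrieve(S))  # [0 1 2] -- the correction recovers the matches
```

The column normalization is what distinguishes the correction from plain NN: a target that is near everything gets a large column sum and is penalized for every source.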

2016

C2D2E2: Using Call Centers to Motivate the Use of Dialog and Diarization in Entity Extraction
Ken Church | Weizhong Zhu | Jason Pelecanos
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

2014

The Case for Empiricism (With and Without Statistics)
Kenneth Church
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014)

2011

How Many Multiword Expressions do People Know?
Kenneth Church
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
Anoop Deoras | Tomáš Mikolov | Kenneth Church
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Proceedings of the IJCNLP 2011 System Demonstrations
Kenneth Church | Yunqing Xia
Proceedings of the IJCNLP 2011 System Demonstrations

Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
Shane Bergsma | David Yarowsky | Kenneth Church
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

New Tools for Web-Scale N-grams
Dekang Lin | Kenneth Church | Heng Ji | Satoshi Sekine | David Yarowsky | Shane Bergsma | Kailash Patil | Emily Pitler | Rachel Lathbury | Vikram Rao | Kapil Dalwani | Sushant Narsale
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

While the web provides a fantastic linguistic resource, collecting and processing data at web-scale is beyond the reach of most academic laboratories. Previous research has relied on search engines to collect online information, but this is hopelessly inefficient for building large-scale linguistic resources, such as lists of named-entity types or clusters of distributionally similar words. An alternative to processing web-scale text directly is to use the information provided in an N-gram corpus. An N-gram corpus is an efficient compression of large amounts of text. An N-gram corpus states how often each sequence of words (up to length N) occurs. We propose tools for working with enhanced web-scale N-gram corpora that include richer levels of source annotation, such as part-of-speech tags. We describe a new set of search tools that make use of these tags, and collectively lower the barrier for lexical learning and ambiguity resolution at web-scale. They will allow novel sources of information to be applied to long-standing natural language challenges.
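The core object described above, a table of how often each word sequence up to length N occurs, can be built in a few lines (plain word n-grams only; the enhanced corpora in the paper additionally carry source annotation such as part-of-speech tags):

```python
from collections import Counter

def ngram_counts(tokens, max_n=3):
    """The core of an N-gram corpus: how often each word sequence of
    length 1..max_n occurs in the text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

tokens = "the cat sat on the mat".split()
counts = ngram_counts(tokens)
print(counts[("the",)])        # 2
print(counts[("the", "cat")])  # 1
```

The compression argument is that this table, not the raw text, is what downstream lexical-learning tools need, and it is orders of magnitude smaller than the web-scale text it summarizes.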

NLP on Spoken Documents Without ASR
Mark Dredze | Aren Jansen | Glen Coppersmith | Ken Church
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Using Web-scale N-grams to Improve Base NP Parsing Performance
Emily Pitler | Shane Bergsma | Dekang Lin | Kenneth Church
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Using Word-Sense Disambiguation Methods to Classify Web Queries by Intent
Emily Pitler | Ken Church
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Repetition and Language Models and Comparable Corpora
Ken Church
Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora (BUCC)

2007

K-Best Suffix Arrays
Kenneth Church | Bo Thiesson | Robert Ragno
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations
Ping Li | Kenneth W. Church
Computational Linguistics, Volume 33, Number 3, September 2007

Compressing Trigram Language Models With Golomb Coding
Kenneth Church | Ted Hart | Jianfeng Gao
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2005

Using Sketches to Estimate Associations
Ping Li | Kenneth W. Church
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Last Words: Reviewing the Reviewers
Kenneth Church
Computational Linguistics, Volume 31, Number 4, December 2005

The Wild Thing
Ken Church | Bo Thiesson
Proceedings of the ACL Interactive Poster and Demonstration Sessions

2002

NLP Found Helpful (at least for one Text Categorization Task)
Carl Sable | Kathleen McKeown | Kenneth Church
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2001

Using Bins to Empirically Estimate Term Weights for Text Categorization
Carl Sable | Kenneth W. Church
Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing

Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus
Mikio Yamamoto | Kenneth W. Church
Computational Linguistics, Volume 27, Number 1, March 2001

2000

Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p²
Kenneth W. Church
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Empirical Term Weighting and Expansion Frequency
Kyoji Umemura | Kenneth W. Church
2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1999

What’s Happened Since the First SIGDAT Meeting?
Kenneth Ward Church
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

Using Suffix Arrays to Compute Term Frequency and Document Frequency for All Substrings in a Corpus
Mikio Yamamoto | Kenneth W. Church
Sixth Workshop on Very Large Corpora

1996

Panel: The limits of automation: optimists vs skeptics.
Eduard Hovy | Ken Church | Denis Gachot | Marge Leon | Alan Melby | Sergei Nirenburg | Yorick Wilks
Conference of the Association for Machine Translation in the Americas

1995

Inverse Document Frequency (IDF): A Measure of Deviations from Poisson
Kenneth Church | William Gale
Third Workshop on Very Large Corpora

1994

Termight: Identifying and Translating Technical Terminology
Ido Dagan | Ken Church
Fourth Conference on Applied Natural Language Processing

Fax: An Alternative to SGML
Kenneth W. Church | William A. Gale | Jonathan I. Helfman | David D. Lewis
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

K-vec: A New Approach for Aligning Parallel Texts
Pascale Fung | Kenneth Ward Church
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

Is MT Research Doing Any Good?
Kenneth Church | Bonnie Dorr | Eduard Hovy | Sergei Nirenburg | Bernard Scott | Virginia Teller
Proceedings of the First Conference of the Association for Machine Translation in the Americas

1993

Introduction to the Special Issue on Computational Linguistics Using Large Corpora
Kenneth W. Church | Robert L. Mercer
Computational Linguistics, Volume 19, Number 1, March 1993, Special Issue on Using Large Corpora: I

A Program for Aligning Sentences in Bilingual Corpora
William A. Gale | Kenneth W. Church
Computational Linguistics, Volume 19, Number 1, March 1993, Special Issue on Using Large Corpora: I

Robust Bilingual Word Alignment for Machine Aided Translation
Ido Dagan | Kenneth Church | William Gale
Very Large Corpora: Academic and Industrial Perspectives

Char_align: A Program for Aligning Parallel Texts at the Character Level
Kenneth Ward Church
31st Annual Meeting of the Association for Computational Linguistics

1992

Estimating Upper and Lower Bounds on the Performance of Word-Sense Disambiguation Programs
William Gale | Kenneth Ward Church | David Yarowsky
30th Annual Meeting of the Association for Computational Linguistics

One Sense Per Discourse
William A. Gale | Kenneth W. Church | David Yarowsky
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

Using bilingual materials to develop word sense disambiguation methods
William A. Gale | Kenneth W. Church | David Yarowsky
Proceedings of the Fourth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1991

A Program for Aligning Sentences in Bilingual Corpora
William A. Gale | Kenneth W. Church
29th Annual Meeting of the Association for Computational Linguistics

Identifying Word Correspondences in Parallel Texts
William A. Gale | Kenneth W. Church
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

Book Reviews: Theory and Practice in Corpus Linguistics
Kenneth Ward Church
Computational Linguistics, Volume 17, Number 1, March 1991

1990

Word Association Norms, Mutual Information, and Lexicography
Kenneth Ward Church | Patrick Hanks
Computational Linguistics, Volume 16, Number 1, March 1990

A Spelling Correction Program Based on a Noisy Channel Model
Mark D. Kernighan | Kenneth W. Church | William A. Gale
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

Poor Estimates of Context are Worse than None
William A. Gale | Kenneth W. Church
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990

1989

Parsing, Word Associations and Typical Predicate-Argument Relations
Kenneth Church | William Gale | Patrick Hanks | Donald Hindle
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

Enhanced Good-Turing and Cat-Cal: Two New Methods for Estimating Probabilities of English Bigrams (abbreviated version)
Kenneth W. Church | William A. Gale
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

Session 11 Natural Language III
Kenneth Ward Church
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

Word Association Norms, Mutual Information, and Lexicography
Kenneth Ward Church | Patrick Hanks
27th Annual Meeting of the Association for Computational Linguistics

Parsing, Word Associations and Typical Predicate-Argument Relations
Kenneth Church | William Gale | Patrick Hanks | Donald Hindle
Proceedings of the First International Workshop on Parsing Technologies

There are a number of collocational constraints in natural languages that ought to play a more important role in natural language parsers. Thus, for example, it is hard for most parsers to take advantage of the fact that wine is typically drunk, produced, and sold, but (probably) not pruned. So too, it is hard for a parser to know which verbs go with which prepositions (e.g., set up) and which nouns fit together to form compound noun phrases (e.g., computer programmer). This paper will attempt to show that many of these types of concerns can be addressed with syntactic methods (symbol pushing), and need not require explicit semantic interpretation. We have found that it is possible to identify many of these interesting co-occurrence relations by computing simple summary statistics over millions of words of text. This paper will summarize a number of experiments carried out by various subsets of the authors over the last few years. The term collocation will be used quite broadly to include constraints on SVO (subject-verb-object) triples, phrasal verbs, compound noun phrases, and psycholinguistic notions of word association (e.g., doctor/nurse).
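The kind of summary statistic involved can be illustrated with pointwise mutual information over co-occurrence counts (a sketch with a toy window-based pair count, not the exact procedure from these experiments):

```python
import math
from collections import Counter

def pmi_table(tokens, window=2):
    """Pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))),
    with pairs counted in a small window to the right of x. Normalizing
    pair counts by the token count N is a simplification."""
    N = len(tokens)
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1:i + 1 + window]:
            pairs[(x, y)] += 1
    return {
        (x, y): math.log2((c / N) / ((unigrams[x] / N) * (unigrams[y] / N)))
        for (x, y), c in pairs.items()
    }

tokens = "strong tea strong coffee weak tea powerful computer".split()
t = pmi_table(tokens)
print(t[("strong", "tea")])  # 1.0 -- positive score marks an associated pair
```

Pairs that co-occur more often than chance predicts score above zero; over millions of words, exactly this kind of count surfaces collocations like drink/wine or doctor/nurse.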

1988

Complexity, Two-Level Morphology and Finnish
Kimmo Koskenniemi | Kenneth Ward Church
Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics

A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text
Kenneth Ward Church
Second Conference on Applied Natural Language Processing

1986

Morphological Decomposition and Stress Assignment for Speech Synthesis
Kenneth Church
24th Annual Meeting of the Association for Computational Linguistics

1985

Stress Assignment in Letter to Sound Rules for Speech Synthesis
Kenneth Church
23rd Annual Meeting of the Association for Computational Linguistics

1983

A Finite-State Parser for Use in Speech Recognition
Kenneth W. Church
21st Annual Meeting of the Association for Computational Linguistics

1982

Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table
Kenneth Church | Ramesh Patil
American Journal of Computational Linguistics, Volume 8, Number 3-4, July-December 1982

1980

On Parsing Strategies and Closure
Kenneth Church
18th Annual Meeting of the Association for Computational Linguistics