Francis Bond - ACL Anthology

Francis Bond

2026

More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs
Adrián Gude | Roi Santos-Rios | Francis Bond | Dan Flickinger | Carlos Gómez-Rodríguez | Olga Zamaraeva
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This study contributes to a growing line of research in comparing LLM-generated texts with human-authored text, in this case, English news text. We focus in particular on the evaluation of syntactic properties through formal grammar frameworks. Our analysis compares two generations of LLMs in the context of two human-authored English news datasets from two different years. Employing the Head-Driven Phrase Structure Grammar (HPSG) formalism, we investigate the distributions of syntactic structures and lexical types of AI-generated texts and contrast them with the corresponding distributions in the human-authored New York Times (NYT) articles. We use diversity metrics from ecology and information theory to quantify variation in grammatical constructions and lexical types. Our results show that, while English news text has changed little in the given time frame, newer, instruction-tuned LLMs display reduced syntactic and, especially, lexical diversity compared to older, non-instruction-tuned models. These findings point to future work in studying effects of instruction tuning, which, while enhancing coherence and adherence to prompts, may narrow the expressive range of model output.
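As an illustration only, the kind of diversity measurement mentioned in the abstract can be sketched as follows. The abstract does not name the metrics used; Shannon entropy (information theory) and the Gini-Simpson index (ecology) are assumed here as common representatives, and the function names and toy type labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index: chance that two random draws differ in type."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Toy data (assumption): each label stands for one grammar or lexical type
# observed in a parsed sentence; real input would come from HPSG parses.
human_types = Counter(["a", "b", "c", "d"])   # evenly spread
model_types = Counter(["a", "a", "a", "b"])   # concentrated on one type

print(shannon_entropy(human_types), gini_simpson(human_types))
print(shannon_entropy(model_types), gini_simpson(model_types))
```

Under both measures, the evenly spread distribution scores higher than the concentrated one, which is the sense in which a narrower range of constructions counts as reduced diversity.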

2025

Can you hear me now? Towards talking Wordnets: A Cantonese Case Study
Joanna Ut-Seong Sio | Luis Morgado Da Costa | Francis Bond | Kamila Liedermannova
Proceedings of the 13th Global Wordnet Conference

Adding Audio to Wordnets
Francis Bond
Proceedings of the 13th Global Wordnet Conference

Metonymy is more multilingual than metaphor: Analysing tropes using ChainNet and the Open Multilingual Wordnet
Francis Bond | Rowan Hall Maudslay
Proceedings of the 13th Global Wordnet Conference

Proceedings of the 13th Global Wordnet Conference
Chiara Zanchi | Luca Brigada Villa | Erica Biagetti | Alexandre Rademaker | Francis Bond | German Rigau
Proceedings of the 13th Global Wordnet Conference

Comparing LLM-generated and human-authored news text using formal syntactic theory
Olga Zamaraeva | Dan Flickinger | Francis Bond | Carlos Gómez-Rodríguez
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This study provides the first comprehensive comparison of New York Times-style text generated by six large language models against real, human-authored NYT writing. The comparison is based on a formal syntactic theory. We use Head-driven Phrase Structure Grammar (HPSG) to analyze the grammatical structure of the texts. We then investigate and illustrate the differences in the distributions of HPSG grammar types, revealing systematic distinctions between human and LLM-generated writing. These findings contribute to a deeper understanding of the syntactic behavior of LLMs as well as humans, within the NYT genre.

2024

ChainNet: Structured Metaphor and Metonymy in WordNet
Rowan Hall Maudslay | Simone Teufel | Francis Bond | James Pustejovsky
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The senses of a word exhibit rich internal structure. In a typical lexicon, this structure is overlooked: A word’s senses are encoded as a list, without inter-sense relations. We present ChainNet, a lexical resource which for the first time explicitly identifies these structures, by expressing how senses in the Open English Wordnet are derived from one another. In ChainNet, every nominal sense of a word is either connected to another sense by metaphor or metonymy, or is disconnected (in the case of homonymy). Because WordNet senses are linked to resources which capture information about their meaning, ChainNet represents the first dataset of grounded metaphor and metonymy.

This Word Mean What: Constructing a Singlish Dictionary with ChatGPT
Siew Yeng Chow | Chang-Uk Shin | Francis Bond
Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024

Despite the magnitude of recent progress in natural language processing and multilingual language modeling research, the vast majority of NLP research is focused on English and other major languages. This is because recent NLP research is mainly data-driven, and there is more data for resource-rich languages. In particular, Large Language Models (LLMs) make use of large unlabeled datasets, a resource that many languages do not have. In this project, we built a new, open-sourced dictionary of Singlish, a contact variety that contains features from English and other local languages and is syntactically, phonologically and lexically distinct from Standard English (Tan, 2010). First, a list of Singlish words was extracted from various online sources. Then, using an open ChatGPT LLM API, the description, including the definition, part of speech, pronunciation and examples, was produced. These were then refined through post-processing carried out by a native speaker. The dictionary currently has 1,783 entries and is published under the CC-BY-SA license. The project was carried out with the intention of facilitating future Singlish research and other applications, as the accumulation and management of language resources will be of great help in promoting research on the language in the future.

2023

Linking SIL Semantic Domains to Wordnet and Expanding the Abui Wordnet through Rapid Word Collection Methodology
Luis Morgado da Costa | František Kratochvíl | George Saad | Benidiktus Delpada | Daniel Simon Lanma | Francis Bond | Natálie Wolfová | A.L. Blake
Proceedings of the 12th Global Wordnet Conference

In this paper we describe a new methodology to expand the Abui Wordnet through data collected using the Rapid Word Collection (RWC) method – based on SIL’s Semantic Domains. Using a multilingual sense-intersection algorithm, we created a ranked list of concept suggestions for each domain, and then used the ranked list as a filter to link the Abui RWC data to wordnet. This used translations from both SIL’s Semantic Domain’s structure and example words, both available through SIL’s Fieldworks software and the RWC project. We release both the new mapping of the SIL Semantic Domains to wordnet and an expansion of the Abui Wordnet.

The Japanese Wordnet 2.0
Francis Bond | Takayuki Kuribayashi
Proceedings of the 12th Global Wordnet Conference

This paper describes a new release of the Japanese wordnet. It uses the new global wordnet formats (McCrae et al., 2021) to incorporate a range of new information: orthographic variants (including hiragana, katakana and Latin representations) first described in Kuroda et al. (2011), classifiers, pronouns and exclamatives (Morgado da Costa and Bond, 2016) and many new senses, motivated both from corpus annotation and linking to the TUFS basic vocabulary (Bond et al., 2020). The wordnet has been moved to GitHub and is available at https://bond-lab.github.io/wnja/.

Documenting the Open Multilingual Wordnet
Francis Bond | Michael Wayne Goodman | Ewa Rudnicka | Luis Morgado da Costa | Alexandre Rademaker | John P. McCrae
Proceedings of the 12th Global Wordnet Conference

In this project note we describe our work to make better documentation for the Open Multilingual Wordnet (OMW), a platform integrating many open wordnets. This includes the documentation of the OMW website itself as well as of semantic relations used by the component wordnets. Some of this documentation work was done with the support of the Google Season of Docs. The OMW project page, which links both to the actual OMW server and the documentation, has been moved to a new location: https://omwn.org.

Proceedings of the 12th Global Wordnet Conference
German Rigau | Francis Bond | Alexandre Rademaker
Proceedings of the 12th Global Wordnet Conference

2022

Sense and Sentiment
Francis Bond | Merrick Choo
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper we examine existing sentiment lexicons and sense-based sentiment-tagged corpora to find out how sense and concept-based semantic relations affect sentiment scores (for polarity and valence). We show that some relations are good predictors of sentiment of related words: antonyms have similar valence and opposite polarity, synonyms similar valence and polarity, as do many derivational relations. We use this knowledge and existing resources to build a sentiment annotated wordnet of English, and show how it can be used to produce sentiment lexicons for other languages using the Open Multilingual Wordnet.
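The relation-based regularities described in the abstract can be sketched roughly as below. This is not the authors' procedure: it assumes a signed score in [-1, 1] whose sign stands for polarity and whose magnitude stands for valence, and the function name and relation labels are hypothetical.

```python
def propagate_sentiment(score, relation):
    """Guess a related word's sentiment score from a known one.

    Assumption: the sign of `score` is polarity and its magnitude is valence.
    Antonyms keep the magnitude and flip the sign; synonyms keep both.
    Any other relation is treated as uninformative and returns None.
    """
    if relation == "antonym":
        return -score
    if relation == "synonym":
        return score
    return None


# Toy example (assumption): a word with known score 0.6.
print(propagate_sentiment(0.6, "antonym"))   # same valence, opposite polarity
print(propagate_sentiment(0.6, "synonym"))   # same valence and polarity
print(propagate_sentiment(0.6, "hypernym"))  # no prediction made
```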

Singlish Where Got Rules One? Constructing a Computational Grammar for Singlish
Siew Yeng Chow | Francis Bond
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Singlish is a variety of English spoken in Singapore. In this paper, we share some of its grammar features and how they are implemented in the construction of a computational grammar of Singlish as a branch of English grammar. New rules were created and existing ones from standard English grammar of the English Resource Grammar (ERG) were changed in this branch to cater to how Singlish works. In addition, Singlish lexicon was added into the grammar together with some new lexical types. We used Head-driven Phrase Structure Grammar (HPSG) as the framework for this project of creating a working computational grammar. As part of building the language resource, we also collected and formatted some data from the internet as part of a test suite for Singlish. Finally, the computational grammar was tested against a set of gold standard trees and compared with the standard English grammar to find out how well the grammar fares in analysing Singlish.

The Tembusu Treebank: An English Learner Treebank
Luís Morgado da Costa | Francis Bond | Roger V. P. Winder
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper reports on the creation and development of the Tembusu Learner Treebank — an open treebank created from the NTU Corpus of Learner English, unique for incorporating mal-rules in the annotation of ungrammatical sentences. It describes the motivation and development of the treebank, as well as its exploitation to build a new parse-ranking model for the English Resource Grammar, designed to help improve the parse selection of ungrammatical sentences and diagnose these sentences through mal-rules. The corpus contains 25,000 sentences, of which 4,900 are treebanked. The paper concludes with an evaluation experiment that shows the usefulness of this new treebank in the tasks of grammatical error detection and diagnosis.

2021

Taboo Wordnet
Francis Bond | Merrick Yeu Herng Choo
Proceedings of the 11th Global Wordnet Conference

This paper describes the development of an online lexical resource to help detection systems regulate and curb the use of offensive words online. With the growing prevalence of social media platforms, many conversations are now conducted online. The increase of online conversations for leisure, work and socializing has led to an increase in harassment. In particular, we create a specialized sense-based vocabulary of Japanese offensive words for the Open Multilingual Wordnet. This vocabulary expands on an existing list of Japanese offensive words and provides categorization and proper linking to synsets within the multilingual wordnet. This paper then discusses the evaluation of the vocabulary as a resource for representing and classifying offensive words and as a possible resource for offensive word use detection in social media.

Testing agreement between lexicographers: A case of homonymy and polysemy
Marek Maziarz | Francis Bond | Ewa Rudnicka
Proceedings of the 11th Global Wordnet Conference

In this paper we compare Oxford Lexico and Merriam Webster dictionaries with Princeton WordNet with respect to the description of semantic (dis)similarity between polysemous and homonymous senses that could be inferred from them. WordNet lacks any explicit description of polysemy or homonymy, but as a network of linked senses it may be used to compute semantic distances between word senses. To compare WordNet with the dictionaries, we transformed sample entry microstructures of the latter into graphs and cross-linked them with the equivalent senses of the former. We found that dictionaries are in high agreement with each other, if one considers polysemy and homonymy altogether, and in moderate concordance, if one focuses merely on polysemy descriptions. Measuring the shortest path lengths on WordNet gave results comparable to those on the dictionaries in predicting semantic dissimilarity between polysemous senses, but was less felicitous while recognising homonymy.

Teaching Through Tagging — Interactive Lexical Semantics
Francis Bond | Andrew Devadason | Melissa Rui Lin Teo | Luís Morgado da Costa
Proceedings of the 11th Global Wordnet Conference

In this paper we discuss an ongoing effort to enrich students’ learning by involving them in sense tagging. The main goal is to lead students to discover how we can represent meaning and where the limits of our current theories lie. A subsidiary goal is to create sense tagged corpora and an accompanying linked lexicon (in our case wordnets). We present the results of tagging several texts and suggest some ways in which the tagging process could be improved. Two authors of this paper present their own experience as students. Overall, students reported that they found the tagging an enriching experience. The annotated corpora and changes to the wordnet are made available through the NTU multilingual corpus and associated wordnets (NTU-MC).

OdeNet: Compiling a GermanWordNet from other Resources
Melanie Siegel | Francis Bond
Proceedings of the 11th Global Wordnet Conference

The Princeton WordNet for the English language has been used worldwide in NLP projects for many years. With the OMW initiative, wordnets for different languages of the world are being linked via identifiers. The parallel development and linking allows new multilingual application perspectives. The development of a wordnet for the German language is also in this context. To save development time, existing resources were combined and recompiled. The result was then evaluated and improved. In a relatively short time a resource was created that can be used in projects and continuously improved and extended.

Intrinsically Interlingual: The Wn Python Library for Wordnets
Michael Wayne Goodman | Francis Bond
Proceedings of the 11th Global Wordnet Conference

This paper introduces Wn, a new Python library for working with wordnets. Unlike previous libraries, Wn is built from the beginning to accommodate multiple wordnets — for multiple languages or multiple versions of the same wordnet — while retaining the ability to query and traverse them independently. It is also able to download and incorporate wordnets published online. These features are made possible through Wn’s adoption of standard formats and methods for interoperability, namely the WN-LMF schema (Vossen et al., 2013; Bond et al., 2020) and the Collaborative Interlingual Index (Bond et al., 2016). Wn is open-source, easily available, and well-documented.

The GlobalWordNet Formats: Updates for 2020
John P. McCrae | Michael Wayne Goodman | Francis Bond | Alexandre Rademaker | Ewa Rudnicka | Luís Morgado Da Costa
Proceedings of the 11th Global Wordnet Conference

The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid. As a result of their adoption, a number of shortcomings of the format were identified, and in this paper we describe the extensions to the formats that address these issues. These include: ordering of senses, dependencies between wordnets, pronunciation, syntactic modelling, relations, sense keys, metadata and RDF support. Furthermore, we provide some perspectives on how these changes help in the integration of wordnets.

2020

English WordNet 2020: Improving and Extending a WordNet for English using an Open-Source Methodology
John P. McCrae | Alexandre Rademaker | Ewa Rudnicka | Francis Bond
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

WordNet, while one of the most widely used resources for NLP, has not been updated for a long time, and as such a new project English WordNet has arisen to continue the development of the model under an open-source paradigm. In this paper, we detail the second release of this resource entitled “English WordNet 2020”. The work has focused firstly, on the introduction of new synsets and senses and developing guidelines for this and secondly, on the integration of contributions from other projects. We present the changes in this edition, which total over 15,000 changes over the previous release.

Automated Writing Support Using Deep Linguistic Parsers
Luís Morgado da Costa | Roger V P Winder | Shu Yun Li | Benedict Christopher Lin Tzer Liang | Joseph Mackinnon | Francis Bond
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces a new web system that integrates English Grammatical Error Detection (GED) and course-specific stylistic guidelines to automatically review and provide feedback on student assignments. The system is being developed as a pedagogical tool for English Scientific Writing. It uses both general NLP methods and high precision parsers to check student assignments before they are submitted for grading. Instead of generalized error detection, our system aims to identify, with high precision, specific classes of problems that are known to be common among engineering students. Rather than correct the errors, our system generates constructive feedback to help students identify and correct them on their own. A preliminary evaluation of the system’s in-class performance has shown measurable improvements in the quality of student assignments.

Some Issues with Building a Multilingual Wordnet
Francis Bond | Luis Morgado da Costa | Michael Wayne Goodman | John P. McCrae | Ahti Lohk
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper we discuss the experience of bringing together over 40 different wordnets. We introduce some extensions to the GWA wordnet LMF format proposed in Vossen et al. (2016) and look at how this new information can be displayed. Notable extensions include: confidence, corpus frequency, orthographic variants, lexicalized and non-lexicalized synsets and lemmas, new parts of speech, and more. Many of these extensions already exist in multiple wordnets – the challenge was to find a compatible representation. To this end, we introduce a new version of the Open Multilingual Wordnet (Bond and Foster, 2013) that integrates a new set of tools that tests the extensions introduced by this new format, while also ensuring the integrity of the Collaborative Interlingual Index (CILI: Bond et al., 2016) by preventing the same new concept from being introduced through multiple projects.

Linking the TUFS Basic Vocabulary to the Open Multilingual Wordnet
Francis Bond | Hiroki Nomoto | Luís Morgado da Costa | Arthur Bond
Proceedings of the Twelfth Language Resources and Evaluation Conference

We describe the linking of the TUFS Basic Vocabulary Modules, created for online language learning, with the Open Multilingual Wordnet. The TUFS modules have roughly 500 lexical entries in 30 languages, each with the lemma, a link across the languages, an example sentence, usage notes and sound files. The Open Multilingual Wordnet has 34 languages (11 shared with TUFS) organized into synsets linked by semantic relations, with examples and definitions for some languages. The links can be used to (i) evaluate existing wordnets, (ii) add data to these wordnets and (iii) create new open wordnets for Khmer, Korean, Lao, Mongolian, Russian, Tagalog, Urdu and Vietnamese.

2019

Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Agata Savary | Carla Parra Escartín | Francis Bond | Jelena Mitrović | Verginica Barbu Mititelu
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

Commonsense inference in human-robot communication
Aliaksandr Huminski | Yan Bin Ng | Kenneth Kwok | Francis Bond
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

Natural language communication between machines and humans is still constrained. The article addresses a gap in natural language understanding about actions, specifically that of understanding commands. We propose a new method for commonsense inference (grounding) of high-level natural language commands into specific action commands for further execution by a robotic system. The method makes it possible to build a knowledge base that consists of a large set of commonsense inferences. Preliminary results are presented.

New Polysemy Structures in Wordnets Induced by Vertical Polysemy
Ahti Lohk | Heili Orav | Kadri Vare | Francis Bond | Rasmus Vaik
Proceedings of the 10th Global Wordnet Conference

This paper aims to study auto-hyponymy and auto-troponymy relations (or vertical polysemy) in 11 wordnets uploaded into the new Open Multilingual Wordnet (OMW) webpage. We investigate how vertical polysemy forms polysemy structures (or sense clusters) in semantic hierarchies of the wordnets. Our main results and discoveries are new polysemy structures that have not previously been associated with vertical polysemy, along with some inconsistencies of semantic relations analysis in the studied wordnets, which should not be there. In the case study, we turn attention to polysemy structures in the Estonian Wordnet (version 2.2.0), analyzing them and giving the lexicographers comments. In addition, we describe the detection algorithm of polysemy structures and an overview of the state of polysemy structures in 11 wordnets.

GeoNames Wordnet (geown): extracting wordnets from GeoNames
Francis Bond | Arthur Bond
Proceedings of the 10th Global Wordnet Conference

This paper introduces a new multilingual lexicon of geographical place names. The names are based on (and linked to) the GeoNames collection. Each location is treated as a new synset, which is linked by instance_hypernym to a small set of supertypes. These supertypes are linked to the collaborative interlingual index, based on mappings from GeoDomainWordnet. If a location is already in the interlingual index, then it is also linked to the entry, using mappings from the Geo-Wordnet. Finally, if GeoNames places the location in a larger location, this is linked using the mero_location link. Wordnets can be built for any language in GeoNames; we give results for those wordnets in the Open Multilingual Wordnet. We discuss how it is mapped and the characteristics of the extracted wordnets.

A Comparison of Sense-level Sentiment Scores
Francis Bond | Arkadiusz Janz | Maciej Piasecki
Proceedings of the 10th Global Wordnet Conference

In this paper, we compare a variety of sense-tagged sentiment resources, including SentiWordNet, ML-Senticon, plWordNet emo and the NTU Multilingual Corpus. The goal is to investigate the quality of the resources and see how well the sentiment polarity annotation maps across languages.

Testing Zipf’s meaning-frequency law with wordnets as sense inventories
Francis Bond | Arkadiusz Janz | Marek Maziarz | Ewa Rudnicka
Proceedings of the 10th Global Wordnet Conference

According to George K. Zipf, more frequent words have more senses. We have tested this law using corpora and wordnets of English, Spanish, Portuguese, French, Polish, Japanese, Indonesian and Chinese. We have proved that the law works pretty well for all of these languages if we take - as Zipf did - mean values of meaning count and averaged ranks. On the other hand, the law disastrously fails in predicting the number of senses for a single lemma. We have also provided the evidence that slope coefficients of Zipfian log-log linear model may vary from language to language.
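A log-log linear fit of the kind mentioned in the abstract can be sketched as follows. This is a minimal illustration using ordinary least squares from the standard library; the function name is hypothetical and the data are synthetic, not taken from the paper.

```python
import math


def loglog_fit(ranks, mean_senses):
    """Least-squares fit of log(mean senses) against log(frequency rank).

    Returns (slope, intercept) of the fitted line in log-log space.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(m) for m in mean_senses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data (assumption): mean sense count falls as rank ** -0.5.
ranks = [1, 10, 100, 1000]
mean_senses = [10 * r ** -0.5 for r in ranks]

slope, intercept = loglog_fit(ranks, mean_senses)
print(slope, intercept)
```

Fitting one slope per language in this way is what makes the coefficients comparable across languages; averaging sense counts within rank bins first mirrors the use of mean values described in the abstract.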

English WordNet 2019 – An Open-Source WordNet for English
John P. McCrae | Alexandre Rademaker | Francis Bond | Ewa Rudnicka | Christiane Fellbaum
Proceedings of the 10th Global Wordnet Conference

We describe the release of a new wordnet for English based on the Princeton WordNet, but now developed under an open-source model. In particular, this version of WordNet, which we call English WordNet 2019 and which has been developed by multiple people around the world through GitHub, fixes many errors in previous wordnets for English. We give some details of the changes that have been made in this version and give some perspectives about likely future changes that will be made as this project continues to evolve.

2018

Toward An Epic Epigraph Graph
Francis Bond | Graham Matthews
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Toward Constructing the National Cancer Institute Thesaurus Derived WordNet (ncitWN)
Amanda Hicks | Selja Seppälä | Francis Bond
Proceedings of the 9th Global Wordnet Conference

We describe preliminary work in the creation of the first specialized vocabulary to be integrated into the Open Multilingual Wordnet (OMW). The NCIt Derived WordNet (ncitWN) is based on the National Cancer Institute Thesaurus (NCIt), a controlled biomedical terminology that includes formal class restrictions and English definitions developed by groups of clinicians and terminologists. The ncitWN is created by converting the NCIt to the WordNet Lexical Markup Framework and adding semantic types. We report the development of a prototype ncitWN and first steps towards integrating it into the OMW.

Putting Figures on Influences on Moroccan Darija from Arabic, French and Spanish using the WordNet
Khalil Mrini | Francis Bond
Proceedings of the 9th Global Wordnet Conference

Moroccan Darija is a variant of Arabic with many influences. Using the Open Multilingual WordNet (OMW), we compare the lemmas in the Moroccan Darija Wordnet (MDW) with the standard Arabic, French and Spanish ones. We then compared the lemmas in each synset with their translation equivalents. Transliteration is used to bridge alphabet differences and match lemmas in the closest phonological way. The results put figures on the similarity Moroccan Darija has with Arabic, French and Spanish: respectively 42.0%, 2.8% and 2.2%.

Enhancing the Collaborative Interlingual Index for Digital Humanities: Cross-linguistic Analysis in the Domain of Theology
Laura Slaughter | Wenjie Wang | Luis Morgado Da Costa | Francis Bond
Proceedings of the 9th Global Wordnet Conference

We aim to support digital humanities work related to the study of sacred texts. To do this, we propose to build a cross-lingual wordnet within the domain of theology. We target the Collaborative Interlingual Index (CILI) directly instead of each individual wordnet. The paper presents background for this proposal: (1) an overview of concepts relevant to theology and (2) a summary of the domain-associated issues observed in the Princeton WordNet (PWN). We have found that definitions for concepts in this domain can be too restrictive, inconsistent, and unclear. Necessary synsets are missing, with the PWN being skewed towards Christianity. We argue that tackling problems in a single domain is a better method for improving CILI. By focusing on a single topic rather than a single language, this will result in the proper construction of definitions, romanization/translation of lemmas, and also improvements in use of/creation of a cross-lingual domain hierarchy.

Automatic Identification of Basic-Level Categories
Chad Mills | Francis Bond | Gina-Anne Levow
Proceedings of the 9th Global Wordnet Conference

Basic-level categories have been shown to be both psychologically significant and useful in a wide range of practical applications. We build a rule-based system to identify basic-level categories in WordNet, achieving 77% accuracy on a test set derived from prior psychological experiments. With additional annotations we found our system also has low precision, in part due to the existence of many categories that do not fit into the three classes (superordinate, basic-level, and subordinate) relied on in basic-level category research.

Multilingual Wordnet sense Ranking using nearest context
E Umamaheswari Vasanthakumar | Francis Bond
Proceedings of the 9th Global Wordnet Conference

In this paper, we combine methods to estimate sense rankings from raw text with recent work on word embeddings to provide sense ranking estimates for the entries in the Open Multilingual WordNet (OMW). The existing Word2Vec pre-trained models from Polyglot2 are built only for single-word entries; we therefore re-train them with multiword expressions from the wordnets, so that multiword expressions can also be ranked. Thus, this trained model gives embeddings for both single words and multiwords. The resulting lexicon gives a WSD baseline for five languages. The results are evaluated for Semcor sense corpora for 5 languages using Word2Vec and GloVe models. The GloVe model achieves an average accuracy of 0.47 and Word2Vec achieves 0.31 for languages such as English, Italian, Indonesian, Chinese and Japanese. The experimentation on OMW sense ranking proves that the rank correlation is generally similar to the human ranking. Hence distributional semantics can aid in Wordnet Sense Ranking.

Lexical Perspective on Wordnet to Wordnet Mapping
Ewa Rudnicka | Francis Bond | Łukasz Grabowski | Maciej Piasecki | Tadeusz Piotrowski
Proceedings of the 9th Global Wordnet Conference

The paper presents a feature-based model of equivalence targeted at (manual) sense linking between Princeton WordNet and plWordNet. The model incorporates insights from lexicographic and translation theories on bilingual equivalence and draws on the results of earlier synset-level mapping of nouns between Princeton WordNet and plWordNet. It takes into account all basic aspects of language such as form, meaning and function and supplements them with (parallel) corpus frequency and translatability. Three types of equivalence are distinguished, namely strong, regular and weak depending on the conformity with the proposed features. The presented solutions are language-neutral and they can be easily applied to language pairs other than Polish and English. Sense-level mapping is a more fine-grained mapping than the existing synset mappings and is thus of great potential to human and machine translation.

The Company They Keep: Extracting Japanese Neologisms Using Language Patterns
James Breen | Timothy Baldwin | Francis Bond
Proceedings of the 9th Global Wordnet Conference

We describe an investigation into the identification and extraction of unrecorded potential lexical items in Japanese text by detecting text passages containing selected language patterns typically associated with such items. We identified a set of suitable patterns, then tested them with two large collections of text drawn from the WWW and Twitter. Samples of the extracted items were evaluated, and it was demonstrated that the approach has considerable potential for identifying terms for later lexicographic analysis.

Proceedings of the 9th Global Wordnet Conference
Francis Bond | Piek Vossen | Christiane Fellbaum
Proceedings of the 9th Global Wordnet Conference

2017

NTUCLE: Developing a Corpus of Learner English to Provide Writing Support for Engineering Students
Roger Vivek Placidus Winder | Joseph MacKinnon | Shu Yun Li | Benedict Christopher Tzer Liang Lin | Carmel Lee Hah Heah | Luís Morgado da Costa | Takayuki Kuribayashi | Francis Bond
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

This paper describes the creation of a new annotated learner corpus. The aim is to use this corpus to develop an automated system for corrective feedback on students’ writing. With this system, students will be able to receive timely feedback on language errors before they submit their assignments for grading. A corpus of assignments submitted by first-year engineering students was compiled, and a new error tag set for the NTU Corpus of Learner English (NTUCLE) was developed based on that of the NUS Corpus of Learner English (NUCLE), as well as marking rubrics used at NTU. After a description of the corpus, error tag set and annotation process, the paper presents the results of the annotation exercise as well as follow-up actions. The final error tag set, which is significantly larger than that for the NUCLE error categories, is then presented before a brief conclusion summarising our experience and future plans.

2016

Syntactic Well-Formedness Diagnosis and Error-Based Coaching in Computer Assisted Language Learning using Machine Translation
Luis Morgado da Costa | Francis Bond | Xiaoling He
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

We present a novel approach to Computer Assisted Language Learning (CALL), using deep syntactic parsers and semantic based machine translation (MT) in diagnosing and providing explicit feedback on language learners’ errors. We are currently developing a proof of concept system showing how semantic-based machine translation can, in conjunction with robust computational grammars, be used to interact with students, better understand their language errors, and help students correct their grammar through a series of useful feedback messages and guided language drills. Ultimately, we aim to prove the viability of a new integrated rule-based MT approach to disambiguate students’ intended meaning in a CALL system. This is a necessary step to provide accurate coaching on how to correct ungrammatical input, and it will allow us to overcome a current bottleneck in the field — an exponential burst of ambiguity caused by ambiguous lexical items (Flickinger, 2010). From the users’ interaction with the system, we will also produce a richly annotated Learner Corpus, annotated automatically with both syntactic and semantic information.

USAAR at SemEval-2016 Task 13: Hyponym Endocentricity
Liling Tan | Francis Bond | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

LexSemTm: A Semantic Dataset Based on All-words Unsupervised Sense Distribution Learning
Andrew Bennett | Timothy Baldwin | Jey Han Lau | Diana McCarthy | Francis Bond
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Wow! What a Useful Extension! Introducing Non-Referential Concepts to Wordnet
Luis Morgado Da Costa | Francis Bond
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present the ongoing efforts to expand the depth and breadth of the Open Multilingual Wordnet coverage by introducing two new classes of non-referential concepts to wordnet hierarchies: interjections and numeral classifiers. The lexical semantic hierarchy pioneered by Princeton Wordnet has traditionally restricted its coverage to referential and contentful classes of words, such as nouns, verbs, adjectives and adverbs. Previous efforts have been made to enrich wordnet resources including, for example, the inclusion of pronouns, determiners and quantifiers within their hierarchies. Following similar efforts, and motivated by the ongoing semantic annotation of the NTU-Multilingual Corpus, we decided that the four traditional classes of words present in wordnets were too restrictive. Though non-referential, interjections and classifiers possess interesting semantic features that can be well captured by lexical resources like wordnets. In this paper, we will further motivate our decision to include non-referential concepts in wordnets and give an account of the current state of this expansion.

The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
John P. McCrae | Christian Chiarcos | Francis Bond | Philipp Cimiano | Thierry Declerck | Gerard de Melo | Jorge Gracia | Sebastian Hellmann | Bettina Klimek | Steven Moran | Petya Osenova | Antonio Pareja-Lora | Jonathan Pool
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud of linguistic resources, which covers various linguistic databases, lexicons, corpora, terminologies, and metadata repositories. We present and summarize five years of progress on the development of the cloud and of advancements in open data in linguistics, and we describe recent community activities. The paper aims to serve as a guideline to orient and involve researchers with the community and/or Linguistic Linked Open Data.

CILI: the Collaborative Interlingual Index
Francis Bond | Piek Vossen | John P. McCrae | Christiane Fellbaum
Proceedings of the 8th Global WordNet Conference (GWC)

This paper introduces the motivation for and design of the Collaborative InterLingual Index (CILI). It is designed to make possible coordination between multiple loosely coupled wordnet projects. The structure of the CILI is based on the Interlingual index first proposed in the EuroWordNet project with several pragmatic extensions: an explicit open license, definitions in English and links to wordnets in the Global Wordnet Grid.

Multilingual Sense Intersection in a Parallel Corpus with Diverse Language Families
Giulia Bonansinga | Francis Bond
Proceedings of the 8th Global WordNet Conference (GWC)

Supervised methods for Word Sense Disambiguation (WSD) benefit from high-quality sense-annotated resources, which are lacking for many languages less common than English. There are, however, several multilingual parallel corpora that can be inexpensively annotated with senses through cross-lingual methods. We test the effectiveness of such an approach by attempting to disambiguate English texts through their translations in Italian, Romanian and Japanese. Specifically, we try to find the appropriate word senses for the English words by comparison with all the word senses associated with their translations. The main advantage of this approach is that it can be applied to any parallel corpus, as long as large, high-quality inter-linked sense inventories exist for all the languages considered.
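
The comparison of English senses with the senses of their translations can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `intersect_senses`, the sense labels and the toy inventories are invented for illustration, and the rule of ignoring a translation that shares no sense with the English word is an assumption made here to keep the candidate set from becoming empty.

```python
from typing import List, Set


def intersect_senses(english_senses: Set[str],
                     translation_senses: List[Set[str]]) -> Set[str]:
    """Narrow the English candidate senses using aligned translations."""
    remaining = set(english_senses)
    for senses in translation_senses:
        overlap = remaining & senses
        # Assumption: only narrow when the translation shares a sense,
        # so a gap in one wordnet does not wipe out every candidate.
        if overlap:
            remaining = overlap
    return remaining


# Toy sense inventories with invented labels (not real wordnet identifiers).
english = {"bank.finance", "bank.river", "bank.tilt"}
italian = {"bank.finance", "bench.seat"}   # hypothetical senses of the Italian translation
japanese = {"bank.finance"}                # hypothetical senses of the Japanese translation

result = intersect_senses(english, [italian, japanese])
print(result)
```

With these toy inventories only the sense shared by all three languages survives; each additional language can only keep or shrink the candidate set.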

Toward a truly multilingual GlobalWordnet Grid
Piek Vossen | Francis Bond | John P. McCrae
Proceedings of the 8th Global WordNet Conference (GWC)

In this paper, we describe a new and improved Global Wordnet Grid that takes advantage of the Collaborative InterLingual Index (CILI). Currently, the Open Multilingual Wordnet has made many wordnets accessible as a single linked wordnet, but as it used the Princeton Wordnet of English (PWN) as a pivot, it loses concepts that are not part of PWN. The technical solution to this, a central registry of concepts, as proposed in the EuroWordnet project through the InterLingual Index, has been known for many years. However, the practical issues of how to host this index and who decides what goes in remained unsolved. Inspired by current practice in the Semantic Web and the Linked Open Data community, we propose a way to solve this issue. In this paper we define the principles and protocols for contributing to the Grid. We tested them on two use cases, adding version 3.1 of the Princeton WordNet to a CILI based on 3.0 and adding the Open Dutch Wordnet, to validate the current setup. This paper aims to be a call for action that we hope will be further discussed and ultimately taken up by the whole wordnet community.

Mapping and Generating Classifiers using an Open Chinese Ontology
Luis Morgado Da Costa | Francis Bond | Helena Gao
Proceedings of the 8th Global WordNet Conference (GWC)

In languages such as Chinese, classifiers (CLs) play a central role in the quantification of noun-phrases. This can be a problem when generating text from input that does not specify the classifier, as in machine translation (MT) from English to Chinese. Many solutions to this problem rely on dictionaries of noun-CL pairs. However, there is no open large-scale machine-tractable dictionary of noun-CL associations. Many published resources exist, but they tend to focus on how a CL is used (e.g. what kinds of nouns can be used with it, or what features seem to be selected by each CL). In fact, since nouns are open class words, producing an exhaustive, definitive list of noun-CL associations is not possible, since it would quickly get out of date. Our work tries to address this problem by providing an algorithm for automatically building a frequency-based dictionary of noun-CL pairs, mapped to concepts in the Chinese Open Wordnet (Wang and Bond, 2013), an open machine-tractable dictionary for Chinese. All results will be released under an open license.
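
As a rough illustration of what a frequency-based dictionary of noun-CL pairs might look like, the sketch below counts classifier occurrences per noun. It assumes the (noun, classifier) pairs have already been extracted from a corpus; the extraction step and the mapping to Chinese Open Wordnet concepts described in the abstract are not shown, and the function name and toy data are invented here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_noun_cl_dictionary(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Map each noun to its classifiers, most frequent first."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for noun, classifier in pairs:
        counts[noun][classifier] += 1
    # most_common() sorts classifiers by descending frequency.
    return {noun: cls.most_common() for noun, cls in counts.items()}


# Toy (noun, classifier) observations, invented for illustration.
observed = [("狗", "只"), ("狗", "只"), ("狗", "条"), ("书", "本")]

noun_cl = build_noun_cl_dictionary(observed)
print(noun_cl["狗"])
```

A generator could then default to the first classifier listed for a noun and fall back to a general classifier when the noun is unseen; that fallback policy is likewise an assumption rather than something stated in the abstract.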

Identifying and Exploiting Definitions in Wordnet Bahasa
David Moeljadi | Francis Bond
Proceedings of the 8th Global WordNet Conference (GWC)

This paper describes our attempts to add Indonesian definitions to synsets in the Wordnet Bahasa (Nurril Hirfana Mohamed Noor et al., 2011; Bond et al., 2014), to extract semantic relations between lemmas and definitions for nouns and verbs, such as synonym, hyponym, hypernym and instance hypernym, and to generally improve Wordnet. The original, somewhat noisy, definitions for Indonesian came from the Asian Wordnet project (Riza et al., 2010). The basic method of extracting the relations is based on Bond et al. (2004). Before the relations could be extracted, the definitions were cleaned up and tokenized. We found that the definitions cannot be completely cleaned up because of many misspellings and bad translations. However, we could identify four semantic relations in 57.10% of noun and verb definitions. For the remaining 42.90%, we propose to add 149 new Indonesian lemmas and make some improvements to Wordnet Bahasa and Wordnet in general.

2015

Passive and Pervasive Use of Bilingual Dictionary in Statistical Machine Translation
Liling Tan | Josef van Genabith | Francis Bond
Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)

An HPSG-based Shared-Grammar for the Chinese Languages: ZHONG [|]
Zhenzhen Fan | Sanghoun Song | Francis Bond
Proceedings of the Grammar Engineering Across Frameworks (GEAF) 2015 Workshop

Building an HPSG-based Indonesian Resource Grammar (INDRA)
David Moeljadi | Francis Bond | Sanghoun Song
Proceedings of the Grammar Engineering Across Frameworks (GEAF) 2015 Workshop

OMWEdit - The Integrated Open Multilingual Wordnet Editing System
Luís Morgado da Costa | Francis Bond
Proceedings of ACL-IJCNLP 2015 System Demonstrations

IMI — A Multilingual Semantic Annotation Environment
Francis Bond | Luís Morgado da Costa | Tuấn Anh Lê
Proceedings of ACL-IJCNLP 2015 System Demonstrations

2014

Issues in building English-Chinese parallel corpora with WordNets.
Francis Bond | Shan Wang
Proceedings of the Seventh Global Wordnet Conference

A Survey of WordNet Annotated Corpora
Tommaso Petrolito | Francis Bond
Proceedings of the Seventh Global Wordnet Conference

Parse Ranking with Semantic Dependencies and WordNet
Xiaocheng Yin | Jung-Jae Kim | Zinaida Pozen | Francis Bond
Proceedings of the Seventh Global Wordnet Conference

Bringing together over- and under- represented languages: Linking WordNet to the SIL Semantic Domains
Muhammad Zulhelmy bin Mohd Rosman | František Kratochvíl | Francis Bond
Proceedings of the Seventh Global Wordnet Conference

Sensible: L2 Translation Assistance by Emulating the Manual Post-Editing Process
Liling Tan | Anne-Kathrin Schumann | Jose M.M. Martinez | Francis Bond
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Building The Sense-Tagged Multilingual Parallel Corpus
Shan Wang | Francis Bond
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Sense-annotated parallel corpora play a crucial role in natural language processing. This paper introduces our progress in creating such a corpus for Asian languages using English as a pivot, which is the first such corpus for these languages. Two sets of tools have been developed for sequential and targeted tagging, which are also easy to set up for any new language in addition to those we are annotating. This paper also briefly presents the general guidelines for doing this project. The current results of monolingual sense-tagging and multilingual linking are illustrated, which indicate the differences among genres and language pairs. All the tools, guidelines and the manually annotated corpus will be freely available at compling.ntu.edu.sg/ntumc.

Identifying Idioms in Chinese Translations
Wan Yu Ho | Christine Kng | Shan Wang | Francis Bond
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Optimally, a translated text should preserve information while maintaining the writing style of the original. When this is not possible, as is often the case with figurative speech, a common practice is to simplify and make explicit the implications. However, in our investigations of translations from English to another language, English-to-Chinese texts were often found to include idiomatic expressions (usually in the form of Chengyu 成语) where there were originally no idiomatic, metaphorical, or even figurative expressions. We have created an initial small lexicon of Chengyu, which we can use to find all occurrences of Chengyu in a given corpus, and will continue to expand the database. By examining the rates and patterns of occurrence across four genres in the NTU Multilingual Corpus, a resource may be created to aid machine translation or, going further, predict Chinese translational trends in any given genre.

NTU-MC Toolkit: Annotating a Linguistically Diverse Corpus
Liling Tan | Francis Bond
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

2013

Building the Chinese Open Wordnet (COW): Starting from Core Synsets
Shan Wang | Francis Bond
Proceedings of the 11th Workshop on Asian Language Resources

Developing Parallel Sense-tagged Corpora with Wordnets
Francis Bond | Shan Wang | Eshley Huini Gao | Hazel Shuwen Mok | Jeanette Yiwen Tan
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

XLING: Matching Query Sentences to a Parallel Corpus using Topic Models for WSD
Liling Tan | Francis Bond
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Linking and Extending an Open Multilingual Wordnet
Francis Bond | Ryan Foster
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Comparing Classifier use in Chinese and Japanese
Yue Hui Ting | Francis Bond
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation
Ruli Manurung | Francis Bond
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

Extracting Semantic Transfer Rules from Parallel Corpora with SMT Phrase Aligners
Petter Haugereid | Francis Bond
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Enriching Parallel Corpora for Statistical Machine Translation with Semantic Negation Rephrasing
Dominikus Wetzel | Francis Bond
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Segmentation and Translation of Japanese Multi-word Loanwords
James Breen | Timothy Baldwin | Francis Bond
Proceedings of the Australasian Language Technology Association Workshop 2012

Cross-lingual Parse Disambiguation based on Semantic Correspondence
Lea Frermann | Francis Bond
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

Building and Annotating the Linguistically Diverse NTU-MC (NTU-Multilingual Corpus)
Liling Tan | Francis Bond
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Creating the Open Wordnet Bahasa
Nurril Hirfana Bte Mohamed Noor | Suerya Sapuan | Francis Bond
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Extracting Transfer Rules for Multiword Expressions from Parallel Corpora
Petter Haugereid | Francis Bond
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

Book Review: Language, Technology, and Society by Richard Sproat
Francis Bond
Computational Linguistics, Volume 37, Issue 1 - March 2011

2010

Development of the Korean Resource Grammar: Towards Grammar Customization
Sanghoun Song | Jong-Bok Kim | Francis Bond | Jaehyung Yang
Proceedings of the Eighth Workshop on Asian Language Resources

2009

Enhancing the Japanese WordNet
Francis Bond | Hitoshi Isahara | Sanae Fujita | Kiyotaka Uchimoto | Takayuki Kuribayashi | Kyoko Kanzaki
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

Online Search Interface for the Sejong Korean-Japanese Bilingual Corpus and Auto-interpolation of Phrase Alignment
Sanghoun Song | Francis Bond
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

Using Generation for Grammar Analysis and Error Detection
Michael Goodman | Francis Bond
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Hypernym Discovery Based on Distributional Similarity and Hierarchical Structures
Ichiro Yamada | Kentaro Torisawa | Jun’ichi Kazama | Kow Kuroda | Masaki Murata | Stijn De Saeger | Francis Bond | Asuka Sumida
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Extraction of Attribute Concepts from Japanese Adjectives
Kyoko Kanzaki | Francis Bond | Noriko Tomuro | Hitoshi Isahara
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe various syntactic and semantic conditions for finding abstract nouns which refer to concepts of adjectives from a text, in an attempt to explore the creation of a thesaurus from text. Depending on usages, six kinds of syntactic patterns are shown. In the syntactic and semantic conditions an omission of an abstract noun is mainly used, but in addition, various linguistic clues are needed. We then compare our results with synsets of Japanese WordNet. From a viewpoint of Japanese WordNet, the degree of agreement of ‘Attribute’ between our data and Japanese WordNet was 22%. On the other hand, the total number of differences of obtained abstract nouns was 267. From a viewpoint of our data, the degree of agreement of abstract nouns between our data and Japanese WordNet was 54%.

Development of the Japanese WordNet
Hitoshi Isahara | Francis Bond | Kiyotaka Uchimoto | Masao Utiyama | Kyoko Kanzaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

After a long history of compiling our own lexical resources, the EDR Japanese/English Electronic Dictionary, and of discussions with major players on the development of various WordNets, the Japanese National Institute of Information and Communications Technology started developing the Japanese WordNet in 2006 and will publicly release the first version, which includes both the synsets in Japanese and the annotated Japanese corpus of SemCor, in June 2008. As the first step in compiling the Japanese WordNet, we added Japanese equivalents to synsets of the Princeton WordNet. Of course, we must also add some synsets which do not exist in the Princeton WordNet, and must modify synsets in the Princeton WordNet, in order to make the hierarchical structure of Princeton synsets represent thesaurus-like information found in the Japanese language; however, we will address these tasks in a future study. We then translated English sentences which are used in the SemCor annotation into Japanese and annotated them using our Japanese WordNet. This article gives an overview of our project to compile the Japanese WordNet and other resources which relate to it.

Boot-Strapping a WordNet Using Multiple Existing WordNets
Francis Bond | Hitoshi Isahara | Kyoko Kanzaki | Kiyotaka Uchimoto
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we describe the construction of an illustrated Japanese Wordnet. We bootstrap the Wordnet using multiple existing wordnets in order to deal with the ambiguity inherent in translation. We illustrate it with pictures from the Open Clip Art Library.

MRD-based Word Sense Disambiguation: Further Extending Lesk
Timothy Baldwin | Su Nam Kim | Francis Bond | Sanae Fujita | David Martinez | Takaaki Tanaka
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Improving statistical machine translation by paraphrasing the training data.
Francis Bond | Eric Nichols | Darren Scott Appling | Michael Paul
Proceedings of the 5th International Workshop on Spoken Language Translation: Papers

Large amounts of training data are essential for training statistical machine translation systems. In this paper we show how training data can be expanded by paraphrasing one side. The new data is made by parsing then generating using a precise HPSG-based grammar, which gives sentences with the same meaning, but minor variations in lexical choice and word order. In experiments with Japanese and English, we showed consistent gains on the Tanaka Corpus with less consistent improvement on the IWSLT 2005 evaluation data.

Sharing User Dictionaries Across Multiple Systems with UTX-S
Francis Bond | Seiji Okura | Yuji Yamamoto | Toshiki Murata | Kiyotaka Uchimoto | Michael Kato | Miwako Shimazu | Tsugiyoshi Suzuki
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

Careful tuning of user-created dictionaries is indispensable when using a machine translation system for computer aided translation. However, there is no widely used standard for user dictionaries in the Japanese/English machine translation market. To address this issue, AAMT (the Asia-Pacific Association for Machine Translation) has established a specification of sharable dictionaries (UTX-S: Universal Terminology eXchange -- Simple), which can be used across different machine translation systems, thus increasing the interoperability of language resources. UTX-S is simpler than existing specifications such as UPF and OLIF. It was explicitly designed to make it easy to (a) add new user dictionaries and (b) share existing user dictionaries. This facilitates rapid user dictionary production and avoids vendor tie-in. In this study we describe the UTX-Simple (UTX-S) format, and show that it can be converted to the user dictionary formats for five commercial English-Japanese MT systems. We then present a case study where we (a) convert an on-line glossary to UTX-S, and (b) produce user dictionaries for five different systems, and then exchange them. The results show that the simplified format of UTX-S can be used to rapidly build dictionaries. Further, we confirm that customized user dictionaries are effective across systems, although with a slight loss in quality: on average, user dictionaries improved 44.8% of translations with the systems they were built for and 37.3% of translations for different systems. In ongoing work, AAMT is using UTX-S as the format in building up a user community for producing, sharing, and accumulating user dictionaries in a sustainable way.

2007

Exploiting Semantic Information for HPSG Parse Selection
Sanae Fujita | Francis Bond | Stephan Oepen | Takaaki Tanaka
ACL 2007 Workshop on Deep Linguistic Processing

Word Sense Disambiguation Incorporating Lexical and Structural Semantic Information
Takaaki Tanaka | Francis Bond | Timothy Baldwin | Sanae Fujita | Chikara Hashimoto
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Combining resources for open source machine translation
Eric Nichols | Francis Bond | Darren Scott Appling | Yuji Matsumoto
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2006

Sentence Comparison Using Robust Minimal Recursion Semantics and an Ontology
Rebecca Dridan | Francis Bond
Proceedings of the Workshop on Linguistic Distances

The Hinoki Sensebank — A Large-Scale Word Sense Tagged Corpus of Japanese —
Takaaki Tanaka | Francis Bond | Sanae Fujita
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006

Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006
Timothy Baldwin | Francis Bond | Adam Meyers | Shigeko Nariyama
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006

Multilingual Ontology Acquisition from Multiple MRDs
Eric Nichols | Francis Bond | Takaaki Tanaka | Sanae Fujita | Dan Flickinger
Proceedings of the 2nd Workshop on Ontology Learning and Population: Bridging the Gap between Text and Knowledge

An Implemented Description of Japanese: The Lexeed Dictionary and the Hinoki Treebank
Sanae Fujita | Takaaki Tanaka | Francis Bond | Hiromi Nakaiwa
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

2005

High Precision Treebanking—Blazing Useful Trees Using POS Information
Takaaki Tanaka | Francis Bond | Stephan Oepen | Sanae Fujita
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Integration of a Lexical Type Database with a Linguistically Interpreted Corpus
Chikara Hashimoto | Francis Bond | Takaaki Tanaka | Melanie Siegel
Proceedings of the Sixth International Workshop on Linguistically Interpreted Corpora (LINC-2005)

SEM-I Rational MT: Enriching Deep Grammars with a Semantic Interface for Scalable Machine Translation
Dan Flickinger | Jan Tore Lønning | Helge Dyvik | Stephan Oepen | Francis Bond
Proceedings of Machine Translation Summit X: Papers

In the LOGON machine translation system where semantic transfer using Minimal Recursion Semantics is being developed in conjunction with two existing broad-coverage grammars of Norwegian and English, we motivate the use of a grammar-specific semantic interface (SEM-I) to facilitate the construction and maintenance of a scalable translation engine. The SEM-I is a theoretically grounded component of each grammar, capturing several classes of lexical regularities while also serving the crucial engineering function of supplying a reliable and complete specification of the elementary predications the grammar can realize. We make extensive use of underspecification and type hierarchies to maximize generality and precision.

Extracting Representative Arguments from Dictionaries for Resolving Zero Pronouns
Shigeko Nariyama | Eric Nichols | Francis Bond | Takaaki Tanaka | Hiromi Nakaiwa
Proceedings of Machine Translation Summit X: Papers

We propose a method to alleviate the problem of referential granularity for Japanese zero pronoun resolution. We use dictionary definition sentences to extract ‘representative’ arguments of predicative definition words; e.g. ‘arrest’ is likely to take police as the subject and criminal as its object. These representative arguments are far more informative than ‘person’ that is provided by other valency dictionaries. They are auto-extracted using both shallow parsing and deep parsing for greater quality and quantity. Initial results are highly promising, obtaining more specific information about selectional preferences. An architecture of zero pronoun resolution using these representative arguments is described.

Open Source Machine Translation with DELPH-IN
Francis Bond | Stephan Oepen | Melanie Siegel | Ann Copestake | Dan Flickinger
Workshop on open-source machine translation

2004

A Method of Creating New Bilingual Valency Entries using Alternations
Sanae Fujita | Francis Bond
Proceedings of the Workshop on Multilingual Linguistic Resources

The Hinoki Treebank. Working Toward Text Understanding
Francis Bond | Sanae Fujita | Chikara Hashimoto | Kaname Kasahara | Shigeko Nariyama | Eric Nichols | Akira Ohtani | Takaaki Tanaka | Shigeaki Amano
Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora

A Lexicon Module for a Grammar Development Environment
Ann Copestake | Fabre Lambeau | Benjamin Waldron | Francis Bond | Dan Flickinger | Stephan Oepen
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Acquiring an Ontology for a Fundamental Vocabulary
Francis Bond | Eric Nichols | Sanae Fujita | Takaaki Tanaka
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

An automatic method of creating valency entries using plain bilingual dictionaries
Sanae Fujita | Francis Bond
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2003

A Plethora of Methods for Learning English Countability
Timothy Baldwin | Francis Bond
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

Learning the Countability of English Nouns from Corpus Data
Timothy Baldwin | Francis Bond
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Evaluation of a method of creating new valency entries
Francis Bond | Sanae Fujita
Proceedings of Machine Translation Summit IX: Papers

Information on subcategorization and selectional restrictions is important for natural language processing tasks such as deep parsing, rule-based machine translation and automatic summarization. In this paper we present a method of adding detailed entries to a bilingual dictionary, based on information in an existing valency dictionary. The method is based on two assumptions: words with similar meaning have similar subcategorization frames and selectional restrictions; and words with the same translations have similar meanings. Based on these assumptions, new valency entries are constructed from words in a plain bilingual dictionary, using entries with similar source-language meaning and the same target-language translations. We evaluate the effects of various measures of similarity in increasing accuracy.

2002

Extending the Coverage of a Valency Dictionary
Sanae Fujita | Francis Bond
COLING-02: Machine Translation in Asia

Multiword expressions: linguistic precision and reusability
Ann Copestake | Fabre Lambeau | Aline Villavicencio | Francis Bond | Timothy Baldwin | Ivan A. Sag | Dan Flickinger
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Towards a Thesaurus of Predicates
Satoshi Shirai | Kazuhide Yamamoto | Francis Bond | Hozumi Tanaka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Using an Ontology to Determine English Countability
Francis Bond | Caitlin Vatikiotis-Bateson
COLING 2002: The 19th International Conference on Computational Linguistics

Toward a science of machine translation
Francis Bond
Workshop on machine translation roadmap

A method of adding new entries to a valency dictionary by exploiting existing lexical resources
Sanae Fujita | Francis Bond
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

Alternation-based lexicon reconstruction
Timothy Baldwin | Francis Bond
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2001

Design and construction of a machine-tractable Japanese-Malay dictionary
Francis Bond | Ruhaida Binti Sulong | Takefumi Yamazaki | Kentaro Ogura
Proceedings of Machine Translation Summit VIII

We present a method for combining two bilingual dictionaries to make a third, using one language as a pivot. In this case we combine a Japanese-English dictionary with a Malay-English dictionary, to produce a Japanese-Malay dictionary suitable for use in a machine translation system. Our method differs from previous methods in its use of semantic classes to rank translation equivalents: word pairs with compatible semantic classes are preferred to those with dissimilar classes. We also experiment with the use of two pivot languages. We have made a prototype dictionary of over 75,000 pairs.

2000

Semantic Annotation of a Japanese Speech Corpus
John Fry | Francis Bond
Proceedings of the COLING-2000 Workshop on Semantic Annotation and Intelligent Content

Memory-Based Learning for Article Generation
Guido Minnen | Francis Bond | Ann Copestake
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop

Reusing an ontology to generate numeral classifiers
Francis Bond | Kyonghee Paik
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1999

A valency dictionary architecture for Machine Translation
Timothy Baldwin | Francis Bond | Ben Hutchinson
Proceedings of the 8th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

ALT-J/M: a prototype Japanese-to-Malay translation system
Kentaro Ogura | Francis Bond | Yoshifumi Ooyama
Proceedings of Machine Translation Summit VII

In this report we introduce ALT-J/M: a prototype Japanese-to-Malay translation system. It is a semantic-transfer-based system that uses the same translation engine as ALT-J/E, a Japanese-to-English system.

1998

Anchoring Floating Quantifiers in Japanese-to-English Machine Translation
Francis Bond | Daniela Kurz | Satoshi Shirai
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Anchoring Floating Quantifiers in Japanese-to-English Machine Translation
Francis Bond | Daniela Kurz | Satoshi Shirai
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

1997

Temporal expressions in Japanese-to-English machine translation
Francis Bond | Kentaro Ogura | Hajime Uchino
Proceedings of the 7th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

English adverb processing in Japanese-to-English machine translation
Kentaro Ogura | Satoshi Shirai | Francis Bond
Proceedings of the 7th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1996

Classifiers in Japanese-to-English Machine Translation
Francis Bond | Kentaro Ogura | Satoru Ikehara
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1995

Noun Phrase Reference in Japanese to English Machine Translation
Francis Bond | Kentaro Ogura | Tsukasa Kawaoka
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1994

Countability and Number in Japanese to English Machine Translation
Francis Bond | Kentaro Ogura | Satoru Ikehara
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

English Adverb Generation in Japanese to English Machine Translation
Kentaro Ogura | Francis Bond | Satoru Ikehara
Fourth Conference on Applied Natural Language Processing

Automatic Acquisition of Semantic Attributes for User Defined Words in Japanese to English Machine Translation
Satoru Ikehara | Satoshi Shirai | Akio Yokoo | Francis Bond | Yoshie Omi
Fourth Conference on Applied Natural Language Processing

Co-authors

Kentaro Ogura 8

Dan Flickinger 7

Alexandre Rademaker 6

Michael Wayne Goodman 5

Stephan Oepen 5

Satoshi Shirai 5

Ann Copestake 4

Satoru Ikehara 4

Hitoshi Isahara 4

Kyoko Kanzaki 4

Sanghoun Song 4

Kiyotaka Uchimoto 4

Christiane Fellbaum 3

Chikara Hashimoto 3

Takayuki Kuribayashi 3

Shigeko Nariyama 3

Melanie Siegel 3

Darren Scott Appling 2

Siew Yeng Chow 2

Carlos Gómez-Rodríguez 2

Petter Haugereid 2

Arkadiusz Janz 2

František Kratochvíl 2

Fabre Lambeau 2

Joseph Mackinnon 2

Rowan Hall Maudslay 2

Marek Maziarz 2

David Moeljadi 2

Hiromi Nakaiwa 2

Maciej Piasecki 2

James Pustejovsky 2

Roger V. P. Winder 2

Olga Zamaraeva 2

Josef van Genabith 2

Shigeaki Amano 1

Verginica Barbu Mititelu 1

Andrew Bennett 1

Erica Biagetti 1

Giulia Bonansinga 1

Nicoletta Calzolari 1

Hsin-Hsi Chen 1

Christian Chiarcos 1

Merrick Yeu Herng Choo 1

Philipp Cimiano 1

Gerard De Melo 1

Stijn De Saeger 1

Thierry Declerck 1

Benidiktus Delpada 1

Andrew Devadason 1

Lucia Donatelli 1

Rebecca Dridan 1

Eshley Huini Gao 1

Helena Hong Gao 1

Łukasz Grabowski 1

Younggyun Hahm 1

Carmel Lee Hah Heah 1

Sebastian Hellmann 1

Chu-Ren Huang 1

Aliaksandr Huminski 1

Ben Hutchinson 1

Kaname Kasahara 1

Tsukasa Kawaoka 1

Jun’ichi Kazama 1

Bettina Klimek 1

Christine Kng 1

Sadao Kurohashi 1

Daniel Simon Lanma 1

Tony Kyungil Lee 1

Gina-Anne Levow 1

Kamila Liedermannova 1

Benedict Christopher Tzer Liang Lin 1

Benedict Christopher Lin Tzer Liang 1

Jan Tore Lønning 1

Ruli Manurung 1

David Martinez Iraola 1

José Manuel Martínez 1

Yuji Matsumoto 1

Graham Matthews 1

Diana McCarthy 1

Jelena Mitrović 1

Hazel Shuwen Mok 1

Masaki Murata 1

Toshiki Murata 1

Seung-Hoon Na 1

Hiroki Nomoto 1

Nurril Hirfana Bte Mohamed Noor 1

Yoshifumi Ooyama 1

Petya Osenova 1

Patrizia Paggio 1

Kyonghee Paik 1

Antonio Pareja Lora 1

Carla Parra Escartín 1

Tommaso Petrolito 1

Tadeusz Piotrowski 1

Jonathan Pool 1

Zinaida Pozen 1

Muhammad Zulhelmy bin Mohd Rosman 1

Roi Santos-Rios 1

Enrico Santus 1

Suerya Sapuan 1

Anne-Kathrin Schumann 1

Selja Seppälä 1

Miwako Shimazu 1

Chang-Uk Shin 1

Joanna Ut-Seong Sio 1

Laura Slaughter 1

Ruhaida Binti Sulong 1

Tsugiyoshi Suzuki 1

Jeanette Yiwen Tan 1

Hozumi Tanaka 1

Melissa Rui Lin Teo 1

Simone Teufel 1

Noriko Tomuro 1

Kentaro Torisawa 1

Hajime Uchino 1

Masao Utiyama 1

E Umamaheswari Vasanthakumar 1

Caitlin Vatikiotis-Bateson 1

Luca Brigada Villa 1

Aline Villavicencio 1

Benjamin Waldron 1

Dominikus Wetzel 1

Roger Vivek Placidus Winder 1

Natálie Wolfová 1

Ichiro Yamada 1

Kazuhide Yamamoto 1

Yuji Yamamoto 1

Takefumi Yamazaki 1

Jaehyung Yang 1

Xiaocheng Yin 1

Chiara Zanchi 1

Venues