Olivia O.Y. Kwong

Also published as: O.Y. Kwong, Oi Yee Kwong


2020

Existing dictionaries may help collocation translation by suggesting associated words in the form of collocations, thesaurus entries, and example sentences. We propose to enhance them with task-driven word associations, illustrating the need with a few scenarios and outlining a possible approach based on word embeddings. An example is given using pre-trained word embeddings, while a more extensive investigation with more refined methods and resources is underway.

2018

The Princeton WordNet for English was founded on the synonymy relation, and multilingual wordnets are primarily developed by creating equivalent synsets in the respective languages. The process often relies on translation equivalents obtained from existing bilingual dictionaries. This paper discusses some observations from the Chinese Open Wordnet, especially from the adjective subnet, to illuminate potential blind spots of the approach which may lead to the formation of non-synsets in the new wordnet. With cross-linguistic differences duly taken into account, alternative representations of cross-lingual lexical relations are proposed to better capture language-specific properties. It is also suggested that such a cross-lingual representation, encompassing the cognitive as well as the linguistic aspects of meaning, is beneficial for a lexical resource to be used by both humans and computers.
This study explores the use of natural language processing techniques to enhance bilingual lexical access beyond simple equivalents, enabling translators to navigate a wider cross-lingual lexical space and to consult more examples showing different translation strategies, both of which are essential for learning to produce translations that are fluent as well as faithful.

2016

2015

2013

2012

2011

2010

This paper discusses our ongoing work on constructing an annotated corpus of children’s stories for further studies on the linguistic, computational, and cognitive aspects of story structure and understanding. Given its semantic nature and the need for extensive common sense and world knowledge, story understanding has been a notoriously difficult topic in natural language processing. In particular, the notion of story structure for maintaining coherence has received much attention, while its strong version in the form of story grammar has triggered much debate. The relation between discourse coherence and the interestingness, or the point, of a story has not been satisfactorily settled. Introspective analysis of story comprehension has led to some important observations, based on which we propose a preliminary annotation scheme covering the structural, functional, and emotional aspects connecting discourse segments in stories. The annotation process will shed light on how story structure interacts with story point via various linguistic devices, and the annotated corpus is expected to be a useful resource for computational discourse processing, especially for studying the interface between the coherence and the interestingness of stories.

2009

2008

2007

2006

In this paper, we propose a corpus-based approach to the construction of a Pan-Chinese lexical resource, with the initial aim of enriching existing Chinese thesauri in the Pan-Chinese context. The resulting thesaurus is thus expected to contain not only the core senses and usages of Chinese lexical items but also usages specific to individual Chinese speech communities. We introduce the ideas behind the construction of the resource, outline the steps to be taken, and discuss some preliminary analyses. The work is backed by a unique and large Chinese synchronous corpus containing textual data from various Chinese speech communities, including Hong Kong, Beijing, Taipei and Singapore.

2005

2004

2003

2002

2001

This paper discusses the challenges that Chinese-English machine translation (MT) systems face in translating personal names. We show that the translation of names between Chinese and English is complicated by several factors, including orthographic, phonetic, geographic and social ones. Four existing systems were tested for their capability in translating personal names from Chinese to English. Test data embodying geographic and sociolinguistic differences were obtained from a synchronous Chinese corpus of news media texts. The systems vary considerably in their ability to identify personal names in the source language and render them properly in the target language. Given the criticality of personal name translation to the overall intelligibility of a translated text, the coverage of personal names should be one of the important criteria in the evaluation of MT performance. Moreover, name translation, which calls for a hybrid approach, will remain a central issue in the future development of MT systems, especially for online and real-time applications.

1998