Corpus and dictionary development for classifiers/quantifiers towards a French-Japanese machine translation

Mutsuko Tomokiyo, Christian Boitet


Abstract
Although quantifier/classifier expressions occur frequently in everyday communication and in written documents, they are described neither in classical bilingual paper dictionaries nor in machine-readable dictionaries. This paper describes the development of a corpus and a dictionary for quantifiers/classifiers, and their use in the framework of French-Japanese machine translation (MT). These expressions often cause problems of lexical ambiguity and of set-phrase recognition during analysis, in particular for a distant language pair such as French and Japanese. To develop a dictionary aimed at resolving ambiguities in expressions that contain quantifiers and classifiers which may be ambiguous with common nouns, we have annotated our corpus with UWs (interlingual lexemes) of UNL (Universal Networking Language) found in the UNL-jp dictionary. Potential classifiers/quantifiers are extracted from the corpus with the UNLexplorer web service.
Keywords: classifiers, quantifiers, phraseology study, corpus annotation, UNL (Universal Networking Language), UWs dictionary, Tori Bank, French-Japanese machine translation (MT).
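As a rough illustration of the ambiguity described in the abstract (French "tasse" is a common noun in "la tasse est cassée" but acts as a quantifier in "une tasse de café", which Japanese renders with the counter 杯), the Python sketch below counts candidate quantifiers in a toy corpus and then picks a noun or quantifier UW for each occurrence. Everything in it is a hypothetical assumption: the three-entry lexicon, the UNL-style UW strings, the "determiner/numeral + X + de + noun" frame and all function names. It is not the authors' annotation procedure, it does not use the UNL-jp dictionary, and it does not call the UNLexplorer web service.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical mini-lexicon: each headword has a common-noun UW and a
# quantifier UW. The UW strings are illustrative UNL-style notation,
# NOT entries taken from the UNL-jp dictionary.
@dataclass(frozen=True)
class Entry:
    noun_uw: str
    quant_uw: str
    ja_counter: str  # Japanese counter used for the quantifier reading

LEXICON = {
    "tasse": Entry("cup(icl>thing)", "cup(icl>unit)", "杯"),
    "bouteille": Entry("bottle(icl>thing)", "bottle(icl>unit)", "本"),
    "feuille": Entry("leaf(icl>thing)", "sheet(icl>unit)", "枚"),
}

# Assumed cue words; input tokens are assumed to be already lemmatized.
DETERMINERS = {"un", "une", "deux", "trois", "quelques"}
PREPOSITIONS = {"de", "d'"}


def in_quantifier_frame(tokens, i):
    """True if tokens[i] sits in the hypothetical frame DET/NUM + X + de + N."""
    return (
        0 < i < len(tokens) - 2
        and tokens[i - 1] in DETERMINERS
        and tokens[i + 1] in PREPOSITIONS
    )


def extract_candidates(sentences):
    """Count the words that occur in the frame (potential quantifiers)."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if in_quantifier_frame(tokens, i):
                counts[tok] += 1
    return counts


def annotate(tokens):
    """Return (token, UW, Japanese counter) triples.

    Lexicon words inside the frame get the quantifier UW and its counter;
    elsewhere they get the common-noun UW. Unknown words get (None, None).
    """
    result = []
    for i, tok in enumerate(tokens):
        entry = LEXICON.get(tok)
        if entry is None:
            result.append((tok, None, None))
        elif in_quantifier_frame(tokens, i):
            result.append((tok, entry.quant_uw, entry.ja_counter))
        else:
            result.append((tok, entry.noun_uw, None))
    return result


if __name__ == "__main__":
    corpus = [
        ["une", "tasse", "de", "café"],
        ["la", "tasse", "est", "cassée"],
        ["deux", "feuille", "de", "papier"],
    ]
    print(extract_candidates(corpus))
    for sentence in corpus:
        print(annotate(sentence))
```

On this toy corpus, "tasse" and "feuille" are each counted once as candidates, and "tasse" receives the quantifier UW with 杯 only in "une tasse de café". A purely local pattern like this cannot tell "une tasse de café" (a quantity) from a phrase such as "une tasse de porcelaine" (a porcelain cup), which is exactly the kind of lexical ambiguity that motivates a dedicated dictionary rather than a surface rule.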
Anthology ID: W16-5324
Volume: Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)
Month: December
Year: 2016
Address: Osaka, Japan
Venues: CogALex | WS
Publisher: The COLING 2016 Organizing Committee
Pages: 185–192
URL: https://aclanthology.org/W16-5324
PDF: https://aclanthology.org/W16-5324.pdf