Yow-Ting Shiue


2023

Augmenters at SemEval-2023 Task 1: Enhancing CLIP in Handling Compositionality and Ambiguity for Zero-Shot Visual WSD through Prompt Augmentation and Text-To-Image Diffusion
Jie Li | Yow-Ting Shiue | Yong-Siang Shih | Jonas Geiping
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our zero-shot approaches for the Visual Word Sense Disambiguation (VWSD) Task in English. Our preliminary study shows that the simple approach of matching candidate images with the phrase using CLIP suffers from the many-to-many nature of image-text pairs. We find that the CLIP text encoder may have limited abilities in capturing the compositionality in natural language. In addition, the descriptive focus of the phrase varies from instance to instance. We address these issues in our two systems, Augment-CLIP and Stable Diffusion Sampling (SD Sampling). Augment-CLIP augments the text prompt by generating sentences that contain the context phrase with the help of large language models (LLMs). We further explore CLIP models in other languages, as an ambiguous word may be translated into an unambiguous one in another language. SD Sampling uses text-to-image Stable Diffusion to generate multiple images from the given phrase, increasing the likelihood that a subset of the generated images matches the candidate image paired with the text.
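
Below is a minimal sketch of how augmented prompts could be used to rank candidate images. It assumes that every LLM-generated sentence and every candidate image has already been embedded with CLIP. Each candidate is then scored by its mean cosine similarity over all prompts. The helpers `cosine` and `rank_images` and the toy 2-D vectors are hypothetical stand-ins, not the authors' code, and averaging over prompts is an assumed aggregation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(prompt_embs: list[Vector], image_embs: list[Vector]) -> list[int]:
    """Return candidate image indices sorted by mean cosine similarity
    over all (augmented) text prompts, best match first.

    Hypothetical helper: in practice prompt_embs would come from a CLIP text
    encoder (one embedding per generated sentence) and image_embs from the
    CLIP image encoder.
    """
    scores = [
        sum(cosine(p, img) for p in prompt_embs) / len(prompt_embs)
        for img in image_embs
    ]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


# Toy 2-D stand-ins for CLIP embeddings: one per generated sentence, one per image.
prompts = [[1.0, 0.0], [0.8, 0.6]]
images = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]]
print(rank_images(prompts, images))  # [1, 2, 0]
```

Averaging lets several phrasings of the same context vote together. A single prompt would instead rank images on one possibly unrepresentative sentence.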

2021

Time-Aware Ancient Chinese Text Translation and Inference
Ernie Chang | Yow-Ting Shiue | Hui-Syuan Yeh | Vera Demberg
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021

In this paper, we aim to address two challenges in translating ancient Chinese text: (1) the linguistic gap caused by the difference in eras results in poor-quality translations, and (2) most translations lack contextual information that is often crucial to understanding the text. To this end, we improve upon past translation techniques as follows: we reframe the task as a multi-label prediction task in which the model predicts both the translation and its particular era. We observe that this helps bridge the linguistic gap, since chronological context is also used as auxiliary information. We validate our framework on a parallel corpus annotated with chronology information and experimentally show its efficacy in producing high-quality translations. We release both the code and the data for future research.

2020

The University of Maryland’s Submissions to the WMT20 Chat Translation Task: Searching for More Data to Adapt Discourse-Aware Neural Machine Translation
Calvin Bao | Yow-Ting Shiue | Chujun Song | Jie Li | Marine Carpuat
Proceedings of the Fifth Conference on Machine Translation

This paper describes the University of Maryland’s submissions to the WMT20 Shared Task on Chat Translation. We focus on translating agent-side utterances from English to German. We start from an off-the-shelf BPE-based standard Transformer model trained on WMT17 news and fine-tune it with the provided in-domain training data. In addition, we augment the training set with its best matches in the WMT19 news dataset. Our primary submission uses a standard Transformer, while our contrastive submissions use multi-encoder Transformers to attend to previous utterances. Our primary submission achieves 56.7 BLEU on the agent side (en→de), outperforming a baseline system provided by the task organizers by more than 13 BLEU points. Moreover, according to an evaluation on a set of carefully designed examples, the multi-encoder architecture is able to generate more coherent translations.

MSD-1030: A Well-built Multi-Sense Evaluation Dataset for Sense Representation Models
Ting-Yu Yen | Yang-Yin Lee | Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Twelfth Language Resources and Evaluation Conference

Sense embedding models handle polysemy by giving each distinct meaning of a word form a separate representation. They are considered improvements over word embedding models, and their effectiveness is usually judged with benchmarks such as semantic similarity datasets. However, most of these datasets are not designed for evaluating sense embeddings. In this research, we show that there are at least six concerns about evaluating sense embeddings with existing benchmark datasets, including the large proportion of single-sense words and the unexpectedly inferior performance of several multi-sense models relative to their single-sense counterparts. These observations call into serious question whether evaluations based on these datasets can reflect a sense model’s ability to capture different meanings. To address these issues, we propose the Multi-Sense Dataset (MSD-1030), which contains a high ratio of multi-sense word pairs. A series of analyses and experiments shows that MSD-1030 serves as a more reliable benchmark for sense embeddings. The dataset is available at http://nlg.csie.ntu.edu.tw/nlpresource/MSD-1030/.

2018

GenSense: A Generalized Sense Retrofitting Model
Yang-Yin Lee | Ting-Yu Yen | Hen-Hsen Huang | Yow-Ting Shiue | Hsin-Hsi Chen
Proceedings of the 27th International Conference on Computational Linguistics

With the aid of recently proposed word embedding algorithms, research on semantic similarity has advanced rapidly. However, many natural language processing tasks need sense-level representations. To address this issue, several studies propose sense embedding learning algorithms. In this paper, we present a model that generalizes an existing sense retrofitting model. The generalization involves three major components: semantic relations between the senses, the relation strength, and the semantic strength. In the experiments, we show that the generalized model can outperform previous approaches in three types of experiments: semantic relatedness, contextual word similarity, and semantic difference.
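
For readers unfamiliar with retrofitting, the sketch below shows the classic word-level retrofitting update of Faruqui et al. (2015). It repeatedly replaces each vector with a weighted average of its original value and its current graph neighbors. This is not GenSense itself. The `retrofit` function, the sense ids, and the idea of treating the edge weight as a relation strength are illustrative assumptions.

```python
def retrofit(vectors, graph, alpha=1.0, iterations=10):
    """Classic retrofitting update (Faruqui et al., 2015), for illustration only.

    vectors: dict mapping a sense id to its original (pre-trained) vector.
    graph:   dict mapping a sense id to (neighbor id, weight) pairs; the weight
             stands in for a relation strength between the two senses.
    alpha:   how strongly each vector is pulled back toward its original value.
    """
    new = {k: list(v) for k, v in vectors.items()}  # copy; inputs stay untouched
    for _ in range(iterations):
        for node, edges in graph.items():
            edges = [(n, w) for n, w in edges if n in new]
            if node not in new or not edges:
                continue
            # Weighted average of the original vector and current neighbor vectors.
            acc = [alpha * x for x in vectors[node]]
            total = alpha
            for n, w in edges:
                for d, x in enumerate(new[n]):
                    acc[d] += w * x
                total += w
            new[node] = [x / total for x in acc]
    return new


# Hypothetical sense ids and toy vectors.
vectors = {"bank%finance": [0.0, 0.0], "money": [2.0, 2.0]}
graph = {"bank%finance": [("money", 1.0)]}
print(retrofit(vectors, graph)["bank%finance"])  # [1.0, 1.0]
```

Nodes that have no outgoing edges, such as "money" here, keep their original vectors.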

Correcting Chinese Word Usage Errors for Learning Chinese as a Second Language
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 27th International Conference on Computational Linguistics

With more and more people around the world learning Chinese as a second language, the need for Chinese error correction tools is increasing. In the HSK dynamic composition corpus, word usage error (WUE) is the most common error type. In this paper, we build a neural network model that considers both the target erroneous token and its context to generate a correction vector, and compares this vector against a candidate vocabulary to propose suitable corrections. To account for alternative valid corrections, the top five proposed candidates are judged by native Chinese speakers. In more than 91% of the cases, our system can propose at least one acceptable correction within a list of five candidates. To the best of our knowledge, this is the first research addressing general-type Chinese WUE correction. Our system can help non-native Chinese learners revise their sentences by themselves.
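
The final ranking step can be sketched as a nearest-neighbor search. Candidate words are sorted by the cosine similarity between their embeddings and the predicted correction vector. A case then counts as solved if any of the top five candidates is acceptable. The function names, the toy embeddings, and the choice of cosine similarity are assumptions for illustration; the abstract does not specify the comparison function.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_corrections(correction_vec, vocab, k=5):
    """Return the k candidate words whose embeddings are most similar to the
    predicted correction vector (vocab maps word -> embedding; hypothetical)."""
    return sorted(vocab, key=lambda w: cosine(correction_vec, vocab[w]), reverse=True)[:k]


def hit_at_k(proposals, acceptable, k=5):
    """True if at least one of the top-k proposals was judged acceptable."""
    return any(w in acceptable for w in proposals[:k])


# Toy embeddings for three candidate words.
vocab = {"帮助": [0.9, 0.1], "帮忙": [0.2, 0.9], "协助": [0.7, 0.3]}
proposals = top_k_corrections([1.0, 0.0], vocab, k=2)
print(proposals)  # ['帮助', '协助']
print(hit_at_k(proposals, {"协助"}))  # True
```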

A Chinese Writing Correction System for Learning Chinese as a Foreign Language
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We present a Chinese writing correction system for learning Chinese as a foreign language. The system takes an erroneous input sentence and generates several correction suggestions. It also retrieves example Chinese sentences with English translations, helping users understand the correct usage of certain grammar patterns. This is the first available Chinese writing error correction system based on the neural machine translation framework. We discuss several design choices and show empirical results to support our decisions.

NTU NLP Lab System at SemEval-2018 Task 10: Verifying Semantic Differences by Integrating Distributional Information and Expert Knowledge
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper presents the NTU NLP Lab system for the SemEval-2018 Capturing Discriminative Attributes task. Word embeddings, pointwise mutual information (PMI), ConceptNet edges, and shortest path lengths are used as input features to build binary classifiers that determine whether an attribute is discriminative for a pair of concepts. Our neural network model reaches about 73% F1 score on the test set and ranks 3rd in the task. Although the attributes in this task are all visual, our models are not provided with any image data. The results indicate that visual information can be derived from textual data.
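
PMI, one of the input features listed above, measures how often two items co-occur relative to how often they would co-occur if they were independent. Below is a minimal sketch that computes PMI from raw counts. How co-occurrences are counted (window size, corpus) is an assumption and is not taken from the paper.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information log2(P(x, y) / (P(x) P(y))) from raw counts.

    count_xy: co-occurrences of x and y; count_x, count_y: marginal counts;
    total: number of observations (e.g. co-occurrence windows) in the corpus.
    """
    if count_xy == 0:
        return float("-inf")  # x and y never co-occur
    return math.log2(count_xy * total / (count_x * count_y))


print(pmi(5, 10, 20, 100))  # log2(2.5), about 1.32: co-occur more than chance
print(pmi(2, 10, 20, 100))  # 0.0: exactly as often as independence predicts
```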

2017

Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Selecting appropriate words to compose a sentence is a common problem faced by non-native Chinese learners. In this paper, we propose (bidirectional) LSTM sequence labeling models and explore various features to detect word usage errors in Chinese sentences. By combining CWINDOW word embedding features and POS information, the best bidirectional LSTM model achieves an accuracy of 0.5138 and an MRR of 0.6789 on the HSK dataset. For 80.79% of the test data, the model ranks the ground truth within the top two at the position level.
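
Both MRR and the top-two figure are computed from the rank at which the model places the ground truth for each test instance. The sketch below uses hypothetical ranks rather than the paper's data. The function names are illustrative.

```python
def mean_reciprocal_rank(ranks):
    """MRR over test instances; ranks are 1-based positions of the ground truth."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def top_n_rate(ranks, n=2):
    """Fraction of instances whose ground truth is ranked within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


ranks = [1, 2, 4]  # toy ranks, not from the paper
print(mean_reciprocal_rank(ranks))  # (1 + 1/2 + 1/4) / 3, about 0.583
print(top_n_rate(ranks))  # 2 of 3 instances, about 0.667
```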

2016

Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
Yow-Ting Shiue | Hsin-Hsi Chen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Automated grammatical error detection, which helps users improve their writing, is an important application in NLP. Recently, more and more people are learning Chinese, and an automated error detection system can be helpful for these learners. This paper proposes n-gram features, dependency count features, dependency bigram features, and single-character features to determine whether a Chinese sentence contains word usage errors, in which a word is written in a wrong form or the word selection is inappropriate. By marking potential errors at the level of sentence segments, typically delimited by punctuation marks, the system lets learners try to correct the problems without the assistance of a language teacher. Experiments on the HSK corpus show that the classifier combining all sets of features achieves an accuracy of 0.8423. By using certain combinations of the feature sets, we can construct a system that favors either precision or recall. The best precision we achieve is 0.9536, indicating that our system is reliable and seldom produces misleading results.