Ho Hung Lim


2024

pdf bib
Improving Readability Assessment with Ordinal Log-Loss
Ho Hung Lim | John Lee
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Automatic Readability Assessment (ARA) predicts the level of difficulty of a text, e.g. at Grade 1 to Grade 12. ARA is an ordinal classification task since the predicted levels follow an underlying order, from easy to difficult. However, most neural ARA models ignore the distance between the gold level and predicted level, treating all levels as independent labels. This paper investigates whether distance-sensitive loss functions can improve ARA performance. We evaluate a variety of loss functions on neural ARA models, and show that ordinal log-loss can produce statistically significant improvement over the standard cross-entropy loss in terms of adjacent accuracy in a majority of our datasets.

2022

pdf bib
Unsupervised Paraphrasability Prediction for Compound Nominalizations
John Sie Yuen Lee | Ho Hung Lim | Carol Webster
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Commonly found in academic and formal texts, a nominalization uses a deverbal noun to describe an event associated with its corresponding verb. Nominalizations can be difficult to interpret because of ambiguous semantic relations between the deverbal noun and its arguments. Automatic generation of clausal paraphrases for nominalizations can help disambiguate their meaning. However, previous work has not identified cases where it is awkward or impossible to paraphrase a compound nominalization. This paper investigates unsupervised prediction of paraphrasability, which determines whether the prenominal modifier of a nominalization can be re-written as a noun or adverb in a clausal paraphrase. We adopt the approach of overgenerating candidate paraphrases followed by candidate ranking with a neural language model. In experiments on an English dataset, we show that features from an Abstract Meaning Representation graph lead to statistically significant improvement in both paraphrasability prediction and paraphrase generation.

pdf bib
Robustness of Hybrid Models in Cross-domain Readability Assessment
Ho Hung Lim | Tianyuan Cai | John S. Y. Lee | Meichun Liu
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association

pdf bib
Automatic Nominalization of Clauses through Textual Entailment
John S. Y. Lee | Ho Hung Lim | Carol Webster | Anton Melser
Proceedings of the 29th International Conference on Computational Linguistics

Nominalization re-writes a clause as a noun phrase. It requires the transformation of the head verb of the clause into a deverbal noun, and the verb’s modifiers into nominal modifiers. Past research has focused on the selection of deverbal nouns, but has paid less attention to predicting the word positions and word forms for the nominal modifiers. We propose the use of a textual entailment model for clause nominalization. We obtained the best performance by fine-tuning a textual entailment model on this task, outperforming a number of unsupervised approaches using language model scores from a state-of-the-art neural language model.

2021

pdf bib
Paraphrasing Compound Nominalizations
John Lee | Ho Hung Lim | Carol Webster
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

A nominalization uses a deverbal noun to describe an event associated with its underlying verb. Commonly found in academic and formal texts, nominalizations can be difficult to interpret because of ambiguous semantic relations between the deverbal noun and its arguments. Our goal is to interpret nominalizations by generating clausal paraphrases. We address compound nominalizations with both nominal and adjectival modifiers, as well as prepositional phrases. In evaluations on a number of unsupervised methods, we obtained the strongest performance by using a pre-trained contextualized language model to re-rank paraphrase candidates identified by a textual entailment model.