Junghyun Min

2026

A Curious Class of Adpositional Multiword Expressions in Korean
Junghyun Min | Na-Rae Han | Jena D. Hwang | Nathan Schneider
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)

Multiword expressions (MWEs) have been widely studied in cross-lingual annotation frameworks such as PARSEME. However, Korean MWEs remain underrepresented in these efforts. In particular, Korean multiword adpositions lack systematic analysis, annotated resources, and integration into existing frameworks. In this paper, we present a study of Korean functional multiword expressions: postpositional verb-based constructions (PVCs). Using data from Korean Wikipedia, we survey and analyze several PVC expressions and contrast them with non-MWEs of similar structure. Building on this analysis, we propose annotation guidelines designed to support future work in Korean multiword adpositions and facilitate alignment with cross-lingual frameworks.

2025

Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments
Abhishek Purushothama | Junghyun Min | Brandon Waldon | Nathan Schneider
Proceedings of the Natural Legal Language Processing Workshop 2025

Legal interpretation frequently involves assessing how a legal text, as understood by an ‘ordinary’ speaker of the language, applies to the set of facts characterizing a legal dispute. Recent scholarship has proposed that legal practitioners add large language models (LLMs) to their interpretive toolkit. This work offers an empirical argument against LLM-assisted interpretation as recently practiced by legal scholars and federal judges. Our investigation in English shows that models do not provide stable interpretive judgments and are susceptible to subtle variations in the prompt. While instruction tuning slightly improves model calibration to human judgments, even the best-calibrated LLMs remain weak predictors of human native speakers’ judgments.

Punctuation Restoration Improves Structure Understanding without Supervision
Junghyun Min | Minho Lee | Woochul Lee | Yeonsoo Lee
Proceedings of the 10th Workshop on Representation Learning for NLP (RepL4NLP-2025)

Unsupervised learning objectives like autoregressive and masked language modeling play a significant part in producing pre-trained representations that support various downstream applications, from natural language understanding to conversational tasks. However, despite the impressive generative capabilities of recent large language models, their ability to capture syntactic or semantic structure within text lags behind. We hypothesize that the mismatch between linguistic performance and competence in machines is attributable to insufficient learning of linguistic structure knowledge via currently popular pre-training objectives. Working with English, we show that punctuation restoration as a learning objective improves performance on structure-related tasks like named entity recognition, open information extraction, chunking, and part-of-speech tagging. Punctuation restoration results in an improvement of at least 2 percentage points in 16 out of 18 experiments, across 6 out of 7 tasks. Our results show that punctuation restoration is an effective learning objective that can improve structure understanding and yield more robust structure-aware representations of natural language in base-sized models.

When Does Meaning Backfire? Investigating the Role of AMRs in NLI
Junghyun Min | Xiulin Yang | Shira Wein
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)

Natural Language Inference (NLI) relies heavily on adequately parsing the semantic content of the premise and hypothesis. In this work, we investigate whether adding semantic information in the form of an Abstract Meaning Representation (AMR) helps pretrained language models better generalize in NLI. Our experiments integrating AMR into NLI in both fine-tuning and prompting settings show that the presence of AMR in fine-tuning hinders model generalization while prompting with AMR leads to slight gains in GPT-4o. However, an ablation study reveals that the improvement comes from amplifying surface-level differences rather than aiding semantic reasoning. This amplification can mislead models to predict non-entailment even when the core meaning is preserved.

Building UD Cairo for Old English in the Classroom
Lauren Levine | Junghyun Min | Amir Zeldes
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)

In this paper we present a sample treebank for Old English based on the UD Cairo sentences, collected and annotated as part of a classroom curriculum in Historical Linguistics. To collect the data, a sample of 20 sentences illustrating a range of syntactic constructions in the world’s languages, we employ a combination of LLM prompting and searches in authentic Old English data. For annotation we assigned sentences to multiple students with limited prior exposure to UD, whose annotations we compare and adjudicate. Our results suggest that while current LLM outputs in Old English do not reflect authentic syntax, this can be mitigated by post-editing, and that although beginner annotators do not possess enough background to complete the task perfectly, taken together they can produce good results and learn from the experience. We also conduct preliminary parsing experiments using Modern English training data, and find that although performance on Old English is poor, parsing on annotated features (lemma, hyperlemma, gloss) leads to improved performance.

2020

Syntactic Data Augmentation Increases Robustness to Inference Heuristics
Junghyun Min | R. Thomas McCoy | Dipanjan Das | Emily Pitler | Tal Linzen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Pretrained neural models such as BERT, when fine-tuned to perform natural language inference (NLI), often show high accuracy on standard datasets, but display a surprising lack of sensitivity to word order on controlled challenge sets. We hypothesize that this issue is not primarily caused by the pretrained model’s limitations, but rather by the paucity of crowdsourced NLI examples that might convey the importance of syntactic structure at the fine-tuning stage. We explore several methods to augment standard training sets with syntactically informative examples, generated by applying syntactic transformations to sentences from the MNLI corpus. The best-performing augmentation method, subject/object inversion, improved BERT’s accuracy on controlled examples that diagnose sensitivity to word order from 0.28 to 0.73, without affecting performance on the MNLI test set. This improvement generalized beyond the particular construction used for data augmentation, suggesting that augmentation causes BERT to recruit abstract syntactic representations.

BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance
R. Thomas McCoy | Junghyun Min | Tal Linzen
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

If the same neural network architecture is trained multiple times on the same dataset, will it make similar linguistic generalizations across runs? To study this question, we fine-tuned 100 instances of BERT on the Multi-genre Natural Language Inference (MNLI) dataset and evaluated them on the HANS dataset, which evaluates syntactic generalization in natural language inference. On the MNLI development set, the behavior of all instances was remarkably consistent, with accuracy ranging between 83.6% and 84.8%. In stark contrast, the same models varied widely in their generalization performance. For example, on the simple case of subject-object swap (e.g., determining that “the doctor visited the lawyer” does not entail “the lawyer visited the doctor”), accuracy ranged from 0.0% to 66.2%. Such variation is likely due to the presence of many local minima in the loss surface that are equally attractive to a low-bias learner such as a neural network; decreasing the variability may therefore require models with stronger inductive biases.