Yixin Huang


2024

pdf bib
LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?
Qihui Zhang | Chujie Gao | Dongping Chen | Yue Huang | Yixin Huang | Zhenyang Sun | Shilin Zhang | Weiye Li | Zhengyan Fu | Yao Wan | Lichao Sun
Findings of the Association for Computational Linguistics: NAACL 2024

With the rapid development and widespread application of Large Language Models (LLMs), the use of Machine-Generated Text (MGT) has become increasingly common, bringing with it potential risks, especially in terms of quality and integrity in fields like news, education, and science. Current research mainly focuses on purely MGT detection, without adequately addressing mixed scenarios including AI-revised Human-Written Text (HWT) or human-revised MGT. To tackle this challenge, we define mixtext, a form of mixed text involving both AI and human-generated content. Then we introduce MixSet, the first dataset dedicated to studying these mixtext scenarios. Leveraging MixSet, we executed comprehensive experiments to assess the efficacy of prevalent MGT detectors in handling mixtext situations, evaluating their performance in terms of effectiveness, robustness, and generalization. Our findings reveal that existing detectors struggle to identify mixtext, particularly in dealing with subtle modifications and style adaptability. This research underscores the urgent need for more fine-grain detectors tailored for mixtext, offering valuable insights for future research. Code and Models are available at https://github.com/Dongping-Chen/MixSet.

2023

pdf bib
NUS-IDS at PragTag-2023: Improving Pragmatic Tagging of Peer Reviews through Unlabeled Data
Sujatha Das Gollapalli | Yixin Huang | See-Kiong Ng
Proceedings of the 10th Workshop on Argument Mining

We describe our models for the Pragmatic Tagging of Peer Reviews Shared Task at the 10th Workshop on Argument Mining at EMNLP-2023. We trained multiple sentence classification models for the above competition task by employing various state-of-the-art transformer models that can be fine-tuned either in the traditional way or through instruction-based fine-tuning. Multiple model predictions on unlabeled data are combined to tentatively label unlabeled instances and augment the dataset to further improve performance on the prediction task. In particular, on the F1000RD corpus, we perform on-par with models trained on 100% of the training data while using only 10% of the data. Overall, on the competition datasets, we rank among the top-2 performers for the different data conditions.