MultiFix: Learning to Repair Multiple Errors by Optimal Alignment Learning
Findings of the Association for Computational Linguistics: EMNLP 2021
We consider the problem of learning to repair erroneous C programs by learning optimal alignments with correct programs. Since the previous approaches fix a single error in a line, it is inevitable to iterate the fixing process until no errors remain. In this work, we propose a novel sequence-to-sequence learning framework for fixing multiple program errors at a time. We introduce the edit-distance-based data labeling approach for program error correction. Instead of labeling a program repair example by pairing an erroneous program with a line fix, we label the example by paring an erroneous program with an optimal alignment to the corresponding correct program produced by the edit-distance computation. We evaluate our proposed approach on a publicly available dataset (DeepFix dataset) that consists of erroneous C programs submitted by novice programming students. On a set of 6,975 erroneous C programs from the DeepFix dataset, our approach achieves the state-of-the-art result in terms of full repair rate on the DeepFix dataset (without extra data such as compiler error message or additional source codes for pre-training).
SoftRegex: Generating Regex from Natural Language Descriptions using Softened Regex Equivalence
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
We continue the study of generating se-mantically correct regular expressions from natural language descriptions (NL). The current state-of-the-art model SemRegex produces regular expressions from NLs by rewarding the reinforced learning based on the semantic (rather than syntactic) equivalence between two regular expressions. Since the regular expression equivalence problem is PSPACE-complete, we introduce the EQ_Reg model for computing the simi-larity of two regular expressions using deep neural networks. Our EQ_Reg mod-el essentially softens the equivalence of two regular expressions when used as a reward function. We then propose a new regex generation model, SoftRegex, us-ing the EQ_Reg model, and empirically demonstrate that SoftRegex substantially reduces the training time (by a factor of at least 3.6) and produces state-of-the-art results on three benchmark datasets.