Writing comprehensive commit messages is tedious yet important, because these messages describe changes of code, such as fixing bugs or adding new features. However, most existing methods focus on either only the changed lines or nearest context lines, without considering the effectiveness of selecting useful contexts. On the other hand, it is possible that introducing excessive contexts can lead to noise. To this end, we propose a code model COMMIT (Context-aware prOMpting based comMIt-message generaTion) in conjunction with a code dataset CODEC (COntext and metaData Enhanced Code dataset). Leveraging program slicing, CODEC consolidates code changes along with related contexts via property graph analysis. Further, utilizing CodeT5+ as the backbone model, we train COMMIT via context-aware prompt on CODEC. Experiments show that COMMIT can surpass all compared models including pre-trained language models for code (code-PLMs) such as CommitBART and large language models for code (code-LLMs) such as Code-LlaMa. Besides, we investigate several research questions (RQs), further verifying the effectiveness of our approach. We release the data and code at: https://github.com/Jnunlplab/COMMIT.git.
Generating high-quality responses is a key challenge for any open domain dialogue systems. However, even though there exist a variety of quality dimensions especially designed for dialogue evaluation (e.g., coherence and diversity scores), current dialogue systems rarely utilize them to guide the response generation during training. To alleviate this issue, we propose LSTDial (Long- and Short-Term Dialogue), a novel two-stage framework which generates and utilizes conversation evaluation as explicit feedback during training. Specifically, we fine-tune pre-trained dialogue systems through using turn-level quality feedback in the first stage and further train ever-improving dialogue agents through using dialogue-level quality feedback in the second stage. By using our approach on dialogue systems, capable of enabling dialogue generation with both short-term capabilities (generating more fluent, relevant and varied responses at the turn-level) and long-term capabilities (generating more coherent, engaging and informative responses at the dialogue-level). We implement LSTDial on four strong baseline models and experiment with two open-domain dialogue datasets. Experimental results show that LSTDial achieves significant improvement, enabling to generate better dialogue responses in terms of both human and automatic evaluation.
Evaluation metrics shine the light on the best models and thus strongly influence the research directions, such as the recently developed dialogue metrics USR, FED, and GRADE. However, most current metrics evaluate the dialogue data as isolated and static because they only focus on a single quality or several qualities. To mitigate the problem, this paper proposes an interpretable, multi-faceted, and controllable framework IM^2 (Interpretable and Multi-category Integrated Metric) to combine a large number of metrics which are good at measuring different qualities. The IM^2 framework first divides current popular dialogue qualities into different categories and then applies or proposes dialogue metrics to measure the qualities within each category and finally generates an overall IM^2 score. An initial version of IM^2 was submitted to the AAAI 2022 Track5.1@DSTC10 challenge and took the 2^nd place on both of the development and test leaderboard. After the competition, we develop more metrics and improve the performance of our model. We compare IM^2 with other 13 current dialogue metrics and experimental results show that IM^2 correlates more strongly with human judgments than any of them on each evaluated dataset.