2024
pdf
bib
abs
Socratic Human Feedback (SoHF): Expert Steering Strategies for LLM Code Generation
Subramanian Chidambaram
|
Li Erran Li
|
Min Bai
|
Xiaopeng Li
|
Kaixiang Lin
|
Xiong Zhou
|
Alex C. Williams
Findings of the Association for Computational Linguistics: EMNLP 2024
Large Language Models (LLMs) are increasingly used for generating code solutions, empowered by features like self-debugging and self-reflection. However, LLMs often struggle with complex programming problems without human guidance. This paper investigates the strategies employed by expert programmers to steer code-generating LLMs toward successful outcomes. Through a study involving experts using natural language to guide GPT-4, Gemini Ultra, and, Claude 3.5 Sonnet on highly difficult programming challenges, we frame our analysis using the “Socratic Feedback” paradigm for understanding effective steering strategies. By analyzing 30 conversational transcripts across all three models, we map observed feedback strategies to five stages of Socratic Questioning: Definition, Elenhus, Maieutic, Dialectic, and Counter-factual reasoning. We find evidence that by employing a combination of different Socratic feedback strategies across multiple turns, programmers successfully guided the models to solve 74% of the problems that the models initially failed to solve on their own.
2021
pdf
bib
abs
Semantic Aligned Multi-modal Transformer for Vision-LanguageUnderstanding: A Preliminary Study on Visual QA
Han Ding
|
Li Erran Li
|
Zhiting Hu
|
Yi Xu
|
Dilek Hakkani-Tur
|
Zheng Du
|
Belinda Zeng
Proceedings of the Third Workshop on Multimodal Artificial Intelligence
Recent vision-language understanding approaches adopt a multi-modal transformer pre-training and finetuning paradigm. Prior work learns representations of text tokens and visual features with cross-attention mechanisms and captures the alignment solely based on indirect signals. In this work, we propose to enhance the alignment mechanism by incorporating image scene graph structures as the bridge between the two modalities, and learning with new contrastive objectives. In our preliminary study on the challenging compositional visual question answering task, we show the proposed approach achieves improved results, demonstrating potentials to enhance vision-language understanding.
pdf
bib
abs
Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning
Zhan Shi
|
Hui Liu
|
Martin Renqiang Min
|
Christopher Malon
|
Li Erran Li
|
Xiaodan Zhu
Findings of the Association for Computational Linguistics: EMNLP 2021
Image captioning systems are expected to have the ability to combine individual concepts when describing scenes with concept combinations that are not observed during training. In spite of significant progress in image captioning with the help of the autoregressive generation framework, current approaches fail to generalize well to novel concept combinations. We propose a new framework that revolves around probing several similar image caption training instances (retrieval), performing analogical reasoning over relevant entities in retrieved prototypes (analogy), and enhancing the generation process with reasoning outcomes (composition). Our method augments the generation model by referring to the neighboring instances in the training set to produce novel concept combinations in generated captions. We perform experiments on the widely used image captioning benchmarks. The proposed models achieve substantial improvement over the compared baselines on both composition-related evaluation metrics and conventional image captioning metrics.