Nadia Polikarpova
2024
Solving Data-centric Tasks using Large Language Models
Shraddha Barke | Christian Poelitz | Carina Negreanu | Benjamin Zorn | José Cambronero | Andrew Gordon | Vu Le | Elnaz Nouri | Nadia Polikarpova | Advait Sarkar | Brian Slininger | Neil Toronto | Jack Williams
Findings of the Association for Computational Linguistics: NAACL 2024
Large language models are rapidly replacing help forums like StackOverflow, and are especially helpful to non-professional programmers and end users. These users are often interested in data-centric tasks, like spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including data. But how do we decide how much data and which data to include in the prompt? This paper makes two contributions towards answering this question. First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a novel cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and that for tasks with a lot of syntactic variation in the input table, our cluster-then-select technique outperforms a random selection baseline.
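The cluster-then-select idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how rows are clustered or how representatives are chosen, so the shape abstraction in `cell_shape`, the grouping by identical shape, and the "largest clusters first" ordering are all assumptions made for illustration, and both function names are hypothetical.

```python
import re
from collections import defaultdict


def cell_shape(value):
    """Map a cell to a coarse syntactic shape (assumed abstraction).

    Runs of uppercase letters become 'A', runs of lowercase letters 'a',
    runs of digits '9'; all other characters are kept as-is.
    """
    s = str(value)
    s = re.sub(r"[A-Z]+", "A", s)
    s = re.sub(r"[a-z]+", "a", s)
    s = re.sub(r"[0-9]+", "9", s)
    return s


def cluster_then_select(rows, k):
    """Group rows by the shapes of their cells, then pick one row per group.

    Assumption: rows with identical shape tuples form one cluster, and the
    first row seen in each cluster stands in as its representative.
    """
    clusters = defaultdict(list)
    for row in rows:
        clusters[tuple(cell_shape(c) for c in row)].append(row)
    # Largest clusters first; the sort is stable, so ties keep first-seen order.
    ordered = sorted(clusters.values(), key=len, reverse=True)
    return [members[0] for members in ordered[:k]]


rows = [
    ["2021-01-05", "Alice"],
    ["2021-02-10", "Bob"],
    ["5 Jan 2021", "Carol"],
    ["2021-03-11", "Dave"],
    ["n/a", "Eve"],
]
# With a budget of two rows, one ISO-style date and one differently
# formatted date are selected instead of two rows of the same shape.
selected = cluster_then_select(rows, 2)
```

A random baseline drawing two rows from this table would often pick two ISO-style dates and hide the format variation from the model, which is the situation where the abstract reports a benefit from cluster-then-select.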
2019
Constraint-based Learning of Phonological Processes
Shraddha Barke | Rose Kunkel | Nadia Polikarpova | Eric Meinhardt | Eric Bakovic | Leon Bergen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Phonological processes are context-dependent sound changes in natural languages. We present an unsupervised approach to learning human-readable descriptions of phonological processes from collections of related utterances. Our approach builds upon a technique from the programming languages community called *constraint-based program synthesis*. We contribute a novel encoding of the learning problem into Boolean Satisfiability constraints, which enables both data efficiency and fast inference. We evaluate our system on textbook phonology problems and datasets from the literature, and show that it achieves high accuracy at interactive speeds.