Razvan Bunescu

Also published as: Razvan C. Bunescu


2023

Socratic Questioning of Novice Debuggers: A Benchmark Dataset and Preliminary Evaluations
Erfan Al-Hossami | Razvan Bunescu | Ryan Teehan | Laurel Powell | Khyati Mahajan | Mohsen Dorodchi
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

Socratic questioning is a teaching strategy where the student is guided towards solving a problem on their own, instead of being given the solution directly. In this paper, we introduce a dataset of Socratic conversations where an instructor helps a novice programmer fix buggy solutions to simple computational problems. The dataset is then used for benchmarking the Socratic debugging abilities of GPT-based language models. While GPT-4 is observed to perform much better than GPT-3.5, its precision and recall still fall short of human expert abilities, motivating further work in this area.

2022

Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs
Garett Cunningham | Razvan Bunescu | David Juedes
Proceedings of the 1st Workshop on Mathematical Natural Language Processing (MathNLP)

The ever-growing complexity of mathematical proofs makes their manual verification by mathematicians very cognitively demanding. Autoformalization seeks to address this by translating proofs written in natural language into a formal representation that is computer-verifiable via interactive theorem provers. In this paper, we introduce a semantic parsing approach, based on the Universal Transformer architecture, that translates elementary mathematical proofs into an equivalent formalization in the language of the Coq interactive theorem prover. The same architecture is also trained to translate simple imperative code decorated with Hoare triples into formally verifiable proofs of correctness in Coq. Experiments on a limited domain of artificial and human-written proofs show that the models generalize well to intermediate lengths not seen during training and variations in natural language.

Distribution-Based Measures of Surprise for Creative Language: Experiments with Humor and Metaphor
Razvan C. Bunescu | Oseremen O. Uduehi
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

Novelty or surprise is a fundamental attribute of creative output. As such, we postulate that a writer’s creative use of language leads to word choices and, more importantly, corresponding semantic structures that are unexpected for the reader. In this paper we investigate measures of surprise that rely solely on word distributions computed by language models and show empirically that creative language such as humor and metaphor is strongly correlated with surprise. Surprisingly at first, information content is observed to be at least as good a predictor of creative language as any of the surprise measures investigated. However, the best prediction performance is obtained when information and surprise measures are combined, showing that surprise measures capture an aspect of creative language that goes beyond information content.

2019

Context Dependent Semantic Parsing over Temporally Structured Data
Charles Chen | Razvan Bunescu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We describe a new semantic parsing setting that allows users to query the system using both natural language questions and actions within a graphical user interface. Multiple time series belonging to an entity of interest are stored in a database and the user interacts with the system to obtain a better understanding of the entity’s state and behavior, entailing sequences of actions and questions whose answers may depend on previous factual or navigational interactions. We design an LSTM-based encoder-decoder architecture that models context dependency through copying mechanisms and multiple levels of attention over inputs and previous outputs. When trained to predict tokens using supervised learning, the proposed architecture substantially outperforms standard sequence generation baselines. Training the architecture using policy gradient leads to further improvements in performance, reaching a sequence-level accuracy of 88.7% on artificial data and 74.8% on real data.

2017

An Exploration of Data Augmentation and RNN Architectures for Question Ranking in Community Question Answering
Charles Chen | Razvan Bunescu
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

The automation of tasks in community question answering (cQA) is dominated by machine learning approaches, whose performance is often limited by the number of training examples. Starting from a neural sequence learning approach with attention, we explore the impact of two data augmentation techniques on question ranking performance: a method that swaps reference questions with their paraphrases, and training on examples automatically selected from external datasets. Both methods are shown to lead to substantial gains in accuracy over a strong baseline. Further improvements are obtained by changing the model architecture to mirror the structure seen in the data.

2013

Sense Clustering Using Wikipedia
Bharath Dandala | Chris Hokamp | Rada Mihalcea | Razvan Bunescu
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Multilingual Word Sense Disambiguation Using Wikipedia
Bharath Dandala | Rada Mihalcea | Razvan Bunescu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Coarse to Fine Grained Sense Disambiguation in Wikipedia
Hui Shen | Razvan Bunescu | Rada Mihalcea
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

Sense and Reference Disambiguation in Wikipedia
Hui Shen | Razvan Bunescu | Rada Mihalcea
Proceedings of COLING 2012: Posters

Adaptive Clustering for Coreference Resolution with Deterministic Rules and Web-Based Language Models
Razvan Bunescu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Towards Building a Multilingual Semantic Network: Identifying Interlingual Links in Wikipedia
Bharath Dandala | Rada Mihalcea | Razvan Bunescu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments
Michael Mohler | Razvan Bunescu | Rada Mihalcea
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Learning the Relative Usefulness of Questions in Community QA
Razvan Bunescu | Yunfeng Huang
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

A Utility-Driven Approach to Question Ranking in Social QA
Razvan Bunescu | Yunfeng Huang
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2008

Learning with Probabilistic Features for Improved Pipeline Models
Razvan Bunescu
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Learning to Extract Relations from the Web using Minimal Supervision
Razvan Bunescu | Raymond Mooney
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Using Encyclopedic Knowledge for Named Entity Disambiguation
Razvan Bunescu | Marius Paşca
11th Conference of the European Chapter of the Association for Computational Linguistics

Integrating Co-occurrence Statistics with Information Extraction for Robust Retrieval of Protein Interactions from Medline
Razvan Bunescu | Raymond Mooney | Arun Ramani | Edward Marcotte
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

2005

Using Biomedical Literature Mining to Consolidate the Set of Known Human Protein-Protein Interactions
Arun Ramani | Razvan Bunescu | Raymond Mooney | Edward Marcotte
Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics

A Shortest Path Dependency Kernel for Relation Extraction
Razvan Bunescu | Raymond Mooney
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Collective Information Extraction with Relational Markov Networks
Razvan Bunescu | Raymond Mooney
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Associative Anaphora Resolution: A Web-Based Approach
Razvan Bunescu
Proceedings of the 2003 EACL Workshop on The Computational Treatment of Anaphora

2001

Text and Knowledge Mining for Coreference Resolution
Sanda M. Harabagiu | Razvan C. Bunescu | Steven J. Maiorano
Second Meeting of the North American Chapter of the Association for Computational Linguistics