Necva Bölücü

Also published as: Necva Bolucu


2023

Impact of Sample Selection on In-Context Learning for Entity Extraction from Scientific Writing
Necva Bölücü | Maciej Rybinski | Stephen Wan
Findings of the Association for Computational Linguistics: EMNLP 2023

Prompt-based usage of Large Language Models (LLMs) is an increasingly popular way to tackle many well-known natural language problems. This trend is due, in part, to the appeal of the In-Context Learning (ICL) prompt set-up, in which a few selected training examples are provided along with the inference request. ICL, a type of few-shot learning, is especially attractive for natural language processing (NLP) tasks defined for specialised domains, such as entity extraction from scientific documents, where annotation is very costly due to the expertise required of annotators. In this paper, we present a comprehensive analysis of in-context sample selection methods for entity extraction from scientific documents using GPT-3.5 and compare these results against a fully supervised transformer-based baseline. Our results indicate that the effectiveness of the in-context sample selection methods is heavily domain-dependent, but the improvements are more notable for problems with a larger number of entity types. More in-depth analysis shows that ICL is more effective for low-resource set-ups of scientific information extraction.
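As a minimal illustration of the ICL set-up described above (not the paper's implementation), one common sample selection strategy picks demonstrations by embedding similarity to the query sentence. The encoder model, prompt format, and helper names below are assumptions:

```python
# Illustrative sketch only: similarity-based in-context sample selection
# for few-shot entity extraction. The encoder and prompt layout are
# assumptions, not the paper's method.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical encoder choice

def select_icl_examples(query, train_texts, train_labels, k=4):
    """Pick the k training sentences most similar to the query."""
    query_emb = encoder.encode(query, convert_to_tensor=True)
    train_embs = encoder.encode(train_texts, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, train_embs)[0]  # cosine similarity per example
    top_k = scores.topk(k).indices.tolist()
    return [(train_texts[i], train_labels[i]) for i in top_k]

def build_prompt(query, examples):
    """Assemble a few-shot prompt from the selected demonstrations."""
    demos = "\n\n".join(
        f"Text: {text}\nEntities: {labels}" for text, labels in examples
    )
    return f"{demos}\n\nText: {query}\nEntities:"
```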

Which Sentence Representation is More Informative: An Analysis on Text Classification
Necva Bölücü | Burcu Can
Proceedings of the Seventh International Conference on Dependency Linguistics (Depling, GURT/SyntaxFest 2023)

Text classification is a popular and well-studied problem in Natural Language Processing. Most previous work on text classification has focused on deep neural networks such as LSTMs and CNNs, while studies that exploit syntactic and semantic information remain very limited in the literature. In this study, we propose a model using a Graph Attention Network (GAT) that incorporates semantic and syntactic information as input for the text classification task. The semantic representations of UCCA and AMR are used as semantic information, and the dependency tree is used as syntactic information. Extensive experimental results and in-depth analysis show that the semantic-aware UCCA-GAT model outperforms the AMR-GAT and DEP-GAT models, which are semantic- and syntax-aware respectively. We also provide a comprehensive analysis of the proposed model to understand the limitations of the representations for the problem.
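A minimal sketch of the graph-over-sentence idea described above, not the paper's model: words become nodes, dependency arcs become edges, and a GAT produces a pooled sentence representation for classification. The library choice (torch_geometric), dimensions, and pooling are assumptions:

```python
# Illustrative sketch only: a GAT over dependency edges for sentence
# classification. Embeddings, edge construction, and pooling are assumptions.
import torch
from torch_geometric.nn import GATConv, global_mean_pool

class DepGAT(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, num_classes, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hid_dim, heads=heads)
        self.gat2 = GATConv(hid_dim * heads, hid_dim, heads=1)
        self.out = torch.nn.Linear(hid_dim, num_classes)

    def forward(self, x, edge_index, batch):
        # x: word embeddings (num_words, in_dim)
        # edge_index: dependency arcs as a (2, num_edges) index tensor
        # batch: maps each word node to its sentence id for pooling
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        return self.out(global_mean_pool(h, batch))  # logits per sentence
```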

Investigating the Impact of Syntax-Enriched Transformers on Quantity Extraction in Scientific Texts
Necva Bölücü | Maciej Rybinski | Stephen Wan
Proceedings of the Second Workshop on Information Extraction from Scientific Publications

Findings of WASSA 2023 Shared Task: Multi-Label and Multi-Class Emotion Classification on Code-Mixed Text Messages
Iqra Ameer | Necva Bölücü | Hua Xu | Ali Al Bataineh
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

We present the results of WASSA 2023 Shared Task 2: Emotion Classification on code-mixed text messages (Roman Urdu + English), which included two tracks for emotion classification: multi-label and multi-class. The participants were provided with a dataset of code-mixed SMS messages in English and Roman Urdu labeled with 12 emotions for both tracks. A total of 5 teams (19 team members) participated in the shared task. We summarize the methods, resources, and tools used by the participating teams, and we have also made the data freely available for further improvements to the task.

2022

Analysing Syntactic and Semantic Features in Pre-trained Language Models in a Fully Unsupervised Setting
Necva Bölücü | Burcu Can
Proceedings of the 19th International Conference on Natural Language Processing (ICON)

Transformer-based pre-trained language models (PLMs) have been used across NLP tasks with great success. This raises the question of whether we can transfer this knowledge to syntactic or semantic parsing in a completely unsupervised setting. In this study, we leverage PLMs as a source of external knowledge to build fully unsupervised parsers for semantic, constituency, and dependency parsing. We analyse the results for English, German, French, and Turkish to understand the impact of the PLMs on different languages for syntactic and semantic parsing. We visualise the attention layers and heads of the PLMs to understand what information can be learned throughout the layers and attention heads for the different levels of parsing. The results obtained from dependency, constituency, and semantic parsing are similar to each other: the middle layers and those closer to the final layers carry more syntactic and semantic information.
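A minimal sketch of how per-layer, per-head attention matrices can be extracted from a PLM for this kind of analysis, assuming the Hugging Face Transformers API; the model choice is an assumption and this is not the paper's code:

```python
# Illustrative sketch only: inspecting attention across layers and heads
# of a pre-trained Transformer. The model name is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of length num_layers, each tensor shaped
# (batch, num_heads, seq_len, seq_len)
for layer_idx, att in enumerate(outputs.attentions):
    print(f"layer {layer_idx}: head-averaged attention shape",
          att.mean(dim=1).shape)
```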

Turkish Universal Conceptual Cognitive Annotation
Necva Bölücü | Burcu Can
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Universal Conceptual Cognitive Annotation (UCCA) (Abend and Rappoport, 2013a) is a cross-lingual semantic annotation framework that enables easy annotation without requiring any linguistic background. UCCA-annotated datasets have already been released in English, French, and German. In this paper, we introduce the first UCCA-annotated Turkish dataset, currently comprising 50 sentences obtained from the METU-Sabanci Turkish Treebank (Atalay et al., 2003; Oflazer et al., 2003). We followed a semi-automatic annotation approach in which an external semantic parser is used for an initial annotation of the dataset, which is partially accurate and requires refinement. We manually revised the annotations obtained from the semantic parser that were not in line with the UCCA rules we defined for Turkish. We used the same external semantic parser for evaluation purposes and conducted experiments with both zero-shot and few-shot learning. While the parser cannot predict remote edges in the zero-shot setting, using even a small subset of training data in the few-shot setting increased the overall F1 score, including on remote edges. This is the initial version of the annotated dataset, and we are currently extending it. We will release the current Turkish UCCA annotation guidelines along with the annotated dataset.

TurkishDelightNLP: A Neural Turkish NLP Toolkit
Huseyin Alecakir | Necva Bölücü | Burcu Can
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations

We introduce a neural Turkish NLP toolkit called TurkishDelightNLP that performs computational linguistic analyses from the morphological level to the semantic level, covering tasks such as stemming, morphological segmentation, morphological tagging, part-of-speech tagging, dependency parsing, and semantic parsing, as well as high-level NLP tasks such as named entity recognition. We publicly share the toolkit through a web interface that allows input text to be analysed in real time, along with the open-source implementation of its components, an API, and several annotated datasets, such as a word similarity test set for evaluating word embeddings and UCCA-based semantic annotations in Turkish. This is the first open-source Turkish NLP toolkit that covers a range of NLP tasks at all levels. We believe it will be useful for other researchers in Turkish NLP and will also be beneficial for other high-level NLP tasks in Turkish.

Automatic Classification of Evidence Based Medicine Using Transformers
Necva Bolucu | Pinar Uskaner Hepsag
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association