Berfin Aktaş

Also published as: Berfin Aktas

2025

Clarifying Underspecified Discourse Relations in Instructional Texts
Berfin Aktas | Michael Roth
Findings of the Association for Computational Linguistics: ACL 2025

Discourse relations contribute to the structure of a text and can optionally be realized through explicit connectives such as “but” and “while”. But when are these connectives necessary to avoid possible misunderstandings? We investigate this question by first building a corpus of 4,274 text revisions in each of which a connective was explicitly inserted. For a subset of 250 cases, we collect plausibility annotations on other connectives to check whether they would represent suitable alternative relations. The results of this annotation show that several relations are often perceived as plausible in our data. Furthermore, we analyze the extent to which large language models can identify instances with multiple plausible relations as a possible source of misunderstandings. We find that the models predict plausibility of individual connectives with up to 66% accuracy, but they are not reliable in estimating when multiple relations are plausible.

pdf bib abs

Information-Theoretic and Prompt-Based Evaluation of Discourse Connective Edits in Instructional Text Revisions
Berfin Aktas | Michael Roth
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

We present a dataset of text revisions involving the deletion or replacement of discourse connectives. Manual annotation of a replacement subset reveals that only 19% of edits were judged either necessary or should be left unchanged, with the rest appearing optional. Surprisal metrics from GPT-2 token probabilities and prompt-based predictions from GPT-4.1 correlate with these judgments, particularly in such clear cases.

2024

pdf bib abs

Shallow Discourse Parsing on Twitter Conversations
Berfin Aktas | Burak Özmen
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024

We present our PDTB-style annotations on conversational Twitter data, which was initially annotated by Scheffler et al. (2019). We introduced 1,043 new annotations to the dataset, nearly doubling the number of previously annotated discourse relations. Subsequently, we applied a neural Shallow Discourse Parsing (SDP) model to the resulting corpus, improving its performance through retraining with in-domain data. The most substantial improvement was observed in the sense identification task (+19%). Our experiments with diverse training data combinations underline the potential benefits of exploring various data combinations in domain adaptation efforts for SDP. To the best of our knowledge, this is the first application of Shallow Discourse Parsing on Twitter data

2020

pdf bib abs

TwiConv: A Coreference-annotated Corpus of Twitter Conversations
Berfin Aktaş | Annalena Kohnert
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

This article introduces TwiConv, an English coreference-annotated corpus of microblog conversations from Twitter. We describe the corpus compilation process and the annotation scheme, and release the corpus publicly, along with this paper. We manually annotated nominal coreference in 1756 tweets arranged in 185 conversation threads. The annotation achieves satisfactory annotation agreement results. We also present a new method for mapping the tweet contents with distributed stand-off annotations, which can easily be adapted to different annotation tasks.

pdf bib abs

Variation in Coreference Strategies across Genres and Production Media
Berfin Aktaş | Manfred Stede
Proceedings of the 28th International Conference on Computational Linguistics

In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution on both spoken language and social media, we undertake a corpus study involving the various genre sections of Ontonotes, the Switchboard corpus, and a corpus of Twitter conversations. Using a set of measures that previously have been applied individually to different data sets, we find fairly clear patterns of “behavior” for the different genres/media. Besides their role for psycholinguistic investigation (why do we employ different coreference strategies when we write or speak) and for the placement of Twitter in the spoken–written continuum, we see our results as a contribution to approaching genre-/media-specific coreference resolution.

pdf bib abs

Adapting Coreference Resolution to Twitter Conversations
Berfin Aktaş | Veronika Solopova | Annalena Kohnert | Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2020

The performance of standard coreference resolution is known to drop significantly on Twitter texts. We improve the performance of the (Lee et al., 2018) system, which is originally trained on OntoNotes, by retraining on manually-annotated Twitter conversation data. Further experiments by combining different portions of OntoNotes with Twitter data show that selecting text genres for the training data can beat the mere maximization of training data amount. In addition, we inspect several phenomena such as the role of deictic pronouns in conversational data, and present additional results for variant settings. Our best configuration improves the performance of the”out of the box” system by 21.6%.

2019

pdf bib abs

Annotating Shallow Discourse Relations in Twitter Conversations
Tatjana Scheffler | Berfin Aktaş | Debopam Das | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

We introduce our pilot study applying PDTB-style annotation to Twitter conversations. Lexically grounded coherence annotation for Twitter threads will enable detailed investigations of the discourse structure of conversations on social media. Here, we present our corpus of 185 threads and annotation, including an inter-annotator agreement study. We discuss our observations as to how Twitter discourses differ from written news text wrt. discourse connectives and relations. We confirm our hypothesis that discourse relations in written social media conversations are expressed differently than in (news) text. We find that in Twitter, connective arguments frequently are not full syntactic clauses, and that a few general connectives expressing EXPANSION and CONTINGENCY make up the majority of the explicit relations in our data.

2018

pdf bib abs

Anaphora Resolution for Twitter Conversations: An Exploratory Study
Berfin Aktaş | Tatjana Scheffler | Manfred Stede
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

We present a corpus study of pronominal anaphora on Twitter conversations. After outlining the specific features of this genre, with respect to reference resolution, we explain the construction of our corpus and the annotation steps. From this we derive a list of phenomena that need to be considered when performing anaphora resolution on this type of data. Finally, we test the performance of an off-the-shelf resolution system, and provide some qualitative error analysis.

Berfin Aktaş

2025

2024

2020

2019

2018

2010

2009

Co-authors

Venues