Sibel Ozer


2020

In this work, we present two new bilingual discourse connective lexicons, namely, for Turkish-English and European Portuguese-English created automatically using the existing discourse relation-aligned TED-MDB corpus. In their current form, the Pt-En lexicon includes 95 entries, whereas the Tr-En lexicon contains 133 entries. The lexicons constitute the first step of a larger project of developing a multilingual discourse connective lexicon.

2019

This paper describes an automatic discourse relation alignment experiment as an empirical justification of the planned annotation projection approach to enlarge the 3600-word multilingual corpus of TED Multilingual Discourse Bank (TED-MDB). The experiment is carried out on a single language pair (English-Turkish) included in TED-MDB. The paper first describes the creation of a large corpus of English-Turkish bi-sentences, then it presents a sense-based experiment that automatically aligns the relations in the English sentences of TED-MDB with the Turkish sentences. The results are very close to the results obtained from an earlier semi-automatic post-annotation alignment experiment validated by human annotators and are encouraging for future annotation projection tasks.