Maria Eskevich

2024

Data-Envelopes for Cultural Heritage: Going beyond Datasheets
Mrinalini Luthra | Maria Eskevich
Proceedings of the Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024

Cultural heritage data is a rich source of information about the history and culture development in the past. When used with due understanding of its intrinsic complexity it can both support research in social sciences and humanities, and become input for machine learning and artificial intelligence algorithms. In all cases ethical and contextual considerations can be encouraged when the relevant information is provided in a clear and well structured form to potential users before they begin to interact with the data. Proposed data-envelopes, basing on the existing documentation frameworks, address the particular needs and challenges of the cultural heritage field while combining machine-readability and user-friendliness. We develop and test data-envelopes usability on the data from the Huygens Institute for History and Culture of the Netherlands. This paper presents the following contributions: i) we highlight the complexity of CH data, featuring the unique ethical and contextual considerations they entail; ii) we evaluate and compare existing dataset documentation frameworks, examining their suitability for CH datasets; iii) we introduce the “data-envelope”–a machine readable adaptation of existing dataset documentation frameworks, to tackle the specificities of CH datasets. Its modular form is designed to serve not only the needs of machine learning (ML), but also and especially broader user groups varying from humanities scholars, governmental monitoring authorities to citizen scientists and the general public. Importantly, the data-envelope framework emphasises the legal and ethical dimensions of dataset documentation, facilitating compliance with evolving data protection regulations and enhancing the accountability of data stewardship in the cultural heritage sector. We discuss and invite the readers for further conversation on the topic of ethical considerations, and how the different audiences should be informed about the importance of datasets documentation management and their context.

pdf bib

Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
Darja Fiser | Maria Eskevich | David Bordon
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024

2022

pdf bib

Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference
Darja Fišer | Maria Eskevich | Jakob Lenardič | Franciska de Jong
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

2020

pdf bib abs

Assessing Human-Parity in Machine Translation on the Segment Level
Yvette Graham | Christian Federmann | Maria Eskevich | Barry Haddow
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent machine translation shared tasks have shown top-performing systems to tie or in some cases even outperform human translation. Such conclusions about system and human performance are, however, based on estimates aggregated from scores collected over large test sets of translations and unfortunately leave some remaining questions unanswered. For instance, simply because a system significantly outperforms the human translator on average may not necessarily mean that it has done so for every translation in the test set. Firstly, are there remaining source segments present in evaluation test sets that cause significant challenges for top-performing systems and can such challenging segments go unnoticed due to the opacity of current human evaluation procedures? To provide insight into these issues we carefully inspect the outputs of top-performing systems in the most recent WMT-19 news translation shared task for all language pairs in which a system either tied or outperformed human translation. Our analysis provides a new method of identifying the remaining segments for which either machine or human perform poorly. For example, in our close inspection of WMT-19 English to German and German to English we discover the segments that disjointly proved a challenge for human and machine. For English to Russian, there were no segments included in our sample of translations that caused a significant challenge for the human translator, while we again identify the set of segments that caused issues for the top-performing system.

CLARIN is a European Research Infrastructure providing access to digital language resources and tools from across Europe and beyond to researchers in the humanities and social sciences. This paper focuses on CLARIN as a platform for the sharing of language resources. It zooms in on the service offer for the aggregation of language repositories and the value proposition for a number of communities that benefit from the enhanced visibility of their data and services as a result of integration in CLARIN. The enhanced findability of language resources is serving the social sciences and humanities (SSH) community at large and supports research communities that aim to collaborate based on virtual collections for a specific domain. The paper also addresses the wider landscape of service platforms based on language technologies which has the potential of becoming a powerful set of interoperable facilities to a variety of communities of use.

pdf bib

Proceedings of the Second ParlaCLARIN Workshop
Darja Fišer | Maria Eskevich | Franciska de Jong
Proceedings of the Second ParlaCLARIN Workshop

pdf bib abs

Podcasts are a large and growing repository of spoken audio. As an audio format, podcasts are more varied in style and production type than broadcast news, contain more genres than typically studied in video data, and are more varied in style and format than previous corpora of conversations. When transcribed with automatic speech recognition they represent a noisy but fascinating collection of documents which can be studied through the lens of natural language processing, information retrieval, and linguistics. Paired with the audio files, they are also a resource for speech processing and the study of paralinguistic, sociolinguistic, and acoustic aspects of the domain. We introduce the Spotify Podcast Dataset, a new corpus of 100,000 podcasts. We demonstrate the complexity of the domain with a case study of two tasks: (1) passage search and (2) summarization. This is orders of magnitude larger than previous speech corpora used for search and summarization. Our results show that the size and variability of this corpus opens up new avenues for research.

pdf bib

Proceedings of the Workshop about Language Resources for the SSH Cloud
Daan Broeder | Maria Eskevich | Monica Monachini
Proceedings of the Workshop about Language Resources for the SSH Cloud

pdf bib abs

LR4SSHOC: The Future of Language Resources in the Context of the Social Sciences and Humanities Open Cloud
Daan Broeder | Maria Eskevich | Monica Monachini
Proceedings of the Workshop about Language Resources for the SSH Cloud

This paper outlines the future of language resources and identifies their potential contribution for creating and sustaining the social sciences and humanities (SSH) component of the European Open Science Cloud (EOSC).

pdf bib abs

Social Sciences and Humanities Pathway Towards the European Open Science Cloud
Francesca Di Donato | Monica Monachini | Maria Eskevich | Stefanie Pohle | Yoann Moranville | Suzanne Dumouchel
Proceedings of the Workshop about Language Resources for the SSH Cloud

The paper presents a journey, which starts from various social sciences and humanities (SSH) Research Infrastructures in Europe and arrives at the comprehensive “ecosystem of infrastructures”, namely the European Open Science Cloud (EOSC). We will highlight how the SSH Open Science infrastructures contribute to the goal of establishing the EOSC. First, through the example of OPERAS, the European Research Infrastructure for Open Scholarly Communication in the SSH, to see how its services are conceived to be part of the EOSC and to address the communities’ needs. The next two sections highlight collaboration practices between partners in Europe to build the SSH component of the EOSC and a SSH discovery platform, as a service of OPERAS and the EOSC. The last two sections will focus on an implementation network dedicated to SSH data fairification.

2016

pdf bib abs

Is all that Glitters in Machine Translation Quality Estimation really Gold?
Yvette Graham | Timothy Baldwin | Meghan Dowling | Maria Eskevich | Teresa Lynn | Lamia Tounsi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Human-targeted metrics provide a compromise between human evaluation of machine translation, where high inter-annotator agreement is difficult to achieve, and fully automatic metrics, such as BLEU or TER, that lack the validity of human assessment. Human-targeted translation edit rate (HTER) is by far the most widely employed human-targeted metric in machine translation, commonly employed, for example, as a gold standard in evaluation of quality estimation. Original experiments justifying the design of HTER, as opposed to other possible formulations, were limited to a small sample of translations and a single language pair, however, and this motivates our re-evaluation of a range of human-targeted metrics on a substantially larger scale. Results show significantly stronger correlation with human judgment for HBLEU over HTER for two of the nine language pairs we include and no significant difference between correlations achieved by HTER and HBLEU for the remaining language pairs. Finally, we evaluate a range of quality estimation systems employing HTER and direct assessment (DA) of translation adequacy as gold labels, resulting in a divergence in system rankings, and propose employment of DA for future quality estimation evaluations.

2012

pdf bib abs

Creating a Data Collection for Evaluating Rich Speech Retrieval
Maria Eskevich | Gareth J.F. Jones | Martha Larson | Roeland Ordelman
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe the development of a test collection for the investigation of speech retrieval beyond identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of the Internet video from Internet video sharing platform (blip.tv), and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach was used to identify segments in the video data which contain speech acts, to create a description of the video containing the act and to generate search queries designed to refind this speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, design of the crowdsouring task and the specification of queries and relevant items.

2009

pdf bib

Prominence detected by listeners for future speech synthesis application
Maria Eskevich
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

Maria Eskevich

2024

2022

2020

2016

2012

2009

Co-authors

Venues