Mariana S. C. Almeida
2024
Automated test generation to evaluate tool-augmented LLMs as conversational AI agents
Samuel Arcadinho | David Oliveira Aparicio | Mariana S. C. Almeida
Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP
Tool-augmented LLMs are a promising approach to create AI agents that can have realistic conversations, follow procedures, and call appropriate functions. However, evaluating them is challenging due to the diversity of possible conversations, and existing datasets focus only on single interactions and function-calling. We present a test generation pipeline to evaluate LLMs as conversational AI agents. Our framework uses LLMs to generate diverse tests grounded on user-defined procedures. For that, we use intermediate graphs to limit the LLM test generator’s tendency to hallucinate content that is not grounded on input procedures, and to enforce high coverage of the possible conversations. Additionally, we put forward ALMITA, a manually curated dataset for evaluating AI agents in customer support, and use it to evaluate existing LLMs. Our results show that while tool-augmented LLMs perform well in single interactions, they often struggle to handle complete conversations. While our focus is on customer support, our test generation pipeline is general enough to evaluate different AI agents.
2021
Multilingual Email Zoning
Bruno Jardim | Ricardo Rei | Mariana S. C. Almeida
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
The segmentation of emails into functional zones (also dubbed email zoning) is a relevant preprocessing step for most NLP tasks that deal with emails. However, despite the multilingual character of emails and their applications, previous literature regarding email zoning corpora and systems was developed essentially for English. In this paper, we analyse the existing email zoning corpora and propose a new multilingual benchmark composed of 625 emails in Portuguese, Spanish and French. Moreover, we introduce OKAPI, the first multilingual email segmentation model based on a language agnostic sentence encoder. Besides generalizing well to unseen languages, our model is competitive with current English benchmarks and reaches new state-of-the-art performance on domain adaptation tasks in English.
2017
The SUMMA Platform Prototype
Renars Liepins | Ulrich Germann | Guntis Barzdins | Alexandra Birch | Steve Renals | Susanne Weber | Peggy van der Kreeft | Hervé Bourlard | João Prieto | Ondřej Klejch | Peter Bell | Alexandros Lazaridis | Alfonso Mendes | Sebastian Riedel | Mariana S. C. Almeida | Pedro Balage | Shay B. Cohen | Tomasz Dwojak | Philip N. Garner | Andreas Giefer | Marcin Junczys-Dowmunt | Hina Imran | David Nogueira | Ahmed Ali | Sebastião Miranda | Andrei Popescu-Belis | Lesly Miculicich Werlen | Nikos Papasarantopoulos | Abiola Obamuyide | Clive Jones | Fahim Dalvi | Andreas Vlachos | Yang Wang | Sibo Tong | Rico Sennrich | Nikolaos Pappas | Shashi Narayan | Marco Damonte | Nadir Durrani | Sameer Khurana | Ahmed Abdelali | Hassan Sajjad | Stephan Vogel | David Sheppey | Chris Hernon | Jeff Mitchell
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.
2016
Jointly Learning to Embed and Predict with Multiple Languages
Daniel C. Ferreira | André F. T. Martins | Mariana S. C. Almeida
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2015
Lisbon: Evaluating TurboSemanticParser on Multiple Languages and Out-of-Domain Data
Mariana S. C. Almeida | André F. T. Martins
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies
Mariana S. C. Almeida | Cláudia Pinto | Helena Figueira | Pedro Mendes | André F. T. Martins
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2014
Priberam Compressive Summarization Corpus: A New Multi-Document Summarization Corpus for European Portuguese
Miguel B. Almeida | Mariana S. C. Almeida | André F. T. Martins | Helena Figueira | Pedro Mendes | Cláudia Pinto
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper, we introduce the Priberam Compressive Summarization Corpus, a new multi-document summarization corpus for European Portuguese. The corpus follows the format of the summarization corpora for English in recent DUC and TAC conferences. It contains 80 manually chosen topics referring to events that occurred between 2010 and 2013. Each topic contains 10 news stories from major Portuguese newspapers, radio and TV stations, along with two human-generated summaries of up to 100 words. Apart from the language, one important difference from the DUC/TAC setup is that the human summaries in our corpus are compressive: the annotators performed only sentence and word deletion operations, as opposed to generating summaries from scratch. We use this corpus to train and evaluate learning-based extractive and compressive summarization systems, providing an empirical comparison between these two approaches. The corpus is made freely available in order to facilitate research on automatic summarization.
Co-authors
- André F. T. Martins 6
- Miguel B. Almeida 2
- Helena Figueira 2
- Pedro Mendes 2
- Cláudia Pinto 2
- Ahmed Abdelali 1
- Ahmed Ali 1
- David Oliveira Aparicio 1
- Samuel Arcadinho 1
- Pedro Balage Filho 1
- Guntis Barzdins 1
- Peter Bell 1
- Alexandra Birch 1
- Hervé Bourlard 1
- Shay B. Cohen 1
- Fahim Dalvi 1
- Marco Damonte 1
- Nadir Durrani 1
- Tomasz Dwojak 1
- Daniel C. Ferreira 1
- Philip N. Garner 1
- Ulrich Germann 1
- Andreas Giefer 1
- Chris Hernon 1
- Hina Imran 1
- Bruno Jardim 1
- Clive Jones 1
- Marcin Junczys-Dowmunt 1
- Sameer Khurana 1
- Ondřej Klejch 1
- Alexandros Lazaridis 1
- Renārs Liepins 1
- Alfonso Mendes 1
- Lesly Miculicich Werlen 1
- Sebastião Miranda 1
- Jeff Mitchell 1
- Shashi Narayan 1
- David Nogueira 1
- Abiola Obamuyide 1
- Nikos Papasarantopoulos 1
- Nikolaos Pappas 1
- Andrei Popescu-Belis 1
- João Prieto 1
- Ricardo Rei 1
- Steve Renals 1
- Sebastian Riedel 1
- Hassan Sajjad 1
- Rico Sennrich 1
- David Sheppey 1
- Sibo Tong 1
- Andreas Vlachos 1
- Stephan Vogel 1
- Yang Wang 1
- Susanne Weber 1
- Peggy van der Kreeft 1