Ann Clifton
2025
Transforming Podcast Preview Generation: From Expert Models to LLM-Based Systems
Winstead Zhu | Ann Clifton | Azin Ghazimatin | Edgar Tanaka | Ward Ronan
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Discovering and evaluating long-form talk content such as videos and podcasts poses a significant challenge for users, as it requires a considerable time investment. Previews offer a practical solution by providing concise snippets that showcase key moments of the content, enabling users to make more informed and confident choices. We propose an LLM-based approach for generating podcast episode previews and deploy the solution at scale, serving hundreds of thousands of podcast previews in a real-world application. Comprehensive offline evaluations and online A/B testing demonstrate that LLM-generated previews consistently outperform a strong baseline built on top of various ML expert models, showcasing a significant reduction in the need for meticulous feature engineering. The offline results indicate notable enhancements in understandability, contextual clarity, and interest level, and the online A/B test shows a 4.6% increase in user engagement with preview content, along with a 5x boost in processing efficiency, offering a more streamlined and performant solution compared to the strong baseline of feature-engineered expert models.
2020
100,000 Podcasts: A Spoken English Document Corpus
Ann Clifton | Sravana Reddy | Yongze Yu | Aasish Pappu | Rezvaneh Rezapour | Hamed Bonab | Maria Eskevich | Gareth Jones | Jussi Karlgren | Ben Carterette | Rosie Jones
Proceedings of the 28th International Conference on Computational Linguistics
Podcasts are a large and growing repository of spoken audio. As an audio format, podcasts are more varied in style and production type than broadcast news, contain more genres than typically studied in video data, and are more varied in style and format than previous corpora of conversations. When transcribed with automatic speech recognition they represent a noisy but fascinating collection of documents which can be studied through the lens of natural language processing, information retrieval, and linguistics. Paired with the audio files, they are also a resource for speech processing and the study of paralinguistic, sociolinguistic, and acoustic aspects of the domain. We introduce the Spotify Podcast Dataset, a new corpus of 100,000 podcasts. We demonstrate the complexity of the domain with a case study of two tasks: (1) passage search and (2) summarization. This is orders of magnitude larger than previous speech corpora used for search and summarization. Our results show that the size and variability of this corpus open up new avenues for research.
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track
Ann Clifton | Courtney Napoles
2018
The Sockeye Neural Machine Translation Toolkit at AMTA 2018
Felix Hieber | Tobias Domhan | Michael Denkowski | David Vilar | Artem Sokolov | Ann Clifton | Matt Post
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Leveraging Data Resources for Cross-Linguistic Information Retrieval Using Statistical Machine Translation
Steve Sloto | Ann Clifton | Greg Hanneman | Patrick Porter | Donna Gates | Almut Hildebrand | Anish Kumar
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)
2013
An Online Algorithm for Learning over Constrained Latent Representations using Multiple Views
Ann Clifton | Max Whitney | Anoop Sarkar
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
Kriya - The SFU System for Translation Task at WMT-12
Majid Razmara | Baskaran Sankaran | Ann Clifton | Anoop Sarkar
Proceedings of the Seventh Workshop on Statistical Machine Translation
2011
Co-authors
- Anoop Sarkar 3
- Hamed Bonab 1
- Ben Carterette 1
- Michael Denkowski 1
- Tobias Domhan 1
- Maria Eskevich 1
- Donna Gates 1
- Azin Ghazimatin 1
- Greg Hanneman 1
- Felix Hieber 1
- Almut Silja Hildebrand 1
- Gareth Jones 1
- Rosie Jones 1
- Jussi Karlgren 1
- Anish Kumar 1
- Courtney Napoles 1
- Aasish Pappu 1
- Patrick Porter 1
- Matt Post 1
- Majid Razmara 1
- Sravana Reddy 1
- Rezvaneh Rezapour 1
- Ward Ronan 1
- Baskaran Sankaran 1
- Steve Sloto 1
- Artem Sokolov 1
- Edgar Tanaka 1
- David Vilar 1
- Max Whitney 1
- Yongze Yu 1
- Winstead Zhu 1