Advaith Siddharthan


2021

Summarising Historical Text in Modern Languages
Xutan Peng | Yi Zheng | Chenghua Lin | Advaith Siddharthan
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine for historians and digital humanities researchers, but it has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model that can be trained even with no cross-lingual (historical-to-modern) parallel data, and further benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historical-to-modern language summarisation task from standard cross-lingual summarisation (i.e., modern to modern language), highlight the distinctness and value of our dataset, and demonstrate that our transfer learning approach outperforms standard cross-lingual benchmarks on this task.

2018

Generating Summaries of Sets of Consumer Products: Learning from Experiments
Kittipitch Kuptavanich | Ehud Reiter | Kees Van Deemter | Advaith Siddharthan
Proceedings of the 11th International Conference on Natural Language Generation

We explored the task of creating a textual summary describing a large set of objects characterised by a small number of features, using an e-commerce dataset. When a set of consumer products is large and varied, it can be difficult for a consumer to understand how the products in the set differ, and consequently to choose the most suitable product from the set. To assist consumers, we generated high-level summaries of product sets. Two generation algorithms are presented, discussed, and evaluated with human users. Our evaluation results suggest that the summaries contribute positively to consumers’ understanding of the domain.

2016

Scrutable Feature Sets for Stance Classification
Angrosh Mandya | Advaith Siddharthan | Adam Wyner
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Summarising the points made in online political debates
Charlie Egan | Advaith Siddharthan | Adam Wyner
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Summarising News Stories for Children
Iain Macdonald | Advaith Siddharthan
Proceedings of the 9th International Natural Language Generation conference

2015

Creating Textual Driver Feedback from Telemetric Data
Daniel Braun | Ehud Reiter | Advaith Siddharthan
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

2014

Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)
Sandra Williams | Advaith Siddharthan | Ani Nenkova
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

Text simplification using synchronous dependency grammars: Generalising automatically harvested rules
Angrosh Mandya | Advaith Siddharthan
Proceedings of the 8th International Natural Language Generation Conference (INLG)

Lexico-syntactic text simplification and compression with typed dependencies
Angrosh Mandya | Tadashi Nomoto | Advaith Siddharthan
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Hybrid text simplification using synchronous dependency grammars with hand-written and automatically harvested rules
Advaith Siddharthan | Angrosh Mandya
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Tag2Blog: Narrative Generation from Satellite Tag Data
Kapila Ponnamperuma | Advaith Siddharthan | Cheng Zeng | Chris Mellish | René van der Wal
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations
Sandra Williams | Advaith Siddharthan | Ani Nenkova
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations

2012

Blogging birds: Generating narratives about reintroduced species to promote public engagement
Advaith Siddharthan | Matthew Green | Kees van Deemter | Chris Mellish | René van der Wal
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations
Sandra Williams | Advaith Siddharthan | Ani Nenkova
Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations

Offline Sentence Processing Measures for testing Readability with Users
Advaith Siddharthan | Napoleon Katsos
Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations

Natural Language Generation for Nature Conservation: Automating Feedback to Help Volunteers Identify Bumblebee Species
Steven Blake | Advaith Siddharthan | Hien Nguyen | Nirwan Sharma | Anne-Marie Robinson | Elaine O’Mahony | Ben Darvill | Chris Mellish | René van der Wal
Proceedings of COLING 2012

2011

Text Simplification using Typed Dependencies: A Comparision of the Robustness of Different Generation Strategies
Advaith Siddharthan
Proceedings of the 13th European Workshop on Natural Language Generation

Investigation into Human Preference between Common and Unambiguous Lexical Substitutions
Andrew Walker | Advaith Siddharthan | Andrew Starkey
Proceedings of the 13th European Workshop on Natural Language Generation

Information Status Distinctions and Referring Expressions: An Empirical Study of References to People in News Summaries
Advaith Siddharthan | Ani Nenkova | Kathleen McKeown
Computational Linguistics, Volume 37, Issue 4 - December 2011

2010

Reformulating Discourse Connectives for Non-Expert Readers
Advaith Siddharthan | Napoleon Katsos
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Camtology: Intelligent Information Access for Science
Ted Briscoe | Karl Harrison | Andrew Naish-Guzman | Andy Parker | Advaith Siddharthan | David Sinclair | Mark Slater | Rebecca Watson
Proceedings of the NAACL HLT 2010 Demonstration Session

Complex Lexico-syntactic Reformulation of Sentences Using Typed Dependency Representations
Advaith Siddharthan
Proceedings of the 6th International Natural Language Generation Conference

Corpora for the Conceptualisation and Zoning of Scientific Papers
Maria Liakata | Simone Teufel | Advaith Siddharthan | Colin Batchelor
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present two complementary annotation schemes for the sentence-based annotation of full scientific papers, CoreSC and AZ-II, applied to primary research articles in chemistry. AZ-II is an extension of AZ for chemistry papers. AZ has been shown to be reliably annotated by independent human coders and to be useful for various information access tasks. Like AZ, AZ-II follows the rhetorical structure of a scientific paper and the knowledge claims made by the authors. The CoreSC scheme takes a different view of scientific papers, treating them as human-readable representations of scientific investigations. It seeks to retrieve the structure of the investigation from the paper as generic high-level Core Scientific Concepts (CoreSC). CoreSCs have been annotated by 16 chemistry experts over a total of 265 full papers in physical chemistry and biochemistry. We describe the differences and similarities between the two schemes in detail and present the two corpora produced using each scheme. The two corpora share 36 papers, which allows us to quantitatively compare aspects of the annotation schemes. We show the correlation between the two schemes and their strengths and weaknesses, and discuss the benefits of combining a rhetorically based analysis of the papers with a content-based one.

2009

Towards Domain-Independent Argumentative Zoning: Evidence from Chemistry and Computational Linguistics
Simone Teufel | Advaith Siddharthan | Colin Batchelor
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Generating Research Websites Using Summarisation Techniques
Advaith Siddharthan | Ann Copestake
Proceedings of the ACL-08: HLT Demo Session

Language Resources and Chemical Informatics
C.J. Rupp | Ann Copestake | Peter Corbett | Peter Murray-Rust | Advaith Siddharthan | Simone Teufel | Benjamin Waldron
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Chemistry research papers are, as in any scientific field, a primary source of information about chemistry. The data they present is predominantly unstructured, and so not immediately amenable to the information-processing techniques developed within chemical informatics for carrying out chemistry research. At one level, extracting the relevant information from research papers is a text mining task, requiring both extensive language resources and specialised knowledge of the subject domain. However, the papers also encode information about the way the research is conducted and the structure of the field itself. Applying language technology to research papers in chemistry can facilitate eScience on several different levels. The SciBorg project sets out to provide an extensive, analysed corpus of published chemistry research. This relies on the cooperation of several journal publishers to provide papers in an appropriate form. The work is carried out as a collaboration involving the Computer Laboratory, Chemistry Department and eScience Centre at Cambridge University, and is funded under the UK eScience programme.

2007

Whose Idea Was This, and Why Does it Matter? Attributing Scientific Work to Citations
Advaith Siddharthan | Simone Teufel
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Evaluating an open-domain GRE algorithm on closed domains system IDs: CAM-B, CAM-T, CAM-BU and CAM-TU
Advaith Siddharthan | Ann Copestake
Proceedings of the Workshop on Using corpora for natural language generation

2006

Parallel Syntactic Annotation of Multiple Languages
Owen Rambow | Bonnie Dorr | David Farwell | Rebecca Green | Nizar Habash | Stephen Helmreich | Eduard Hovy | Lori Levin | Keith J. Miller | Teruko Mitamura | Florence Reeder | Advaith Siddharthan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes an effort to investigate the incrementally deepening development of an interlingua notation, validated by human annotation of texts in English plus six languages. We begin with deep syntactic annotation, and in this paper present a series of annotation manuals for six different languages at the deep-syntactic level of representation. Many syntactic differences between languages are removed in the proposed syntactic annotation, making them useful resources for multilingual NLP projects with semantic components.

An annotation scheme for citation function
Simone Teufel | Advaith Siddharthan | Dan Tidhar
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

Automatic classification of citation function
Simone Teufel | Advaith Siddharthan | Dan Tidhar
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Improving Multilingual Summarization: Using Redundancy in the Input to Correct MT errors
Advaith Siddharthan | Kathleen McKeown
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Automatically Learning Cognitive Status for Multi-Document Summarization of Newswire
Ani Nenkova | Advaith Siddharthan | Kathleen McKeown
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Generating Referring Expressions in Open Domains
Advaith Siddharthan | Ann Copestake
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Interlingual Annotation of Multilingual Text Corpora
Stephen Helmreich | David Farwell | Bonnie Dorr | Nizar Habash | Lori Levin | Teruko Mitamura | Florence Reeder | Keith Miller | Eduard Hovy | Owen Rambow | Advaith Siddharthan
Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004

Syntactic Simplification for Improving Content Selection in Multi-Document Summarization
Advaith Siddharthan | Ani Nenkova | Kathleen McKeown
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Interlingual annotation for MT development
Florence Reeder | Bonnie Dorr | David Farwell | Nizar Habash | Stephen Helmreich | Eduard Hovy | Lori Levin | Teruko Mitamura | Keith Miller | Owen Rambow | Advaith Siddharthan
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language which uses two ontologies and a systematic theta-role list. We present the annotation tools built and outline the annotation process. Following this, we describe our evaluation methodology and conclude with a summary of issues that have arisen.

2003

Preserving Discourse Structure when Simplifying Text
Advaith Siddharthan
Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003

Resolving Pronouns Robustly: Plumbing the Depths of Shallowness
Advaith Siddharthan
Proceedings of the 2003 EACL Workshop on The Computational Treatment of Anaphora