Matthias Grabmair

Extractive Summarization of Legal Decisions using Multi-task Learning and Maximal Marginal Relevance
Abhishek Agarwal | Shanshan Xu | Matthias Grabmair
Findings of the Association for Computational Linguistics: EMNLP 2022

Summarizing legal decisions requires the expertise of law practitioners, which is both time- and cost-intensive. This paper presents techniques for extractive summarization of legal decisions in a low-resource setting using limited expert-annotated data. We test a set of models that locate relevant content using a sequential model and tackle redundancy by leveraging maximal marginal relevance to compose summaries. We also demonstrate an implicit approach that helps train our proposed models to generate more informative summaries. Our multi-task learning model variant leverages rhetorical role identification as an auxiliary task to further improve the summarizer. We perform extensive experiments on datasets containing legal decisions from the US Board of Veterans’ Appeals and conduct quantitative and expert-ranked evaluations of our models. Our results show that the proposed approaches can achieve ROUGE scores vis-à-vis expert-extracted summaries that match those achieved by inter-annotator comparison.
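
Maximal marginal relevance (MMR) builds a summary greedily: at each step it adds the sentence that best balances its own relevance against its similarity to sentences already chosen. The sketch below is a minimal illustration of that selection rule, not the authors' implementation. It assumes per-sentence relevance scores are already available (in the paper these would come from the sequential model), it uses a bag-of-words cosine similarity as a stand-in for whatever sentence representation the summarizer actually uses, and the names `mmr_select`, `cosine`, and the trade-off weight `lam` are hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (an assumed stand-in for learned sentence embeddings)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences: list[str], relevance: list[float], k: int, lam: float = 0.7) -> list[int]:
    """Greedily pick up to k sentence indices by maximal marginal relevance.

    Each step chooses argmax_i  lam * relevance[i] - (1 - lam) * max_{j in selected} sim(i, j).
    `relevance` is assumed to be precomputed by some upstream sentence scorer (hypothetical here).
    """
    selected: list[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy is the highest similarity to any already-selected sentence.
            redundancy = max((cosine(sentences[i], sentences[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore document order for the extractive summary


# Hypothetical toy input: two near-duplicate sentences and one distinct one.
sentences = [
    "The Board denied service connection for hearing loss.",
    "Service connection for hearing loss is denied by the Board.",
    "The veteran served on active duty from 1968 to 1970.",
]
relevance = [0.9, 0.85, 0.4]
print(mmr_select(sentences, relevance, k=2, lam=0.5))  # → [0, 2]
```

With `lam=0.5` the near-duplicate second sentence is skipped in favour of the less relevant but non-redundant third one; with `lam=1.0` the redundancy term vanishes and the selection reduces to plain top-k by relevance, returning `[0, 1]`.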

Deconfounding Legal Judgment Prediction for European Court of Human Rights Cases Towards Better Alignment with Experts
T.y.s.s Santosh | Shanshan Xu | Oana Ichim | Matthias Grabmair
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

This work demonstrates that Legal Judgment Prediction systems without expert-informed adjustments can be vulnerable to shallow, distracting surface signals that arise from corpus construction, case distribution, and confounding factors. To mitigate this, we use domain expertise to strategically identify statistically predictive but legally irrelevant information. We adopt adversarial training to prevent the system from relying on it. We evaluate our deconfounded models by employing interpretability techniques and comparing them to expert annotations. Quantitative experiments and qualitative analysis show that our deconfounded model consistently aligns better with expert rationales than baselines trained for prediction only. We further contribute a set of reference expert annotations to the validation and testing partitions of an existing benchmark dataset of European Court of Human Rights cases.

Attack on Unfair ToS Clause Detection: A Case Study using Universal Adversarial Triggers
Shanshan Xu | Irina Broda | Rashid Haddad | Marco Negrini | Matthias Grabmair
Proceedings of the Natural Legal Language Processing Workshop 2022

Recent work has demonstrated that natural language processing techniques can support consumer protection by automatically detecting unfair clauses in Terms of Service (ToS) agreements. This work demonstrates that transformer-based ToS analysis systems are vulnerable to adversarial attacks. We conduct experiments attacking an unfair-clause detector with universal adversarial triggers. Experiments show that a minor perturbation of the text can considerably reduce detection performance. Moreover, to measure the detectability of the triggers, we conduct a detailed human evaluation study, collecting both answer accuracy and response time from the participants. The results show that the naturalness of the triggers remains key to tricking readers.

Towards Inference-Oriented Reading Comprehension: ParallelQA
Soumya Wadhwa | Varsha Embar | Matthias Grabmair | Eric Nyberg
Proceedings of the Workshop on Generalization in the Age of Deep Learning

In this paper, we investigate the tendency of end-to-end neural Machine Reading Comprehension (MRC) models to match shallow patterns rather than perform inference-oriented reasoning on RC benchmarks. We aim to test the ability of these systems to answer questions which focus on referential inference. We propose ParallelQA, a strategy to formulate such questions using parallel passages. We also demonstrate that existing neural models fail to generalize well to this setting.

How Would You Say It? Eliciting Lexically Diverse Dialogue for Supervised Semantic Parsing
Abhilasha Ravichander | Thomas Manzini | Matthias Grabmair | Graham Neubig | Jonathan Francis | Eric Nyberg
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Building dialogue interfaces for real-world scenarios often entails training semantic parsers starting from zero examples. How can we build datasets that better capture the variety of ways users might phrase their queries, and what queries are actually realistic? Wang et al. (2015) proposed a method to build semantic parsing datasets by generating canonical utterances using a grammar and having crowdworkers paraphrase them into natural wording. A limitation of this approach is that it induces a bias towards language similar to the canonical utterances. In this work, we present a methodology that elicits meaningful and lexically diverse queries from users for semantic parsing tasks. Starting from a seed lexicon and a generative grammar, we pair logical forms with mixed text-image representations and ask crowdworkers to paraphrase and confirm the plausibility of the queries that they generated. We use this method to build a semantic parsing dataset from scratch for a dialogue agent in a smart-home simulation. We find evidence that this dataset, which we have named SmartHome, is more lexically diverse and more difficult to parse than existing domain-specific semantic parsing datasets.