Mark Dras - ACL Anthology

Mark Dras

2025

VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare
Anudeex Shetty | Amin Beheshti | Mark Dras | Usman Naseem
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Alignment techniques have become central to ensuring that Large Language Models (LLMs) generate outputs consistent with human values. However, existing alignment paradigms often model an averaged or monolithic preference, failing to account for the diversity of perspectives across cultures, demographics, and communities. This limitation is particularly critical in health-related scenarios, where plurality is essential due to the influence of culture, religion, personal values, and conflicting opinions. Despite progress in pluralistic alignment, no prior work has focused on health, likely due to the lack of publicly available datasets. To address this gap, we introduce VITAL, a new benchmark dataset comprising 13.1K value-laden situations and 5.4K multiple-choice questions focused on health, designed to assess and benchmark pluralistic alignment methodologies. Through extensive evaluation of eight LLMs of varying sizes, we demonstrate that existing pluralistic alignment techniques fall short in effectively accommodating diverse healthcare beliefs, underscoring the need for tailored AI alignment in specific domains. This work highlights the limitations of current approaches and lays the groundwork for developing health-specific alignment solutions.

Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association
Jonathan K. Kummerfeld | Aditya Joshi | Mark Dras
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association

Some Odd Adversarial Perturbations and the Notion of Adversarial Closeness
Shakila Mahjabin Tonni | Pedro Faustini | Mark Dras
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association

Deep learning models for language are vulnerable to adversarial examples. However, the perturbations introduced can sometimes seem odd or very noticeable to humans, which can make them less effective, a notion captured in some recent investigations as a property of ‘(non-)suspicion’. In this paper, we focus on three main types of perturbations that may raise suspicion: changes to named entities, inconsistent morphological inflections, and the use of non-English words. We define a notion of adversarial closeness and collect human annotations to construct two new datasets. We then use these datasets to investigate whether these kinds of perturbations have a disproportionate effect on human judgements. Following that, we propose new constraints to include in a constraint-based optimisation approach to adversarial text generation. Our human evaluation shows that these do improve the process by preventing the generation of especially odd or marked texts.

SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
Juan Ren | Mark Dras | Usman Naseem
Proceedings of the 23rd Annual Workshop of the Australasian Language Technology Association

Large Vision-Language Models (LVLMs) unlock powerful multimodal reasoning but also expand the attack surface, particularly through adversarial inputs that conceal harmful goals in benign prompts. We propose SHIELD, a lightweight, model-agnostic preprocessing framework that couples fine-grained safety classification with category-specific guidance and explicit actions (Block, Reframe, and Forward). Unlike binary moderators, SHIELD composes tailored safety prompts that enforce nuanced refusals or safe redirections without retraining. Across five benchmarks and five representative LVLMs, SHIELD consistently lowers jailbreak and non-following rates while preserving utility. Our method is plug-and-play, incurs negligible overhead, and is easily extendable to new attack types—serving as a practical safety patch for both weakly and strongly aligned LVLMs.

Steering Towards Fairness: Mitigating Political Stance Bias in LLMs
Afrozah Nadeem | Mark Dras | Usman Naseem
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

Recent advancements in large language models (LLMs) have enabled their widespread use across diverse real-world applications. However, concerns remain about their tendency to encode and reproduce ideological biases along political and economic dimensions. In this paper, we employ a framework for probing and mitigating such biases in decoder-based LLMs through analysis of internal model representations. Grounded in the Political Compass Test (PCT), this method uses contrastive pairs to extract and compare hidden layer activations from models like Mistral and DeepSeek. We introduce a comprehensive activation extraction pipeline capable of layer-wise analysis across multiple ideological axes, revealing meaningful disparities linked to political framing. Our results show that decoder LLMs systematically encode representational bias across layers, which can be leveraged for effective steering vector-based mitigation. This work provides new insights into how political bias is encoded in LLMs and offers a principled approach to debiasing beyond surface-level output interventions.
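
One common way to turn contrastive activations into a steering vector is to take the difference of the mean activations on the two sides of each contrastive pair and then shift hidden states along that direction. The sketch below assumes this difference-of-means construction; it is not confirmed as the paper's exact pipeline. Toy vectors stand in for real layer activations, and the function names and the scale `alpha` are hypothetical.

```python
from statistics import fmean

Vector = list[float]

def mean_vector(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(column) for column in zip(*vectors)]

def steering_vector(acts_a: list[Vector], acts_b: list[Vector]) -> Vector:
    """Difference of mean activations between the two sides of contrastive pairs.

    Assumption: difference-of-means steering, one layer at a time.
    """
    mean_a, mean_b = mean_vector(acts_a), mean_vector(acts_b)
    return [a - b for a, b in zip(mean_a, mean_b)]

def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction; negative alpha pushes away from side A."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Hypothetical activations at one layer for prompts framed one way (A) and the opposite way (B).
side_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
side_b = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]
v = steering_vector(side_a, side_b)
print(v)                                 # [2.0, -2.0, 0.0]
print(steer([1.0, 1.0, 1.0], v, -0.5))   # [0.0, 2.0, 1.0]
```

Dimensions on which both framings agree (the third one here) receive no push, so the intervention is confined to the directions that separate the two framings.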

Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations
Owen Rambow | Leo Wanner | Marianna Apidianaki | Hend Al-Khalifa | Barbara Di Eugenio | Steven Schockaert | Brodie Mather | Mark Dras
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations

Too Helpful, Too Harmless, Too Honest or Just Right?
Gautam Siddharth Kashyap | Mark Dras | Usman Naseem
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) exhibit strong performance across a wide range of NLP tasks, yet aligning their outputs with the principles of Helpfulness, Harmlessness, and Honesty (HHH) remains a persistent challenge. Existing methods often optimize for individual alignment dimensions in isolation, leading to trade-offs and inconsistent behavior. While Mixture-of-Experts (MoE) architectures offer modularity, they suffer from poorly calibrated routing, limiting their effectiveness in alignment tasks. We propose TrinityX, a modular alignment framework that incorporates a Mixture of Calibrated Experts (MoCaE) within the Transformer architecture. TrinityX leverages separately trained experts for each HHH dimension, integrating their outputs through a calibrated, task-adaptive routing mechanism that combines expert signals into a unified, alignment-aware representation. Extensive experiments on three standard alignment benchmarks, Alpaca (Helpfulness), BeaverTails (Harmlessness), and TruthfulQA (Honesty), demonstrate that TrinityX outperforms strong baselines, achieving relative improvements of 32.5% in win rate, 33.9% in safety score, and 28.4% in truthfulness. In addition, TrinityX reduces memory usage and inference latency by over 40% compared to prior MoE-based approaches. Ablation studies highlight the importance of calibrated routing, and cross-model evaluations confirm TrinityX’s generalization across diverse LLM backbones. Our code is available at: https://github.com/gskgautam/TrinityX
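
As a generic illustration of routing over per-dimension experts, the sketch below combines expert output vectors with softmax routing weights. Calibration is modelled here by temperature scaling, one standard calibration method; this is an assumption and not a description of how MoCaE calibrates its router. The expert outputs and router logits are hypothetical toy values.

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; temperature > 1 flattens the routing distribution."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def combine_experts(expert_outputs: list[list[float]], router_logits: list[float],
                    temperature: float = 1.0) -> list[float]:
    """Weighted sum of expert output vectors using temperature-scaled routing weights.

    Assumption: temperature scaling stands in for the paper's calibrated routing.
    """
    weights = softmax(router_logits, temperature)
    dim = len(expert_outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, expert_outputs)) for i in range(dim)]

# Hypothetical outputs of helpfulness, harmlessness and honesty experts for one position.
experts = [[3.0, 0.0], [0.0, 3.0], [0.0, 0.0]]
print(combine_experts(experts, [0.0, 0.0, 0.0]))     # equal logits give equal weights
print(softmax([2.0, 0.0], temperature=10.0))         # a high temperature softens the preference
```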

2024

Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients
Weijun Li | Qiongkai Xu | Mark Dras
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Recent studies have shown that distributed machine learning is vulnerable to gradient inversion attacks, where private training data can be reconstructed by analyzing the gradients of the models shared in training. Previous attacks established that such reconstructions are possible using gradients from all parameters of the entire model. However, we hypothesize that most of the involved modules, or even their sub-modules, are at risk of training data leakage, and we validate such vulnerabilities in various intermediate layers of language models. Our extensive experiments reveal that gradients from a single Transformer layer, or even a single linear component with 0.54% of the parameters, are susceptible to training data leakage. Additionally, we show that applying differential privacy to gradients during training offers limited protection against this novel data-disclosure vulnerability.

Here’s a Free Lunch: Sanitizing Backdoored Models with Model Merge
Ansh Arora | Xuanli He | Maximilian Mozes | Srinibas Swain | Mark Dras | Qiongkai Xu
Findings of the Association for Computational Linguistics: ACL 2024

The democratization of pre-trained language models through open-source initiatives has rapidly advanced innovation and expanded access to cutting-edge technologies. However, this openness also brings significant security risks, including backdoor attacks, where hidden malicious behaviors are triggered by specific inputs, compromising natural language processing (NLP) system integrity and reliability. This paper suggests that merging a backdoored model with other homogeneous models can significantly remediate backdoor vulnerabilities even if such models are not entirely secure. In our experiments, we verify our hypothesis on various models (BERT-Base, RoBERTa-Large, Llama2-7B, and Mistral-7B) and datasets (SST-2, OLID, AG News, and QNLI). Compared to multiple advanced defensive approaches, our method offers an effective and efficient inference-stage defense against backdoor attacks on classification and instruction-tuned tasks without additional resources or specific knowledge. Our approach consistently outperforms recent advanced baselines, leading to an average of about 75% reduction in the attack success rate. Since model merging has been an established approach for improving model performance, the extra advantage it provides regarding defense can be seen as a cost-free bonus.
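
The simplest form of merging homogeneous models is uniform averaging of same-named parameters. The sketch below shows only that scheme and assumes it for illustration; the paper may use other merging methods. Each "model" is a hypothetical dictionary of flat parameter lists.

```python
Params = dict[str, list[float]]

def merge_average(models: list[Params]) -> Params:
    """Uniformly average same-named parameters across models with identical architectures.

    Assumption: plain parameter averaging stands in for whichever merging method is used.
    """
    if not models:
        raise ValueError("need at least one model")
    names = models[0].keys()
    for m in models[1:]:
        if m.keys() != names:
            raise ValueError("models must share parameter names (homogeneous architectures)")
    merged: Params = {}
    for name in names:
        tensors = [m[name] for m in models]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [sum(vals) / len(models) for vals in zip(*tensors)]
    return merged

# Hypothetical toy parameters: one possibly backdoored model and two other fine-tuned models.
backdoored = {"w": [4.0, 0.0], "b": [1.0]}
other_a = {"w": [1.0, 1.0], "b": [0.0]}
other_b = {"w": [1.0, 2.0], "b": [2.0]}
print(merge_average([backdoored, other_a, other_b]))  # {'w': [2.0, 1.0], 'b': [1.0]}
```

Averaging dilutes any weight pattern that only one model carries, which is the intuition behind using merging as an inference-stage defence; it requires no retraining, only models that share an architecture.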

2023

What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples
Shakila Mahjabin Tonni | Mark Dras
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)

2022

Few-shot fine-tuning SOTA summarization models for medical dialogues
David Fraile Navarro | Mark Dras | Shlomo Berkovsky
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop

Abstractive summarization of medical dialogues presents a challenge for standard training approaches, given the paucity of suitable datasets. We explore the performance of state-of-the-art models with zero-shot and few-shot learning strategies and measure the impact of pretraining with general domain and dialogue-specific text on the summarization performance.

Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations
Na Liu | Mark Dras | Wei Emma Zhang
Proceedings of the 7th Workshop on Representation Learning for NLP

Although deep neural networks have achieved state-of-the-art performance in various machine learning tasks, adversarial examples, constructed by adding small non-random perturbations to correctly classified inputs, successfully fool highly expressive deep classifiers into incorrect predictions. Approaches to adversarial attacks in natural language tasks have boomed in the last five years, using character-level, word-level, phrase-level, or sentence-level textual perturbations. While there is some work in NLP on defending against such attacks through proactive methods, like adversarial training, there are to our knowledge no effective general reactive approaches to defence via detection of textual adversarial examples, such as are found in the image processing literature. In this paper, we propose two new reactive methods for NLP to fill this gap, which, unlike the few baselines of limited applicability from NLP, are based entirely on distributional characteristics of learned representations: we adapt one from the image processing literature (Local Intrinsic Dimensionality (LID)), and propose a novel one (MultiDistance Representation Ensemble Method (MDRE)). Adapted LID and MDRE obtain state-of-the-art results on character-level, word-level, and phrase-level attacks on the IMDB dataset, as well as on the latter two with respect to the MultiNLI dataset. For future research, we publish our code.
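
Local Intrinsic Dimensionality is commonly estimated with a maximum-likelihood estimator over the distances from a point to its k nearest neighbours. The sketch below assumes that estimator. Plain tuples stand in for learned representations, and k is chosen arbitrarily. It does not reproduce the paper's adaptation, such as which layers are scored or how scores feed a detector.

```python
import math

Point = tuple[float, ...]

def lid_mle(x: Point, reference: list[Point], k: int) -> float:
    """Maximum-likelihood LID estimate of x from its k nearest neighbours in `reference`.

    LID(x) = -( (1/k) * sum_{i=1..k} log(r_i / r_k) )^(-1),
    where r_i is the distance from x to its i-th nearest neighbour.
    Assumes x itself is not in `reference` and all distances are nonzero.
    """
    if k < 2 or k > len(reference):
        raise ValueError("need 2 <= k <= len(reference)")
    dists = sorted(math.dist(x, r) for r in reference)[:k]
    r_k = dists[-1]
    mean_log = sum(math.log(r / r_k) for r in dists) / k
    if mean_log == 0.0:
        raise ValueError("all k neighbours are equidistant; LID is undefined")
    return -1.0 / mean_log

# Hypothetical 2-D "representations": neighbours at distances 1, 2, 4 (and 10, beyond k).
batch = [(1.0, 0.0), (0.0, 2.0), (4.0, 0.0), (10.0, 0.0)]
print(lid_mle((0.0, 0.0), batch, k=3))  # about 1.4427 (= 1 / ln 2)
```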

2021

Mention Flags (MF): Constraining Transformer-based Text Generators
Yufei Wang | Ian Wood | Stephen Wan | Mark Dras | Mark Johnson
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper focuses on Seq2Seq (S2S) constrained text generation, where the text generator is constrained to mention, in its generated outputs, specific words that are given as inputs to the encoder. Pre-trained S2S models or a Copy Mechanism are trained to copy the surface tokens from encoders to decoders, but they cannot guarantee constraint satisfaction. Constrained decoding algorithms always produce hypotheses satisfying all constraints; however, they are computationally expensive and can lower the quality of the generated text. In this paper, we propose Mention Flags (MF), which trace whether lexical constraints are satisfied in the generated outputs of an S2S decoder. The MF models can be trained to generate tokens in a hypothesis until all constraints are satisfied, guaranteeing high constraint satisfaction. Our experiments on the Common Sense Generation task (CommonGen) (Lin et al., 2020), End2end Restaurant Dialog task (E2ENLG) (Dušek et al., 2020) and Novel Object Captioning task (nocaps) (Agrawal et al., 2019) show that the MF models maintain higher constraint satisfaction and text quality than the baseline models and other constrained decoding algorithms, achieving state-of-the-art performance on all three tasks. These results are achieved with a much lower run-time than constrained decoding algorithms. We also show that the MF models work well in the low-resource setting.
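
The bookkeeping behind mention flags can be illustrated by recomputing, at each decoding step, one flag per encoder token. This sketch assumes a three-valued scheme: 0 for a token that is not a constraint, 1 for a constraint not yet mentioned, and 2 for a constraint already mentioned. It only handles single-token constraints matched by exact surface form. How the flags are embedded and fed into the decoder is not shown, and the example tokens are hypothetical.

```python
# Assumed flag values; the paper's exact scheme may differ.
NOT_CONSTRAINT, NOT_MENTIONED, MENTIONED = 0, 1, 2

def mention_flags(encoder_tokens: list[str], constraints: set[str],
                  decoded_prefix: list[str]) -> list[int]:
    """One flag per encoder token, given the tokens decoded so far (exact surface match)."""
    generated = set(decoded_prefix)
    flags = []
    for tok in encoder_tokens:
        if tok not in constraints:
            flags.append(NOT_CONSTRAINT)
        elif tok in generated:
            flags.append(MENTIONED)
        else:
            flags.append(NOT_MENTIONED)
    return flags

def all_satisfied(flags: list[int]) -> bool:
    """True once no constraint is still waiting to be mentioned."""
    return NOT_MENTIONED not in flags

enc = ["dog", "frisbee", "catch", "<sep>"]
cons = {"dog", "frisbee", "catch"}
step = mention_flags(enc, cons, ["a", "dog"])
print(step, all_satisfied(step))  # [2, 1, 1, 0] False
```

Because the flags are recomputed from the prefix at every step, the decoder always has an explicit signal of which constraints remain, rather than relying on a search procedure to enforce them.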

2020

Large Scale Author Obfuscation Using Siamese Variational Auto-Encoder: The SiamAO System
Chakaveh Saedi | Mark Dras
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Author obfuscation is the task of masking the author of a piece of text, with applications in privacy. Recent advances in deep neural networks have boosted author identification performance, making author obfuscation more challenging. Existing approaches to author obfuscation are largely heuristic. Obfuscation can, however, be thought of as the construction of adversarial examples to attack author identification, suggesting that the deep learning architectures used for adversarial attacks could have application here. Current architectures are designed to construct adversarial examples against classification-based models, which in author identification would exclude the high-performing similarity-based models employed when facing a large number of authorial classes. In this paper, we propose the first deep learning architecture for constructing adversarial examples against similarity-based learners, and explore its application to author obfuscation. We analyse the output in terms of both obfuscation success and language acceptability, compare performance with some common baselines, and show promising results in finding a balance between safety and soundness of the perturbed texts.

2018

Native Language Identification With Classifier Stacking and Ensembles
Shervin Malmasi | Mark Dras
Computational Linguistics, Volume 44, Issue 3 - September 2018

Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on several large data sets, evaluated in both intra-corpus and cross-corpus modes.

A Fast and Accurate Vietnamese Word Segmenter
Dat Quoc Nguyen | Dai Quoc Nguyen | Thanh Vu | Mark Dras | Mark Johnson
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

VnCoreNLP: A Vietnamese Natural Language Processing Toolkit
Thanh Vu | Dat Quoc Nguyen | Dai Quoc Nguyen | Mark Dras | Mark Johnson
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

We present an easy-to-use and fast toolkit, namely VnCoreNLP—a Java NLP annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to provide rich linguistic annotations to facilitate research work on Vietnamese NLP. Our VnCoreNLP is open-source and available at: https://github.com/vncorenlp/VnCoreNLP

Predicting accuracy on large datasets from smaller pilot data
Mark Johnson | Peter Anderson | Mark Dras | Mark Steedman
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Because obtaining training data is often the most difficult part of an NLP or ML project, we develop methods for predicting how much data is required to achieve a desired test accuracy by extrapolating results from models trained on a small pilot training dataset. We model how accuracy varies as a function of training size on subsets of the pilot data, and use that model to predict how much training data would be required to achieve the desired accuracy. We introduce a new performance extrapolation task to evaluate how well different extrapolations predict accuracy on larger training sets. We show that details of hyperparameter optimisation and the extrapolation models can have dramatic effects in a document classification task. We believe this is an important first step in developing methods for estimating the resources required to meet specific engineering performance targets.
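
One simple extrapolation model treats the error rate as a power law in training size, error(n) = a · n^(-b). Under that model, a and b can be fitted by least squares in log-log space on pilot subsets, and the fit can be inverted for a target accuracy. The paper compares several extrapolation models; the power law here is an assumption, and the pilot accuracies are hypothetical, chosen to follow the law exactly.

```python
import math

def fit_power_law(sizes: list[int], accuracies: list[float]) -> tuple[float, float]:
    """Fit error(n) = a * n**(-b), with error = 1 - accuracy, by least squares in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(1.0 - acc) for acc in accuracies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # expected to be negative: error shrinks as n grows
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope

def size_for_accuracy(a: float, b: float, target_accuracy: float) -> float:
    """Invert error = a * n**(-b) to estimate the training size that reaches target_accuracy."""
    return (a / (1.0 - target_accuracy)) ** (1.0 / b)

# Hypothetical accuracies measured on subsets of a pilot training set.
pilot_sizes = [100, 400, 1600]
pilot_acc = [0.80, 0.90, 0.95]
a, b = fit_power_law(pilot_sizes, pilot_acc)
print(round(size_for_accuracy(a, b, 0.98)))  # 10000
```

The abstract's finding that hyperparameter optimisation and the choice of extrapolation model matter shows up here as sensitivity to the fitted exponent b. Small changes in b move the predicted size a long way when the target lies far beyond the pilot range.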

2017

A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing
Dat Quoc Nguyen | Mark Dras | Mark Johnson
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We present a novel neural network model that learns POS tagging and graph-based dependency parsing jointly. Our model uses bidirectional LSTMs to learn feature representations shared for both POS tagging and dependency parsing tasks, thus handling the feature-engineering problem. Our extensive experiments, on 19 languages from the Universal Dependencies project, show that our model outperforms the state-of-the-art neural network-based Stack-propagation model for joint POS tagging and transition-based dependency parsing, resulting in a new state of the art. Our code is open-source and available together with pre-trained models at: https://github.com/datquocnguyen/jPTDP

Unsupervised Text Segmentation Based on Native Language Characteristics
Shervin Malmasi | Mark Dras | Mark Johnson | Lan Du | Magdalena Wolska
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.

Feature Hashing for Language and Dialect Identification
Shervin Malmasi | Mark Dras
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We evaluate feature hashing for language identification (LID), a method not previously used for this task. Using a standard dataset, we first show that while feature performance is high, LID data is high-dimensional and mostly sparse (>99.5%) as it includes large vocabularies for many languages; memory requirements grow as languages are added. Next we apply hashing using various hash sizes, demonstrating that there is no performance loss with dimensionality reductions of up to 86%. We also show that using an ensemble of low-dimension hash-based classifiers further boosts performance. Feature hashing is highly useful for LID and holds great promise for future work in this area.
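
Feature hashing maps each feature directly to one of a fixed number of buckets by hashing its name. The vector size therefore stays constant as new languages add vocabulary. The sketch below assumes character trigrams as features and an arbitrary hash size of 64; the paper's feature types and hash sizes may differ. It uses `zlib.crc32` because Python's built-in `hash()` for strings is randomised per process, and feature indices must stay stable between training and prediction.

```python
import zlib

def char_ngrams(text: str, n: int) -> list[str]:
    """Character n-grams of the text, padded with one space on each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def hashed_features(text: str, num_buckets: int, n: int = 3) -> list[int]:
    """Count n-grams into a fixed-size vector via the hashing trick.

    Distinct n-grams may collide in one bucket; that is the accepted cost of a fixed size.
    """
    vec = [0] * num_buckets
    for gram in char_ngrams(text, n):
        vec[zlib.crc32(gram.encode("utf-8")) % num_buckets] += 1
    return vec

v = hashed_features("hello world", num_buckets=64)
print(len(v), sum(v))  # 64 11
```

Shrinking `num_buckets` trades more collisions for less memory, which is the dimensionality-reduction axis the abstract varies.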

Stock Market Prediction with Deep Learning: A Character-based Neural Language Model for Event-based Trading
Leonardo dos Santos Pinheiro | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2017

From Word Segmentation to POS Tagging for Vietnamese
Dat Quoc Nguyen | Thanh Vu | Dai Quoc Nguyen | Mark Dras | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2017

2016

Modeling Language Change in Historical Corpora: The Case of Portuguese
Marcos Zampieri | Shervin Malmasi | Mark Dras
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification. Algorithms were trained to classify texts with respect to their publication date taking into account lexical variation represented as word n-grams, and morphosyntactic variation represented by part-of-speech (POS) distribution. We report results of 99.8% accuracy using word unigram features with a Support Vector Machines classifier to predict the publication date of documents in time intervals of both one century and half a century. A feature analysis is performed to investigate the most informative features for this task and how they are linked to language change.

LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles
Shervin Malmasi | Mark Dras | Marcos Zampieri
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

An empirical study for Vietnamese dependency parsing
Dat Quoc Nguyen | Mark Dras | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2016

Predicting Post Severity in Mental Health Forums
Shervin Malmasi | Marcos Zampieri | Mark Dras
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

2015

Squibs: Evaluating Human Pairwise Preference Judgments
Mark Dras
Computational Linguistics, Volume 41, Issue 2 - June 2015

Large-Scale Native Language Identification with Cross-Corpus Evaluation
Shervin Malmasi | Mark Dras
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Norwegian Native Language Identification
Shervin Malmasi | Mark Dras | Irina Temnikova
Proceedings of the International Conference Recent Advances in Natural Language Processing

Clinical Information Extraction Using Word Representations
Shervin Malmasi | Hamed Hassanzadeh | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2015

Cognate Identification using Machine Translation
Shervin Malmasi | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2015

Oracle and Human Baselines for Native Language Identification
Shervin Malmasi | Joel Tetreault | Mark Dras
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

Language Identification using Classifier Ensembles
Shervin Malmasi | Mark Dras
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

2014

Language Transfer Hypotheses with Linear SVM Weights
Shervin Malmasi | Mark Dras
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Chinese Native Language Identification
Shervin Malmasi | Mark Dras
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Finnish Native Language Identification
Shervin Malmasi | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2014

Arabic Native Language Identification
Shervin Malmasi | Mark Dras
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

From Visualisation to Hypothesis Construction for Second Language Acquisition
Shervin Malmasi | Mark Dras
Proceedings of TextGraphs-9: the workshop on Graph-based Methods for Natural Language Processing

Cross-lingual Transfer Parsing for Low-Resourced Languages: An Irish Case Study
Teresa Lynn | Jennifer Foster | Mark Dras | Lamia Tounsi
Proceedings of the First Celtic Language Technology Workshop

2013

NLI Shared Task 2013: MQ Submission
Shervin Malmasi | Sze-Meng Jojo Wong | Mark Dras
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Working with a small dataset - semi-supervised dependency parsing for Irish
Teresa Lynn | Jennifer Foster | Mark Dras
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

Is Bad Structure Better Than No Structure?: Unsupervised Parsing for Realisation Ranking
Yasaman Motazedi | Mark Dras | François Lareau
Proceedings of COLING 2012

Exploring Adaptor Grammars for Native Language Identification
Sze-Meng Jojo Wong | Mark Dras | Mark Johnson
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Irish Treebanking and Parsing: A Preliminary Evaluation
Teresa Lynn | Özlem Çetinoğlu | Jennifer Foster | Elaine Uí Dhonnchadha | Mark Dras | Josef van Genabith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Language resources are essential for linguistic research and the development of NLP applications. Low-density languages, such as Irish, therefore lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish, namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage the few existing resources. We discuss language-specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula, and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development.

Active Learning and the Irish Treebank
Teresa Lynn | Jennifer Foster | Mark Dras | Elaine Uí Dhonnchadha
Proceedings of the Australasian Language Technology Association Workshop 2012

Valence Shifting: Is It A Valid Task?
Mary Gardiner | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2012

Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology
Pushpak Bhattacharyya | Asif Ekbal | Sriparna Saha | Mark Johnson | Diego Molla-Aliod | Mark Dras
Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology

2011

Exploiting Parse Structures for Native Language Identification
Sze-Meng Jojo Wong | Mark Dras
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Clause Restructuring For SMT Not Absolutely Helpful
Susan Howlett | Mark Dras
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Collocations in Multilingual Natural Language Generation: Lexical Functions meet Lexical Functional Grammar
François Lareau | Mark Dras | Benjamin Börschinger | Robert Dale
Proceedings of the Australasian Language Technology Association Workshop 2011

Topic Modeling for Native Language Identification
Sze-Meng Jojo Wong | Mark Dras | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2011

Detecting Interesting Event Sequences for Sports Reporting
François Lareau | Mark Dras | Robert Dale
Proceedings of the 13th European Workshop on Natural Language Generation

2010

Dual-Path Phrase-Based Statistical Machine Translation
Susan Howlett | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2010

Parser Features for Sentence Grammaticality Classification
Sze-Meng Jojo Wong | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2010

2009

A New Subtree-Transfer Approach to Syntax-Based Reordering for Statistical Machine Translation
Maxim Khalilov | José A. R. Fonollosa | Mark Dras
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

Improving Grammaticality in Statistical Sentence Generation: Introducing a Dependency Spanning Tree Algorithm with an Argument Satisfaction Model
Stephen Wan | Mark Dras | Robert Dale | Cécile Paris
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Contrastive Analysis and Native Language Identification
Sze-Meng Jojo Wong | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2009

Coupling Hierarchical Word Reordering and Decoding in Phrase-Based Statistical Machine Translation
Maxim Khalilov | José A. R. Fonollosa | Mark Dras
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

Using Hypernymy Acquisition to Tackle (Part of) Textual Entailment
Elena Akhmatova | Mark Dras
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

2008

Choosing the Right Translation: A Syntactically Informed Classification Approach
Simon Zwarts | Mark Dras
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Seed and Grow: Augmenting Statistically Generated Summary Sentences using Schematic Word Patterns
Stephen Wan | Robert Dale | Mark Dras | Cécile Paris
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Morphosyntactic Target Language Matching in Statistical Machine Translation
Simon Zwarts | Mark Dras
Proceedings of the Australasian Language Technology Association Workshop 2008

2007

Syntax-based word reordering in phrase-based statistical machine translation: why does it work?
Simon Zwarts | Mark Dras
Proceedings of Machine Translation Summit XI: Papers

GLEU: Automatic Evaluation of Sentence-Level Fluency
Andrew Mutton | Mark Dras | Stephen Wan | Robert Dale
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Proceedings of the Australasian Language Technology Workshop 2007
Nathalie Colineau | Mark Dras
Proceedings of the Australasian Language Technology Workshop 2007

Entailment due to Syntactically Encoded Semantic Relationships
Elena Akhmatova | Mark Dras
Proceedings of the Australasian Language Technology Workshop 2007

Exploring Approaches to Discriminating among Near-Synonyms
Mary Gardiner | Mark Dras
Proceedings of the Australasian Language Technology Workshop 2007

Statistical Machine Translation of Australian Aboriginal Languages: Morphological Analysis with Languages of Differing Morphological Richness
Simon Zwarts | Mark Dras
Proceedings of the Australasian Language Technology Workshop 2007

ACL 2007 Workshop on Deep Linguistic Processing
Timothy Baldwin | Mark Dras | Julia Hockenmaier | Tracy Holloway King | Gertjan van Noord
ACL 2007 Workshop on Deep Linguistic Processing

The Impact of Deep Linguistic Processing on Parsing Technology
Timothy Baldwin | Mark Dras | Julia Hockenmaier | Tracy Holloway King | Gertjan van Noord
Proceedings of the Tenth International Conference on Parsing Technologies

2006

Using Dependency-Based Features to Take the ‘Para-farce’ out of Paraphrase
Stephen Wan | Mark Dras | Robert Dale | Cécile Paris
Proceedings of the Australasian Language Technology Workshop 2006

This Phrase-Based SMT System is Out of Order: Generalised Word Reordering in Machine Translation
Simon Zwarts | Mark Dras
Proceedings of the Australasian Language Technology Workshop 2006

2005

Towards Statistical Paraphrase Generation: Preliminary Evaluations of Grammaticality
Stephen Wan | Mark Dras | Robert Dale | Cécile Paris
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)

Formal Grammars for Linguistic Treebank Queries
Mark Dras | Steve Cassidy
Proceedings of the Australasian Language Technology Workshop 2005

Searching for Grammaticality: Propagating Dependencies in the Viterbi Algorithm
Stephen Wan | Robert Dale | Mark Dras
Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)

2004

Non-contiguous tree parsing
Mark Dras | Chung-hye Han
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2003

Straight to the point: Discovering themes for summary generation
Stephen Wan | Mark Dras | Cécile Paris | Robert Dale
Proceedings of the Australasian Language Technology Workshop 2003

Using Thematic Information in Statistical Headline Generation
Stephen Wan | Mark Dras | Cécile Paris | Robert Dale
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

2002

Korean-English MT and S-TAG
Mark Dras | Chung-hye Han
Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6)

2000

Multi-Component TAG and Notions of Formal Power
William Schuler | David Chiang | Mark Dras
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

Some remarks on an extension of synchronous TAG
David Chiang | William Schuler | Mark Dras
Proceedings of the Fifth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+5)

How problematic are clitics for S-TAG translation?
Mark Dras | Tonia Bleam
Proceedings of the Fifth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+5)

1999

A Meta-Level Grammar: Redefining Synchronous TAG for Translation and Paraphrase
Mark Dras
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1997

Representing Paraphrases Using Synchronous TAGs
Mark Dras
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

Co-authors

Sze-Meng Jojo Wong 6

Dat Quoc Nguyen 5

Jennifer Foster 4

François Lareau 3

Dai Quoc Nguyen 3

Marcos Zampieri 3

Elena Akhmatova 2

Timothy Baldwin 2

José A. R. Fonollosa 2

Mary Gardiner 2

Chung-hye Han 2

Julia Hockenmaier 2

Susan Howlett 2

Maxim Khalilov 2

Tracy Holloway King 2

William Schuler 2

Shakila Mahjabin Tonni 2

Elaine Uí Dhonnchadha 2

Gertjan van Noord 2

Hend Al-Khalifa 1

Peter Anderson 1

Marianna Apidianaki 1

Amin Beheshti 1

Shlomo Berkovsky 1

Pushpak Bhattacharyya 1

Benjamin Börschinger 1

Steve Cassidy 1

Nathalie Colineau 1

Barbara Di Eugenio 1

Pedro Faustini 1

Hamed Hassanzadeh 1

Gautam Siddharth Kashyap 1

Jonathan K. Kummerfeld 1

Brodie Mather 1

Yasaman Motazedi 1

Maximilian Mozes 1

Andrew Mutton 1

Afrozah Nadeem 1

David Fraile Navarro 1

Chakaveh Saedi 1

Sriparna Saha 1

Steven Schockaert 1

Anudeex Shetty 1

Mark Steedman 1

Srinibas Swain 1

Irina Temnikova 1

Joel Tetreault 1

Magdalena Wolska 1

Wei Emma Zhang 1

Leonardo dos Santos Pinheiro 1

Josef van Genabith 1

Özlem Çetinoğlu 1

Venues