Vincent Ng

2025

We introduce InternLM-Law, a large language model (LLM) tailored for addressing diverse legal tasks related to Chinese laws. These tasks range from responding to standard legal questions (e.g., legal exercises in textbooks) to analyzing complex real-world legal situations. Our work contributes to Chinese Legal NLP research by (1) conducting one of the most extensive evaluations of state-of-the-art general-purpose and legal-specific LLMs to date that involves an automatic evaluation on the 20 legal NLP tasks in LawBench, a human evaluation on a challenging version of the Legal Consultation task, and an automatic evaluation of a model’s ability to handle very long legal texts; (2) presenting a methodology for training a Chinese legal LLM that offers superior performance to all of its counterparts in our extensive evaluation; and (3) facilitating future research in this area by making all of our code and model publicly available at https://github.com/InternLM/InternLM-Law.

2024

Legal Case Retrieval: A Survey of the State of the Art
Yi Feng | Chuanyi Li | Vincent Ng
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent years have seen increasing attention to Legal Case Retrieval (LCR), a key task in the area of Legal AI that concerns retrieving, from a large legal database of historical cases, the cases that are similar to a given query. This paper presents a survey of the major milestones in LCR research, targeting researchers who are finding their way into the field and seek a brief account of the relevant datasets, the recent neural models, and their performance.

Conundrums in Cross-Prompt Automated Essay Scoring: Making Sense of the State of the Art
Shengjie Li | Vincent Ng
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-prompt automated essay scoring (AES), an under-investigated but challenging task that has gained increasing popularity in the AES community, aims to train an AES system that can generalize well to prompts that are unseen during model training. While recently developed cross-prompt AES models have combined essay representations that are learned via sophisticated neural architectures with so-called prompt-independent features, an intriguing question is: are complex neural models needed to achieve state-of-the-art results? We answer this question by abandoning sophisticated neural architectures and developing a purely feature-based approach to cross-prompt AES that adopts a simple neural architecture. Experiments on the ASAP dataset demonstrate that our simple approach to cross-prompt AES can achieve state-of-the-art results.

Proceedings of the Seventh Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Anna Nedoluzhko | Massimo Poesio | Sameer Pradhan | Vincent Ng
Proceedings of the Seventh Workshop on Computational Models of Reference, Anaphora and Coreference

We present LawBench, the first evaluation benchmark composed of 20 tasks aimed at assessing the ability of Large Language Models (LLMs) to perform Chinese legal tasks. LawBench is meticulously crafted to enable precise assessment of LLMs’ legal capabilities at three cognitive levels that correspond to the widely accepted Bloom’s cognitive taxonomy. Using LawBench, we present a comprehensive evaluation of 21 popular LLMs and the first comparative analysis of the empirical results in order to reveal their relative strengths and weaknesses. All data, model predictions and evaluation code are accessible from https://github.com/open-compass/LawBench.

Automated Essay Scoring: A Reflection on the State of the Art
Shengjie Li | Vincent Ng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

While steady progress has been made on the task of automated essay scoring (AES) in the past decade, much of the recent work in this area has focused on developing models that beat existing models on a standard evaluation dataset. While improving performance numbers remains an important goal in the short term, such a focus is not necessarily beneficial for the long-term development of the field. We reflect on the state of the art in AES research, discussing issues that we believe can encourage researchers to think bigger than improving performance numbers with the ultimate goal of triggering discussion among AES researchers on how we should move forward.

Computational Meme Understanding: A Survey
Khoi P. N. Nguyen | Vincent Ng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Computational Meme Understanding, which concerns the automated comprehension of memes, has garnered interest over the last four years and is facing both substantial opportunities and challenges. We survey this emerging area of research by first introducing a comprehensive taxonomy for memes along three dimensions – forms, functions, and topics. Next, we present three key tasks in Computational Meme Understanding, namely, classification, interpretation, and explanation, and conduct a comprehensive review of existing datasets and models, discussing their limitations. Finally, we highlight the key challenges and recommend avenues for future work.

Legal Judgment Prediction (LJP) refers to the task of automatically predicting judgment results (e.g., charges, law articles, and term of penalty) given the fact description of a case. While SOTA models have achieved high accuracy and F1 scores on public datasets, existing datasets fail to evaluate specific aspects of these models (e.g., legal fairness) that significantly impact their application in real scenarios. Inspired by functional testing in software engineering, we introduce LJPCHECK, a suite of functional tests for LJP models, to comprehend LJP models’ behaviors and offer diagnostic insights. We illustrate the utility of LJPCHECK on five SOTA LJP models. Extensive experiments reveal vulnerabilities in these models, prompting an in-depth discussion of the underlying reasons for their shortcomings.

Legal Judgment Prediction (LJP) has attracted significant attention in recent years. However, previous studies have primarily focused on cases involving only a single defendant, setting aside multi-defendant cases because of their complexity and difficulty. To advance research, we introduce CMDL, a large-scale real-world Chinese Multi-Defendant LJP dataset, which consists of over 393,945 cases with nearly 1.2 million defendants in total. For performance evaluation, we propose case-level evaluation metrics dedicated to the multi-defendant scenario. Experimental results on CMDL show that existing SOTA approaches exhibit weaknesses when applied to cases involving multiple defendants. We highlight several challenges that require attention and resolution.

The aim of the Universal Anaphora initiative is to push forward the state of the art in anaphora and anaphora resolution by expanding the aspects of anaphoric interpretation which are or can be reliably annotated in anaphoric corpora, producing unified standards to annotate and encode these annotations, delivering datasets encoded according to these standards, and developing methods for evaluating models that carry out this type of interpretation. Although several papers on aspects of the initiative have appeared, no overall description of the initiative’s goals, proposals and achievements has been published yet except as an online draft. This paper aims to fill this gap, as well as to discuss its progress so far.

ICLE++: Modeling Fine-Grained Traits for Holistic Essay Scoring
Shengjie Li | Vincent Ng
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The majority of the recently developed models for automated essay scoring (AES) are evaluated solely on the ASAP corpus. However, ASAP is not without its limitations. For instance, it is not clear whether models trained on ASAP can generalize well when evaluated on other corpora. In light of these limitations, we introduce ICLE++, a corpus of persuasive student essays annotated with both holistic scores and trait-specific scores. Not only can ICLE++ be used to test the generalizability of AES models trained on ASAP, but it can also facilitate the evaluation of models developed for newer AES problems such as multi-trait scoring and cross-prompt scoring. We believe that ICLE++, which represents a culmination of our long-term effort in annotating the essays in the ICLE corpus, contributes to the set of much-needed annotated corpora for AES research.

While recent years have seen a surge of interest in the automatic processing of memes, much of the work in this area has focused on determining whether a meme contains malicious content. This paper proposes the new task of intent description generation: generating a description of the author’s intentions when creating the meme. To stimulate future work on this task, we (1) annotated a corpus of memes with the intents perceived by the reader as well as the background knowledge needed to infer the intents and (2) established baseline performance on the intent description generation task using state-of-the-art large language models. Our results suggest the importance of background knowledge retrieval in intent description generation for memes.

2023

PairSpanBERT: An Enhanced Language Model for Bridging Resolution
Hideo Kobayashi | Yufang Hou | Vincent Ng
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present PairSpanBERT, a SpanBERT-based pre-trained model specialized for bridging resolution. To this end, we design a novel pre-training objective that aims to learn the contexts in which two mentions are implicitly linked to each other from a large amount of data automatically generated either heuristically or via distant supervision with a knowledge graph. Despite the noise inherent in the automatically generated data, we achieve the best results reported to date on three evaluation datasets for bridging resolution when replacing SpanBERT with PairSpanBERT in a state-of-the-art resolver that jointly performs entity coreference resolution and bridging resolution.

Proceedings of the Sixth Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2023)
Maciej Ogrodniczuk | Vincent Ng | Sameer Pradhan | Massimo Poesio
Proceedings of the Sixth Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2023)

2022

Legal Judgment Prediction via Event Extraction with Constraints
Yi Feng | Chuanyi Li | Vincent Ng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While significant progress has been made on the task of Legal Judgment Prediction (LJP) in recent years, the incorrect predictions made by SOTA LJP models can be attributed in part to their failure to (1) locate the key event information that determines the judgment, and (2) exploit the cross-task consistency constraints that exist among the subtasks of LJP. To address these weaknesses, we propose EPM, an Event-based Prediction Model with constraints, which surpasses existing SOTA models in performance on a standard LJP dataset.

Constrained Multi-Task Learning for Bridging Resolution
Hideo Kobayashi | Yufang Hou | Vincent Ng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We examine the extent to which supervised bridging resolvers can be improved without employing additional labeled bridging data by proposing a novel constrained multi-task learning framework for bridging resolution, within which we (1) design cross-task consistency constraints to guide the learning process; (2) pre-train the entity coreference model in the multi-task framework on the large amount of publicly available coreference data; and (3) integrate prior knowledge encoded in rule-based resolvers. Our approach achieves state-of-the-art results on three standard evaluation corpora.

The CODI-CRAC 2022 Shared Task on Anaphora Resolution in Dialogues is the second edition of an initiative focused on detecting different types of anaphoric relations in conversations of different kinds. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations: identity, bridging references and discourse deixis, we defined multiple tasks focusing individually on these key relations. The second edition of the shared task maintained the focus on these relations and used the same datasets as in 2021, but new test data were annotated, the 2021 data were checked, and new subtasks were added. In this paper, we discuss the annotation schemes, the datasets, the evaluation scripts used to assess the system performance on these tasks, and provide a brief summary of the participating systems and the results obtained across 230 runs from three teams, with most submissions achieving significantly better results than our baseline methods.

Neural Anaphora Resolution in Dialogue Revisited
Shengjie Li | Hideo Kobayashi | Vincent Ng
Proceedings of the CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

We present the systems that we developed for all three tracks of the CODI-CRAC 2022 shared task, namely the anaphora resolution track, the bridging resolution track, and the discourse deixis resolution track. Combining an effective encoding of the input using the SpanBERT_Large encoder with an extensive hyperparameter search process, our systems achieved the highest scores in all phases of all three tracks.

End-to-End Neural Bridging Resolution
Hideo Kobayashi | Yufang Hou | Vincent Ng
Proceedings of the 29th International Conference on Computational Linguistics

The state of bridging resolution research is rather unsatisfactory: not only are state-of-the-art resolvers evaluated in unrealistic settings, but the neural models underlying these resolvers are weaker than those used for entity coreference resolution. In light of these problems, we evaluate bridging resolvers in an end-to-end setting, strengthen them with better encoders, and attempt to gain a better understanding of them via perturbation experiments and a manual analysis of their outputs.

Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Anna Nedoluzhko | Vincent Ng | Massimo Poesio
Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

DiscoSense: Commonsense Reasoning with Discourse Connectives
Prajjwal Bhargava | Vincent Ng
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We present DiscoSense, a benchmark for commonsense reasoning via understanding a wide variety of discourse connectives. We generate compelling distractors in DiscoSense using Conditional Adversarial Filtering, an extension of Adversarial Filtering that employs conditional generation. We show that state-of-the-art pre-trained language models struggle to perform well on DiscoSense, which makes this dataset ideal for evaluating next-generation commonsense reasoning systems.

End-to-End Neural Discourse Deixis Resolution in Dialogue
Shengjie Li | Vincent Ng
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

We adapt Lee et al.’s (2018) span-based entity coreference model to the task of end-to-end discourse deixis resolution in dialogue, specifically by proposing extensions to their model that exploit task-specific characteristics. The resulting model, dd-utt, achieves state-of-the-art results on the four datasets in the CODI-CRAC 2021 shared task.

2021

Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Juntao Yu | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

In this paper, we provide an overview of the CODI-CRAC 2021 Shared Task: Anaphora Resolution in Dialogue. The shared task focuses on detecting anaphoric relations in different genres of conversations. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations (identity, bridging references, and discourse deixis), we defined multiple subtasks focusing individually on these key relations. We discuss the evaluation scripts used to assess the system performance on these subtasks, and provide a brief summary of the participating systems and the results obtained across ?? runs from 5 teams, with most submissions achieving significantly better results than our baseline methods.

Neural Anaphora Resolution in Dialogue
Hideo Kobayashi | Shengjie Li | Vincent Ng
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

We describe the systems that we developed for the three tracks of the CODI-CRAC 2021 shared task, namely entity coreference resolution, bridging resolution, and discourse deixis resolution. Our team ranked second for entity coreference resolution, first for bridging resolution, and first for discourse deixis resolution.

The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis Resolution in Dialogue: A Cross-Team Analysis
Shengjie Li | Hideo Kobayashi | Vincent Ng
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

The CODI-CRAC 2021 shared task is the first shared task that focuses exclusively on anaphora resolution in dialogue and provides three tracks, namely entity coreference resolution, bridging resolution, and discourse deixis resolution. We perform a cross-task analysis of the systems that participated in the shared task in each of these tracks.

Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Massimo Poesio | Yulia Grishina | Vincent Ng
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

Conundrums in Event Coreference Resolution: Making Sense of the State of the Art
Jing Lu | Vincent Ng
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Despite recent promising results on the application of span-based models for event reference interpretation, there is a lack of understanding of what has been improved. We present an empirical analysis of a state-of-the-art span-based event reference system with the goal of providing the general NLP audience with a better understanding of the state of the art and reference researchers with directions for future research.

User targeting is an essential task in the modern advertising industry: given a package of ads for a particular category of products (e.g., green tea), identify the online users to whom the ad package should be targeted. An ad-package-specific user targeting model is typically trained using historical clickthrough data: positive instances correspond to users who have clicked on an ad in the package before, whereas negative instances correspond to users who have not clicked on any ads in the package that were displayed to them. Collecting a sufficient amount of positive training data for training an accurate user targeting model, however, is by no means trivial. This paper focuses on the development of a method for automatic augmentation of the set of positive training instances. Experimental results on two datasets, including a real-world company dataset, demonstrate the effectiveness of our proposed method.

Bridging Resolution: Making Sense of the State of the Art
Hideo Kobayashi | Vincent Ng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

While Yu and Poesio (2020) have recently demonstrated the superiority of their neural multi-task learning (MTL) model to rule-based approaches for bridging anaphora resolution, there is little understanding of (1) how it is better than the rule-based approaches (e.g., are the two approaches making similar or complementary mistakes?) and (2) what should be improved. To shed light on these issues, we (1) propose a hybrid rule-based and MTL approach that would enable a better understanding of their comparative strengths and weaknesses; and (2) perform a manual analysis of the errors made by the MTL model.

Constrained Multi-Task Learning for Event Coreference Resolution
Jing Lu | Vincent Ng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We propose a neural event coreference model in which event coreference is jointly trained with five tasks: trigger detection, entity coreference, anaphoricity determination, realis detection, and argument extraction. To guide the learning of this complex model, we incorporate cross-task consistency constraints into the learning process as soft constraints via designing penalty functions. In addition, we propose the novel idea of viewing entity coreference and event coreference as a single coreference task, which we believe is a step towards a unified model of coreference resolution. The resulting model achieves state-of-the-art results on the KBP 2017 event coreference dataset.

2020

Event Coreference Resolution with Non-Local Information
Jing Lu | Vincent Ng
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

We present two extensions to a state-of-the-art joint model for event coreference resolution, which involve incorporating (1) a supervised topic model for improving trigger detection by providing global context, and (2) a preprocessing module that seeks to improve event coreference by discarding unlikely candidate antecedents of an event mention using discourse contexts computed based on salient entities. The resulting model yields the best results reported to date on the KBP 2017 English and Chinese datasets.

Bridging Resolution: A Survey of the State of the Art
Hideo Kobayashi | Vincent Ng
Proceedings of the 28th International Conference on Computational Linguistics

Bridging reference resolution is an anaphora resolution task that is arguably more challenging and less studied than entity coreference resolution. Given that significant progress has been made on coreference resolution in recent years, we believe that bridging resolution will receive increasing attention in the NLP community. Nevertheless, progress on bridging resolution is currently hampered in part by the scarcity of large annotated corpora for model training as well as the lack of standardized evaluation protocols. This paper presents a survey of the current state of research on bridging reference resolution and discusses future research directions.

Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Vincent Ng | Yulia Grishina | Sameer Pradhan
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

Conundrums in Entity Coreference Resolution: Making Sense of the State of the Art
Jing Lu | Vincent Ng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Despite the significant progress on entity coreference resolution observed in recent years, there is a general lack of understanding of what has been improved. We present an empirical analysis of state-of-the-art resolvers with the goal of providing the general NLP audience with a better understanding of the state of the art and coreference researchers with directions for future research.

Identifying Exaggerated Language
Li Kong | Chuanyi Li | Jidong Ge | Bin Luo | Vincent Ng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

While hyperbole is one of the most prevalent rhetorical devices, it is arguably one of the least studied devices in the figurative language processing community. We contribute to the study of hyperbole by (1) creating a corpus focusing on sentence-level hyperbole detection, (2) performing a statistical and manual analysis of our corpus, and (3) addressing the automatic hyperbole detection task.

Unsupervised Argumentation Mining in Student Essays
Isaac Persing | Vincent Ng
Proceedings of the Twelfth Language Resources and Evaluation Conference

State-of-the-art systems for argumentation mining are supervised, thus relying on training data containing manually annotated argument components and the relationships between them. To eliminate the reliance on annotated data, we present a novel approach to unsupervised argument mining. The key idea is to bootstrap from a small set of argument components automatically identified using simple heuristics in combination with reliable contextual cues. Results on Stab and Gurevych’s corpus of 402 essays show that our unsupervised approach rivals two supervised baselines in performance and achieves 73.5-83.7% of the performance of a state-of-the-art neural approach.

Aspect-Based Sentiment Analysis as Fine-Grained Opinion Mining
Gerardo Ocampo Diaz | Xuanming Zhang | Vincent Ng
Proceedings of the Twelfth Language Resources and Evaluation Conference

We show how the general fine-grained opinion mining concepts of opinion target and opinion expression are related to aspect-based sentiment analysis (ABSA) and discuss their benefits for resource creation over popular ABSA annotation schemes. Specifically, we first discuss why opinions modeled solely in terms of (entity, aspect) pairs inadequately capture the meaning of the sentiment originally expressed by authors and how opinion expressions and opinion targets can be used to avoid the loss of information. We then design a meaning-preserving annotation scheme and apply it to two popular ABSA datasets, the 2016 SemEval ABSA Restaurant and Laptop datasets. Finally, we discuss the importance of opinion expressions and opinion targets for next-generation ABSA systems. We make our datasets publicly available for download.

2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Kentaro Inui | Jing Jiang | Vincent Ng | Xiaojun Wan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Improving Event Coreference Resolution by Learning Argument Compatibility from Unlabeled Data
Yin Jou Huang | Jing Lu | Sadao Kurohashi | Vincent Ng
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Argument compatibility is a linguistic condition that is frequently incorporated into modern event coreference resolution systems. If two event mentions have incompatible arguments in any of the argument roles, they cannot be coreferent. On the other hand, if these mentions have compatible arguments, then this may be used as information towards deciding their coreferent status. One of the key challenges in leveraging argument compatibility lies in the paucity of labeled data. In this work, we propose a transfer learning framework for event coreference resolution that utilizes a large amount of unlabeled data to learn argument compatibility of event mentions. In addition, we adopt an interactive inference network-based model to better capture the compatible and incompatible relations between the context words of event mentions. Our experiments on the KBP 2017 English dataset confirm the effectiveness of our model in learning argument compatibility, which in turn improves the performance of the overall event coreference model.

Give Me More Feedback II: Annotating Thesis Strength and Related Attributes in Student Essays
Zixuan Ke | Hrishikesh Inamdar | Hui Lin | Vincent Ng
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While the vast majority of existing work on automated essay scoring has focused on holistic scoring, researchers have recently begun work on scoring specific dimensions of essay quality. Nevertheless, progress on dimension-specific essay scoring is limited in part by the lack of annotated corpora. To facilitate advances in this area, we design a scoring rubric for scoring a core, yet unexplored dimension of persuasive essay quality, thesis strength, and annotate a corpus of essays with thesis strength scores. We additionally identify the attributes that could impact thesis strength and annotate the essays with the values of these attributes, which, when predicted by computational models, could provide further feedback to students on why their essays receive a particular thesis strength score.

Proceedings of the Second Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Yulia Grishina | Vincent Ng
Proceedings of the Second Workshop on Computational Models of Reference, Anaphora and Coreference

2018

Modeling Trolling in Social Media Conversations
Luis Gerardo Mojica de la Vega | Vincent Ng
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Improving Unsupervised Keyphrase Extraction using Background Knowledge
Yang Yu | Vincent Ng
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Give Me More Feedback: Annotating Argument Persuasiveness and Related Attributes in Student Essays
Winston Carlile | Nishant Gurrapadi | Zixuan Ke | Vincent Ng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While argument persuasiveness is one of the most important dimensions of argumentative essay quality, it is relatively little studied in automated essay scoring research. Progress on scoring argument persuasiveness is hindered in part by the scarcity of annotated corpora. We present the first corpus of essays that are simultaneously annotated with argument components, argument persuasiveness scores, and attributes of argument components that impact an argument’s persuasiveness. This corpus could trigger the development of novel computational models concerning argument persuasiveness that provide useful feedback to students on why their arguments are (un)persuasive in addition to how persuasive they are.

Modeling and Prediction of Online Product Review Helpfulness: A Survey
Gerardo Ocampo Diaz | Vincent Ng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As the amount of free-form user-generated reviews on e-commerce websites continues to grow, there is an increasing need for automatic mechanisms that sift through these reviews and identify quality content. Review helpfulness modeling studies the factors that affect a review's helpfulness and aims to predict that helpfulness accurately. This paper provides an overview of the most relevant work on helpfulness prediction and understanding over the past decade, discusses the insights gained from this work, and provides guidelines for future research.

pdf bib
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference
Massimo Poesio | Vincent Ng | Maciej Ogrodniczuk
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

pdf bib
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications
Yuen-Hsien Tseng | Hsin-Hsi Chen | Vincent Ng | Mamoru Komachi
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

2017

pdf bib abs
Lightly-Supervised Modeling of Argument Persuasiveness
Isaac Persing | Vincent Ng
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We propose the first lightly-supervised approach to scoring an argument’s persuasiveness. Key to our approach is the novel hypothesis that lightly-supervised persuasiveness scoring is possible by explicitly modeling the major errors that negatively impact persuasiveness. In an evaluation on a new annotated corpus of online debate arguments, our approach rivals the performance of its fully-supervised counterparts on four scoring metrics while using only 10% of the available training instances.

pdf bib abs
Joint Learning for Event Coreference Resolution
Jing Lu | Vincent Ng
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While joint models have been developed for many NLP tasks, the vast majority of event coreference resolvers, including the top-performing resolvers competing in the recent TAC KBP 2016 Event Nugget Detection and Coreference task, are pipeline-based, and the propagation of errors from the trigger detection component to the event coreference component is a major performance-limiting factor. To address this problem, we propose a model for jointly learning event coreference, trigger detection, and event anaphoricity. Our joint model is novel in its choice of tasks and its features for capturing cross-task interactions. To our knowledge, this is the first attempt to train a mention-ranking model and employ event anaphoricity for event coreference. Our model achieves the best results to date on the KBP 2016 English and Chinese datasets.

pdf bib
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)
Maciej Ogrodniczuk | Vincent Ng
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

2016

pdf bib abs
Joint Inference for Event Coreference Resolution
Jing Lu | Deepak Venugopal | Vibhav Gogate | Vincent Ng
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Event coreference resolution is a challenging problem since it relies on several components of the information extraction pipeline that typically yield noisy outputs. We hypothesize that exploiting the inter-dependencies between these components can significantly improve the performance of an event coreference resolver, and subsequently propose a novel joint-inference-based event coreference resolver using Markov Logic Networks (MLNs). However, the rich features that are important for this task are typically very hard to encode explicitly as MLN formulas, since they significantly increase the size of the MLN, thereby making joint inference and learning infeasible. To address this problem, we propose a novel solution in which we implicitly encode rich features into our model by augmenting the MLN distribution with low-dimensional unit clauses. Our approach achieves state-of-the-art results on two standard evaluation corpora.

bib abs
Advanced Markov Logic Techniques for Scalable Joint Inference in NLP
Deepak Venugopal | Vibhav Gogate | Vincent Ng
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

In the early days of the statistical NLP era, many language processing tasks were tackled using the so-called pipeline architecture: the given task is broken into a series of sub-tasks such that the output of one sub-task is an input to the next sub-task in the sequence. The pipeline architecture is appealing for various reasons, including modularity, modeling convenience, and manageable computational complexity. However, it suffers from the error propagation problem: errors made in one sub-task are propagated to the next sub-task in the sequence, leading to poor accuracy on that sub-task, which in turn leads to more errors downstream. Another disadvantage is the lack of feedback: errors made in a sub-task are often not corrected using knowledge uncovered while solving another sub-task further down the pipeline. Realizing these weaknesses, researchers have turned to joint inference approaches in recent years. One such approach involves the use of Markov logic, which is defined as a set of weighted first-order logic formulas and, at a high level, unifies first-order logic with probabilistic graphical models. It is an ideal modeling language (knowledge representation) for compactly representing relational and uncertain knowledge in NLP. In a typical use case of Markov logic networks (MLNs) in NLP, the application designer describes the background knowledge using a few first-order logic sentences and then uses software packages such as Alchemy, Tuffy, and Markov thebeast to perform learning and inference (prediction) over the MLN. However, despite these obvious advantages, researchers and practitioners have found it difficult over the years to use MLNs effectively in many NLP applications. The main reason is that it is hard to scale inference and learning algorithms for MLNs to the large datasets and complex models that are typical in NLP. In this tutorial, we will introduce the audience to recent advances in scaling up inference and learning in MLNs, as well as new approaches to make MLNs a "black box" for NLP applications (with only minor tuning required on the part of the user). Specifically, we will introduce attendees to a key idea that has emerged in the MLN research community over the last few years, lifted inference, which refers to inference techniques, both exact and approximate, that take advantage of symmetries (e.g., synonyms) in the MLN. We will describe how these next-generation inference techniques can be used to perform effective joint inference. We will also present our new software package for inference and learning in MLNs, Alchemy 2.0, which is based on lifted inference, focusing primarily on how it can be used to scale up inference and learning in large models and datasets for applications such as semantic similarity determination, information extraction, and question answering.
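For readers new to Markov logic, the weighted formulas mentioned in this abstract define a log-linear distribution over possible worlds. The standard form is given below as general background, not as a summary of the tutorial's content: each first-order formula $F_i$ carries a weight $w_i$, $n_i(x)$ is the number of true groundings of $F_i$ in world $x$, and $Z$ normalizes over all worlds. The cost of summing over groundings and worlds is what makes inference and learning hard to scale, which is the problem lifted inference targets.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_{i} w_i \, n_i(x') \Big)
```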

pdf bib abs
Event Coreference Resolution with Multi-Pass Sieves
Jing Lu | Vincent Ng
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Multi-pass sieve approaches have been successfully applied to entity coreference resolution and many other tasks in natural language processing (NLP), owing in part to the ease of designing high-precision rules for these tasks. However, the same is not true for event coreference resolution: typically lying towards the end of the standard information extraction pipeline, an event coreference resolver assumes as input the noisy outputs of its upstream components such as the trigger identification component and the entity coreference resolution component. The difficulty in designing high-precision rules makes it challenging to successfully apply a multi-pass sieve approach to event coreference resolution. In this paper, we investigate this challenge, proposing the first multi-pass sieve approach to event coreference resolution. When evaluated on the version of the KBP 2015 corpus available to the participants of EN Task 2 (Event Nugget Detection and Coreference), our approach achieves an Avg F-score of 40.32%, outperforming the best participating system by 0.67% in Avg F-score.
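The multi-pass sieve idea described in this abstract can be illustrated with a generic sketch: coreference rules ("sieves") are applied one pass at a time, assumed to be ordered from highest to lowest precision, and each pass may only merge clusters built by earlier passes. The function name `multi_pass_sieve`, the dictionary-based mention representation, and the two toy sieves are hypothetical illustrations; this is not the authors' resolver, and the paper's actual sieves are not reproduced here.

```python
from typing import Callable, Dict, List

Mention = Dict[str, str]
# Hypothetical sieve type: does mention `m` corefer with candidate antecedent `a`?
Sieve = Callable[[Mention, Mention], bool]


def multi_pass_sieve(mentions: List[Mention], sieves: List[Sieve]) -> List[int]:
    """Assign a cluster id to each mention by applying sieves one pass at a time.

    Sieves are assumed to be ordered from highest to lowest precision. Each pass
    can only merge clusters produced by earlier passes; it never splits them.
    """
    parent = list(range(len(mentions)))  # union-find forest over mention indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for sieve in sieves:
        for i in range(len(mentions)):
            # Scan antecedents from nearest to farthest; link to the nearest
            # match that is not already in the same cluster.
            for j in range(i - 1, -1, -1):
                if find(i) != find(j) and sieve(mentions[i], mentions[j]):
                    parent[find(i)] = find(j)
                    break
    return [find(i) for i in range(len(mentions))]


# Two toy sieves: exact trigger match (higher precision), event-type match (lower precision).
def same_trigger(m: Mention, a: Mention) -> bool:
    return m["trigger"] == a["trigger"]


def same_type(m: Mention, a: Mention) -> bool:
    return m["type"] == a["type"]


if __name__ == "__main__":
    mentions = [
        {"trigger": "attack", "type": "Conflict"},
        {"trigger": "strike", "type": "Conflict"},
        {"trigger": "attack", "type": "Conflict"},
        {"trigger": "met", "type": "Contact"},
    ]
    print(multi_pass_sieve(mentions, [same_trigger, same_type]))  # [0, 0, 0, 3]
```

Running only the first, stricter sieve leaves "strike" in its own cluster; adding the looser type-match pass then merges it, which mirrors why the ordering of sieves by precision matters.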

pdf bib abs
Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming
Luis Gerardo Mojica de la Vega | Vincent Ng
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Joint inference approaches such as Integer Linear Programming (ILP) and Markov Logic Networks (MLNs) have recently been successfully applied to many natural language processing (NLP) tasks, often outperforming their pipeline counterparts. However, MLNs are arguably much less popular among NLP researchers than ILP. While NLP researchers who desire to employ these joint inference frameworks do not necessarily have to understand their theoretical underpinnings, it is imperative that they understand which of them should be applied under what circumstances. With the goal of helping NLP researchers better understand the relative strengths and weaknesses of MLNs and ILP, we compare them along different dimensions of interest, such as expressiveness, ease of use, scalability, and performance. To our knowledge, this is the first systematic comparison of ILP and MLNs on an NLP task.

pdf bib
End-to-End Argumentation Mining in Student Essays
Isaac Persing | Vincent Ng
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Chinese Zero Pronoun Resolution with Deep Neural Networks
Chen Chen | Vincent Ng
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Modeling Stance in Student Essays
Isaac Persing | Vincent Ng
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)
Maciej Ogrodniczuk | Vincent Ng
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

pdf bib
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)
Hsin-Hsi Chen | Yuen-Hsien Tseng | Vincent Ng | Xiaofei Lu
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

2015

pdf bib
Sieve-Based Spatial Relation Extraction with Expanding Parse Trees
Jennifer D’Souza | Vincent Ng
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Recovering Traceability Links in Requirements Documents
Zeheng Li | Mingrui Chen | LiGuo Huang | Vincent Ng
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Chinese Event Coreference Resolution: An Unsupervised Probabilistic Model Rivaling Supervised Resolvers
Chen Chen | Vincent Ng
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Modeling Argument Strength in Student Essays
Isaac Persing | Vincent Ng
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Sieve-Based Entity Linking for the Biomedical Domain
Jennifer D’Souza | Vincent Ng
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Chinese Zero Pronoun Resolution: A Joint Unsupervised Discourse-Aware Model Rivaling State-of-the-Art Resolvers
Chen Chen | Vincent Ng
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
UTD: Ensemble-Based Spatial Relation Extraction
Jennifer D’Souza | Vincent Ng
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing
Liang-Chih Yu | Zhifang Sui | Yue Zhang | Vincent Ng
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

2014

pdf bib
Ensemble-Based Medical Relation Classification
Jennifer D’Souza | Vincent Ng
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Why are You Taking this Stance? Identifying and Classifying Reasons in Ideological Debates
Kazi Saidul Hasan | Vincent Ng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Chinese Zero Pronoun Resolution: An Unsupervised Probabilistic Model Rivaling Supervised Resolvers
Chen Chen | Vincent Ng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Relieving the Computational Bottleneck: Joint Inference for Event Extraction with High-Dimensional Features
Deepak Venugopal | Chen Chen | Vibhav Gogate | Vincent Ng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Vote Prediction on Comments in Social Polls
Isaac Persing | Vincent Ng
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib abs
SinoCoreferencer: An End-to-End Chinese Event Coreference Resolver
Chen Chen | Vincent Ng
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Compared to entity coreference resolution, there has been relatively little work on event coreference resolution, and most of it has focused on English. In fact, to our knowledge, there are no publicly available results on Chinese event coreference resolution. This paper describes the design, implementation, and evaluation of SinoCoreferencer, an end-to-end state-of-the-art ACE-style Chinese event coreference system. We have made SinoCoreferencer publicly available in the hope of facilitating the development of high-level Chinese natural language applications that can potentially benefit from event coreference information.

pdf bib abs
Annotating Inter-Sentence Temporal Relations in Clinical Notes
Jennifer D’Souza | Vincent Ng
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Owing in part to the surge of interest in temporal relation extraction, a number of datasets manually annotated with temporal relations between event-event pairs and event-time pairs have been produced recently. However, it is not uncommon to find missing annotations in these manually annotated datasets. Many researchers have attributed this problem to “annotator fatigue”. While some of these missing relations can be recovered automatically, many of them cannot. Our goals in this paper are to (1) manually annotate certain types of missing links that cannot be automatically recovered in the i2b2 Clinical Temporal Relations Challenge Corpus, one of the recently released evaluation corpora for temporal relation extraction; and (2) empirically determine the usefulness of these additional annotations. We will make our annotations publicly available in the hope of enabling a more accurate evaluation of temporal relation extraction systems.

pdf bib
Automatic Keyphrase Extraction: A Survey of the State of the Art
Kazi Saidul Hasan | Vincent Ng
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Modeling Prompt Adherence in Student Essays
Isaac Persing | Vincent Ng
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation
Sameer Pradhan | Xiaoqiang Luo | Marta Recasens | Eduard Hovy | Vincent Ng | Michael Strube
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)