Man Lan

Also published as: Lan Man


2024

Are U a Joke Master? Pun Generation via Multi-Stage Curriculum Learning towards a Humor LLM
Yang Chen | Chong Yang | Tu Hu | Xinhao Chen | Man Lan | Li Cai | Xinlin Zhuang | Xuan Lin | Xin Lu | Aimin Zhou
Findings of the Association for Computational Linguistics: ACL 2024

Although large language models (LLMs) acquire extensive world knowledge and some reasoning abilities, generating humorous sentences remains a challenge for them. Previous research has demonstrated that the humor generation capabilities of ChatGPT are confined to producing merely 25 unique jokes. In this work, we concentrate on endowing LLMs with the ability to generate puns, a particular category of humor, through preference learning. We propose a multi-stage curriculum preference learning framework to optimize both pun structure preferences and humor preferences. Specifically, we improve the Direct Preference Optimization (DPO) algorithm to address the multi-objective alignment problem. In addition, to facilitate further advancement in this field, we collect a Chinese Pun (ChinesePun) dataset containing 2.1k puns and corresponding annotations. Experimental results on both Chinese and English benchmark datasets demonstrate that our method significantly outperforms all the baseline models.

TOREE: Evaluating Topic Relevance of Student Essays for Chinese Primary and Middle School Education
Xinlin Zhuang | Hongyi Wu | Xinshu Shen | Peimin Yu | Gaowei Yi | Xinhao Chen | Tu Hu | Yang Chen | Yupei Ren | Yadong Zhang | Youqi Song | Binxuan Liu | Man Lan
Findings of the Association for Computational Linguistics: ACL 2024

Topic relevance of an essay demands that the composition adheres to a clear theme and aligns well with the essay prompt requirements, a critical aspect of essay quality evaluation. However, existing research on Automatic Essay Scoring (AES) for Chinese essays has overlooked topic relevance and lacks detailed feedback, while Automatic Essay Comment Generation (AECG) faces considerable complexity and difficulty. Additionally, current Large Language Models, including GPT-4, often make incorrect judgments and provide impractical feedback when evaluating topic relevance. This paper introduces TOREE (Topic Relevance Evaluation), a comprehensive dataset developed to assess topic relevance in Chinese primary and middle school students’ essays, which is beneficial for AES, AECG and other applications. Moreover, our proposed two-step method utilizes TOREE through a combination of Supervised Fine-tuning and Preference Learning. Experimental results demonstrate that TOREE is of high quality, and our method significantly enhances models’ performance on two designed tasks for topic relevance evaluation, improving both automatic and human evaluations across four diverse LLMs.

CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays
Nuowei Liu | Xinhao Chen | Hongyi Wu | Changzhi Sun | Man Lan | Yuanbin Wu | Xiaopeng Bai | Shaoguang Mao | Yan Xia
Findings of the Association for Computational Linguistics: EMNLP 2024

CEAMC: Corpus and Empirical Study of Argument Analysis in Education via LLMs
Yupei Ren | Hongyi Wu | Zhaoguang Long | Shangqing Zhao | Xinyi Zhou | Zheqin Yin | Xinlin Zhuang | Xiaopeng Bai | Man Lan
Findings of the Association for Computational Linguistics: EMNLP 2024

This paper introduces the Chinese Essay Argument Mining Corpus (CEAMC), a manually annotated dataset designed for argument component classification on multiple levels of granularity. Existing argument component types in education remain simplistic and isolated, failing to encapsulate the complete argument information. Originating from authentic examination settings, CEAMC categorizes argument components into 4 coarse-grained and 10 fine-grained delineations, surpassing previous simple representations to capture the subtle nuances of argumentation in the real world, thus meeting the needs of complex and diverse argumentative scenarios. Our contributions include the development of CEAMC, the establishment of baselines for further research, and a thorough exploration of the performance of Large Language Models (LLMs) on CEAMC. The results indicate that our CEAMC can serve as a challenging benchmark for the development of argument analysis in education.

Towards Explainable Chinese Native Learner Essay Fluency Assessment: Dataset, Tasks, and Method
Xinshu Shen | Hongyi Wu | Yadong Zhang | Man Lan | Xiaopeng Bai | Shaoguang Mao | Yuanbin Wu | Xinlin Zhuang | Li Cai
Findings of the Association for Computational Linguistics: EMNLP 2024

Grammatical Error Correction (GEC) is a crucial technique in Automated Essay Scoring (AES) for evaluating the fluency of essays. However, in Chinese, existing GEC datasets often fail to consider the importance of specific grammatical error types within compositional scenarios, lack research on data collected from native Chinese speakers, and largely overlook cross-sentence grammatical errors. Furthermore, the measurement of the overall fluency of an essay is often overlooked. To address these issues, we present CEFA (Chinese Essay Fluency Assessment), an extensive corpus that is derived from essays authored by native Chinese-speaking primary and secondary students and encapsulates essay fluency scores along with both coarse and fine-grained grammatical error types and corrections. Experiments employing various benchmark models on CEFA confirm that our dataset is challenging. Our findings further highlight the significance of fine-grained annotations in fluency assessment and the mutually beneficial relationship between error types and corrections.

2023

Connective Prediction for Implicit Discourse Relation Recognition via Knowledge Distillation
Hongyi Wu | Hao Zhou | Man Lan | Yuanbin Wu | Yadong Zhang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit discourse relation recognition (IDRR) remains a challenging task in discourse analysis due to the absence of connectives. Most existing methods utilize one-hot labels as the sole optimization target, ignoring the internal association among connectives. Besides, these approaches spend considerable effort on template construction, negatively affecting the generalization capability. To address these problems, we propose a novel Connective Prediction via Knowledge Distillation (CP-KD) approach to instruct large-scale pre-trained language models (PLMs) to mine the latent correlations between connectives and discourse relations, which is meaningful for IDRR. Experimental results on the PDTB 2.0/3.0 and CoNLL2016 datasets show that our method significantly outperforms the state-of-the-art models on coarse-grained and fine-grained discourse relations. Moreover, our approach can be transferred to explicit discourse relation recognition (EDRR) and achieve acceptable performance.

A Multi-Task Dataset for Assessing Discourse Coherence in Chinese Essays: Structure, Theme, and Logic Analysis
Hongyi Wu | Xinshu Shen | Man Lan | Shaoguang Mao | Xiaopeng Bai | Yuanbin Wu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

This paper introduces the Chinese Essay Discourse Coherence Corpus (CEDCC), a multi-task dataset for assessing discourse coherence. Existing research tends to focus on isolated dimensions of discourse coherence, a gap which the CEDCC addresses by integrating coherence grading, topical continuity, and discourse relations. This approach, alongside detailed annotations, captures the subtleties of real-world texts and stimulates progress in Chinese discourse coherence analysis. Our contributions include the development of the CEDCC, the establishment of baselines for further research, and the demonstration of the impact of coherence on discourse relation recognition and automated essay scoring. The dataset and related code are available at https://github.com/cubenlp/CEDCC_corpus.

2022

Understanding Gender Bias in Knowledge Base Embeddings
Yupei Du | Qi Zheng | Yuanbin Wu | Man Lan | Yan Yang | Meirong Ma
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge base (KB) embeddings have been shown to contain gender biases. In this paper, we study two questions regarding these biases: how to quantify them, and how to trace their origins in the KB. Specifically, first, we develop two novel bias measures, one for a group of person entities and one for an individual person entity. Evidence of their validity is observed by comparison with real-world census data. Second, we use the influence function to inspect the contribution of each triple in the KB to the overall group bias. To exemplify the potential applications of our study, we also present two strategies (by adding and removing KB triples) to mitigate gender biases in KB embeddings.

An Effective and Efficient Entity Alignment Decoding Algorithm via Third-Order Tensor Isomorphism
Xin Mao | Meirong Ma | Hao Yuan | Jianchao Zhu | ZongYu Wang | Rui Xie | Wei Wu | Man Lan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Entity alignment (EA) aims to discover the equivalent entity pairs between KGs, which is a crucial step for integrating multi-source KGs. For a long time, most researchers have regarded EA as a pure graph representation learning task and focused on improving graph encoders while paying little attention to the decoding process. In this paper, we propose an effective and efficient EA Decoding Algorithm via Third-order Tensor Isomorphism (DATTI). Specifically, we derive two sets of isomorphism equations: (1) Adjacency tensor isomorphism equations and (2) Gramian tensor isomorphism equations. By combining these equations, DATTI could effectively utilize the adjacency and inner correlation isomorphisms of KGs to enhance the decoding process of EA. Extensive experiments on public datasets indicate that our decoding algorithm can deliver significant performance improvements even on the most advanced EA methods, while the extra required time is less than 3 seconds.

LightEA: A Scalable, Robust, and Interpretable Entity Alignment Framework via Three-view Label Propagation
Xin Mao | Wenting Wang | Yuanbin Wu | Man Lan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Entity Alignment (EA) aims to find equivalent entity pairs between KGs, which is the core step to bridging and integrating multi-source KGs. In this paper, we argue that existing complex EA methods inevitably inherit the inborn defects from their neural network lineage: poor interpretability and weak scalability. Inspired by recent studies, we reinvent the classical Label Propagation algorithm to effectively run on KGs and propose a neural-free EA framework, LightEA, consisting of three efficient components: (i) Random Orthogonal Label Generation, (ii) Three-view Label Propagation, and (iii) Sparse Sinkhorn Operation. According to the extensive experiments on public datasets, LightEA has impressive scalability, robustness, and interpretability. With a mere tenth of the time consumption, LightEA achieves comparable results to state-of-the-art methods across all datasets and even surpasses them on many. Besides, because the computational process of LightEA is entirely linear, we can trace the propagation process at each step and clearly explain how the entities are aligned.

Prompt-based Connective Prediction Method for Fine-grained Implicit Discourse Relation Recognition
Hao Zhou | Man Lan | Yuanbin Wu | Yuefeng Chen | Meirong Ma
Findings of the Association for Computational Linguistics: EMNLP 2022

Due to the absence of connectives, implicit discourse relation recognition (IDRR) is still a challenging and crucial task in discourse analysis. Most current work adopts multitask learning to aid IDRR through explicit discourse relation recognition (EDRR) or utilizes dependencies between discourse relation labels to constrain model predictions. However, these methods still perform poorly on fine-grained IDRR and even completely misidentify most of the few-shot discourse relation classes. To address these problems, we propose a novel Prompt-based Connective Prediction (PCP) method for IDRR. Our method instructs large-scale pre-trained models to use knowledge relevant to discourse relations and utilizes the strong correlation between connectives and discourse relations to help the model recognize implicit discourse relations. Experimental results show that our method surpasses the current state-of-the-art model and achieves significant improvements on those fine-grained few-shot discourse relations. Moreover, our approach can be transferred to EDRR and obtain acceptable results. Our code is released at https://github.com/zh-i9/PCP-for-IDRR.

A Simple Temporal Information Matching Mechanism for Entity Alignment between Temporal Knowledge Graphs
Li Cai | Xin Mao | Meirong Ma | Hao Yuan | Jianchao Zhu | Man Lan
Proceedings of the 29th International Conference on Computational Linguistics

Entity alignment (EA) aims to find entities in different knowledge graphs (KGs) that refer to the same object in the real world. Recent studies incorporate temporal information to augment the representations of KGs. The existing methods for EA between temporal KGs (TKGs) utilize time-aware attention mechanisms to incorporate relational and temporal information into entity embeddings. These approaches outperform the previous methods by using temporal information. However, we believe that it is not necessary to learn the embeddings of temporal information in KGs since most TKGs have uniform temporal representations. Therefore, we propose a simple GNN model combined with a temporal information matching mechanism, which achieves better performance with less time and fewer parameters. Furthermore, since alignment seeds are difficult to label in real-world applications, we also propose a method to generate unsupervised alignment seeds via the temporal information of TKGs. Extensive experiments on public datasets indicate that our supervised method significantly outperforms the previous methods and the unsupervised one has competitive performance.

Few Clean Instances Help Denoising Distant Supervision
Yufang Liu | Ziyin Huang | Yijun Wang | Changzhi Sun | Man Lan | Yuanbin Wu | Xiaofeng Mou | Ding Wang
Proceedings of the 29th International Conference on Computational Linguistics

Existing distantly supervised relation extractors usually rely on noisy data for both model training and evaluation, which may lead to garbage-in-garbage-out systems. To alleviate the problem, we study whether a small clean dataset could help improve the quality of distantly supervised models. We show that besides getting a more convincing evaluation of models, a small clean dataset also helps us to build more robust denoising models. Specifically, we propose a new criterion for clean instance selection based on influence functions. It collects sample-level evidence for recognizing good instances (which is more informative than loss-level evidence). We also propose a teacher-student mechanism for controlling purity of intermediate results when bootstrapping the clean set. The whole approach is model-agnostic and demonstrates strong performances on both denoising real (NYT) and synthetic noisy datasets.

2021

From Alignment to Assignment: Frustratingly Simple Unsupervised Entity Alignment
Xin Mao | Wenting Wang | Yuanbin Wu | Man Lan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Cross-lingual entity alignment (EA) aims to find the equivalent entities between cross-lingual KGs (Knowledge Graphs), which is a crucial step for integrating KGs. Recently, many GNN-based EA methods have been proposed and show decent performance improvements on several public datasets. However, existing GNN-based EA methods inevitably inherit poor interpretability and low efficiency from neural networks. Motivated by the isomorphic assumption of GNN-based methods, we successfully transform the cross-lingual EA problem into an assignment problem. Based on this re-definition, we propose a frustratingly Simple but Effective Unsupervised entity alignment method (SEU) without neural networks. Extensive experiments have been conducted to show that our proposed unsupervised approach even beats advanced supervised methods across all public datasets while having high efficiency, interpretability, and stability.
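
The idea of treating alignment as an assignment problem can be illustrated with a toy sketch. This is not the SEU implementation: the similarity matrix, the function name `best_assignment`, and the brute-force search are all assumptions made for illustration, and a practical system would use a polynomial-time solver such as the Hungarian algorithm instead of enumerating permutations.

```python
from itertools import permutations

def best_assignment(sim):
    """Return the one-to-one mapping (row i -> column perm[i]) that
    maximizes the total similarity, found by exhaustive search.

    `sim` is a hypothetical square matrix where sim[i][j] is the
    similarity between entity i in one KG and entity j in the other.
    """
    n = len(sim)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score

# Greedy row-wise matching would pair entity 0 with column 0 (0.9) and
# leave entity 1 with column 1 (0.1); the global assignment avoids this.
toy_sim = [[0.9, 0.8],
           [0.85, 0.1]]
mapping, total = best_assignment(toy_sim)
```

The toy matrix is chosen so that picking the best match for each entity independently yields a worse total than solving for all pairs jointly, which is the motivation for the assignment view.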

2020

ECNU at SemEval-2020 Task 7: Assessing Humor in Edited News Headlines Using BiLSTM with Attention
Tiantian Zhang | Zhixuan Chen | Man Lan
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper we describe our system submitted to SemEval 2020 Task 7: “Assessing Humor in Edited News Headlines”. We participated in all subtasks, in which the main goal is to predict the mean funniness of the edited headline given the original and the edited headline. Our system involves two similar sub-networks, which generate vector representations for the original and edited headlines respectively. We then subtract the outputs of the two sub-networks to predict the funniness of the edited headline.

A Span-based Linearization for Constituent Trees
Yang Wei | Yuanbin Wu | Man Lan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a novel linearization of a constituent tree, together with a new locally normalized model. For each split point in a sentence, our model computes the normalizer on all spans ending with that split point, and then predicts a tree span from them. Compared with global models, our model is fast and parallelizable. Unlike previous local models, our linearization method is tied to the spans directly and considers more local features when performing span prediction, which is more interpretable and effective. Experiments on PTB (95.8 F1) and CTB (92.4 F1) show that our model significantly outperforms existing local models and efficiently achieves competitive results with global models.

2019

Joint Type Inference on Entities and Relations via Graph Convolutional Networks
Changzhi Sun | Yeyun Gong | Yuanbin Wu | Ming Gong | Daxin Jiang | Man Lan | Shiliang Sun | Nan Duan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We develop a new paradigm for the task of joint entity relation extraction. It first identifies entity spans, then performs a joint inference on entity types and relation types. To tackle the joint type inference task, we propose a novel graph convolutional network (GCN) running on an entity-relation bipartite graph. By introducing a binary relation classification task, we are able to utilize the structure of entity-relation bipartite graph in a more efficient and interpretable way. Experiments on ACE05 show that our model outperforms existing joint models in entity performance and is competitive with the state-of-the-art in relation performance.

Graph-based Dependency Parsing with Graph Neural Networks
Tao Ji | Yuanbin Wu | Man Lan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We investigate the problem of efficiently incorporating high-order features into neural graph-based dependency parsing. Instead of explicitly extracting high-order features from intermediate parse trees, we develop a more powerful dependency tree node representation which captures high-order information concisely and efficiently. We use graph neural networks (GNNs) to learn the representations and discuss several new configurations of GNN’s updating and aggregation functions. Experiments show that our parser achieves the best UAS and LAS on PTB (96.0%, 94.3%) among systems that do not use any external resources.

Scaling up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Product Title
Huimin Xu | Wenting Wang | Xin Mao | Xinyu Jiang | Man Lan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Supplementing product information by extracting attribute values from titles is a crucial task in the e-Commerce domain. Previous studies treat each attribute only as an entity type and build one set of NER tags (e.g., BIO) for each of them, leading to a scalability issue that makes them unfit for the large attribute systems of real-world e-Commerce. In this work, we propose a novel approach to support value extraction scaling up to thousands of attributes without losing performance: (1) We propose to regard an attribute as a query and adopt only one global set of BIO tags for all attributes to reduce the burden of attribute tag or model explosion; (2) We explicitly model the semantic representations for attribute and title, and develop an attention mechanism to capture the interactive semantic relations between them so that our framework comprehends attributes. We conduct extensive experiments on real-life datasets. The results show that our model not only outperforms existing state-of-the-art NER tagging models, but also is robust and generates promising results for up to 8,906 attributes.

Exploring Human Gender Stereotypes with Word Association Test
Yupei Du | Yuanbin Wu | Man Lan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Word embeddings have been widely used to study gender stereotypes in texts. One key problem regarding existing bias scores is to evaluate their validity: do they really reflect true bias levels? For a small set of words (e.g. occupations), we can rely on human annotations or external data. However, for most words, evaluating their correctness is still an open problem. In this work, we utilize the word association test, which contains rich types of word connections annotated by human participants, to explore how gender stereotypes spread within our minds. Specifically, we use a random walk on the word association graph to derive bias scores for a large number of words. Experiments show that these bias scores correlate well with bias in the real world. More importantly, compared with word-embedding-based bias scores, it provides a different perspective on gender stereotypes in words.

2018

AntNLP at CoNLL 2018 Shared Task: A Graph-Based Parser for Universal Dependency Parsing
Tao Ji | Yufang Liu | Yijun Wang | Yuanbin Wu | Man Lan
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We describe the graph-based dependency parser in our system (AntNLP) submitted to the CoNLL 2018 UD Shared Task. We use a bidirectional LSTM to get the word representation, then a bi-affine pointer network to compute scores of candidate dependency edges and the MST algorithm to get the final dependency tree. From the official testing results, our system gets 70.90 LAS F1 score (rank 9/26), 55.92 MLAS (10/26) and 60.91 BLEX (8/26).

Extracting Entities and Relations with Joint Minimum Risk Training
Changzhi Sun | Yuanbin Wu | Man Lan | Shiliang Sun | Wenting Wang | Kuang-Chih Lee | Kewen Wu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We investigate the task of joint entity relation extraction. Unlike prior efforts, we propose a new lightweight joint learning paradigm based on minimum risk training (MRT). Specifically, our algorithm optimizes a global loss function which is flexible and effective to explore interactions between the entity model and the relation model. We implement a strong and simple neural network where the MRT is executed. Experiment results on the benchmark ACE05 and NYT datasets show that our model is able to achieve state-of-the-art joint extraction performances.

ECNU at SemEval-2018 Task 1: Emotion Intensity Prediction Using Effective Features and Machine Learning Models
Huimin Xu | Man Lan | Yuanbin Wu
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes our submissions to SemEval 2018 Task 1. The task is affect intensity prediction in tweets, including five subtasks. We participated in all subtasks for English tweets. We extracted several traditional NLP, sentiment lexicon, emotion lexicon and domain-specific features from tweets, and adopted supervised machine learning algorithms to perform emotion intensity prediction.

ECNU at SemEval-2018 Task 2: Leverage Traditional NLP Features and Neural Networks Methods to Address Twitter Emoji Prediction Task
Xingwu Lu | Xin Mao | Man Lan | Yuanbin Wu
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes our submissions to Task 2 in SemEval 2018, i.e., Multilingual Emoji Prediction. We first investigate several traditional Natural Language Processing (NLP) features, and then design several deep learning models. For subtask 1: Emoji Prediction in English, we combine two different methods to represent tweets, i.e., a supervised model using traditional features and a deep learning model. For subtask 2: Emoji Prediction in Spanish, we only use a deep learning model.

ECNU at SemEval-2018 Task 3: Exploration on Irony Detection from Tweets via Machine Learning and Deep Learning Methods
Zhenghang Yin | Feixiang Wang | Man Lan | Wenting Wang
Proceedings of the 12th International Workshop on Semantic Evaluation

The paper describes our submissions to Task 3 in SemEval-2018. There are two subtasks: Subtask A is a binary classification task to determine whether a tweet is ironic, and Subtask B is a fine-grained classification task including four classes. To address them, we explored supervised machine learning methods alone and in combination with neural networks.

ECNU at SemEval-2018 Task 10: Evaluating Simple but Effective Features on Machine Learning Methods for Semantic Difference Detection
Yunxiao Zhou | Man Lan | Yuanbin Wu
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the system we submitted to Task 10 (Capturing Discriminative Attributes) in SemEval 2018. Given a triple (word1, word2, attribute), this task is to predict whether it exemplifies a semantic difference or not. We design and investigate several word embedding features, PMI features and WordNet features together with supervised machine learning methods to address this task. Officially released results show that our system ranks above average.

ECNU at SemEval-2018 Task 11: Using Deep Learning Method to Address Machine Comprehension Task
Yixuan Sheng | Man Lan | Yuanbin Wu
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the system we submitted to Task 11 in SemEval 2018, i.e., Machine Comprehension using Commonsense Knowledge. Given a passage and some questions that each have two candidate answers, this task requires the participating system to select the candidate answer that matches the meaning of the original text or commonsense knowledge. For this task, we use a deep learning method to obtain the final predicted answer by calculating the relevance between the choice representations and the question-aware document representation.

ECNU at SemEval-2018 Task 12: An End-to-End Attention-based Neural Network for the Argument Reasoning Comprehension Task
Junfeng Tian | Man Lan | Yuanbin Wu
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper presents our submissions to SemEval 2018 Task 12: the Argument Reasoning Comprehension Task. We investigate an end-to-end attention-based neural network to represent the two lexically close candidate warrants. On the one hand, we extract their different parts as attention vectors to obtain distinguishable representations. On the other hand, we use their surroundings (i.e., claim, reason, debate context) as additional attention vectors to get contextual representations, which work as final clues to select the correct warrant. Our model achieves 60.4% accuracy and ranks 3rd among 22 participating systems.

2017

A Fast and Lightweight System for Multilingual Dependency Parsing
Tao Ji | Yuanbin Wu | Man Lan
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We present a multilingual dependency parser with a bidirectional-LSTM (BiLSTM) feature extractor and a multi-layer perceptron (MLP) classifier. We trained our transition-based projective parser on UD version 2.0 datasets without any additional data. The parser is fast, lightweight and effective on big treebanks. In the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, the official results show that the macro-averaged LAS F1 score of our system Mengest is 61.33%.

ECNU at SemEval-2017 Task 1: Leverage Kernel-based Traditional NLP features and Neural Networks to Build a Universal Model for Multilingual and Cross-lingual Semantic Textual Similarity
Junfeng Tian | Zhiheng Zhou | Man Lan | Yuanbin Wu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

To address semantic similarity on multilingual and cross-lingual sentences, we first translate the other languages into English, and then feed our monolingual English system with various interactive features. Our system is further supported by combining it with deep learning semantic similarity, and our best run achieves a mean Pearson correlation of 73.16% in the primary track.

ECNU at SemEval-2017 Task 3: Using Traditional and Deep Learning Methods to Address Community Question Answering Task
Guoshun Wu | Yixuan Sheng | Man Lan | Yuanbin Wu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the systems we submitted to Task 3 (Community Question Answering) in SemEval 2017, which contains three subtasks on English corpora, i.e., subtask A: Question-Comment Similarity, subtask B: Question-Question Similarity, and subtask C: Question-External Comment Similarity. For subtask A, we combined two different methods to represent a question-comment pair, i.e., a supervised model using traditional features and a Convolutional Neural Network. For subtask B, we utilized the information of snippets returned from a search engine with the question subject as the query. For subtask C, we ranked the comments by multiplying the probability of the related question-comment pair being Good by the reciprocal rank of the related question.
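
The subtask C scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `rank_external_comments`, the tuple layout, and the toy numbers are assumptions, and the probabilities would in practice come from a trained classifier.

```python
def rank_external_comments(candidates):
    """Rank comments by P(Good) * (1 / rank of the related question).

    `candidates` is a hypothetical list of tuples
    (comment_id, p_good, related_question_rank), with ranks starting at 1.
    Returns (comment_id, score) pairs sorted from best to worst.
    """
    scored = [
        (comment_id, p_good * (1.0 / question_rank))
        for comment_id, p_good, question_rank in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy input: a highly rated comment under a low-ranked related question
# can fall below a moderately rated comment under the top-ranked question.
toy_candidates = [
    ("c1", 0.9, 3),
    ("c2", 0.6, 1),
    ("c3", 0.8, 2),
]
ranking = rank_external_comments(toy_candidates)
```

Multiplying by the reciprocal rank discounts comments attached to related questions that the retrieval step considered less relevant to the original question.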

ECNU at SemEval-2017 Task 7: Using Supervised and Unsupervised Methods to Detect and Locate English Puns
Yuhuan Xiu | Man Lan | Yuanbin Wu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our submissions to task 7 in SemEval 2017, i.e., Detection and Interpretation of English Puns. We participated in the first two subtasks, which are to detect and locate English puns respectively. For subtask 1, we presented a supervised system to determine whether or not a sentence contains a pun using similarity features calculated on sense vectors or cluster center vectors. For subtask 2, we established an unsupervised system to locate the pun by scoring each word in the sentence and we assumed that the word with the smallest score is the pun.

ECNU at SemEval-2017 Task 8: Rumour Evaluation Using Effective Features and Supervised Ensemble Models
Feixiang Wang | Man Lan | Yuanbin Wu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our submissions to Task 8 in SemEval 2017, i.e., Determining rumour veracity and support for rumours. Given a rumoured tweet and a set of reply tweets, subtask A is to label whether these tweets support, deny, query or comment on the rumour, and subtask B aims to predict the veracity (i.e., true, false, or unverified) of the given rumoured tweet with a confidence (in the range 0-1). For both subtasks, we adopted supervised machine learning methods, incorporating rich features. Since the training data is imbalanced, we specifically designed a two-step classifier to address subtask A.

pdf bib
ECNU at SemEval-2017 Task 4: Evaluating Effective Features on Machine Learning Methods for Twitter Message Polarity Classification
Yunxiao Zhou | Man Lan | Yuanbin Wu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper reports our submission to subtask A of task 4 (Sentiment Analysis in Twitter, SAT) in SemEval 2017, i.e., Message Polarity Classification. We investigated several traditional Natural Language Processing (NLP) features, domain-specific features and word embedding features together with supervised machine learning methods to address this task. Officially released results showed that our system ranked above average.

pdf bib
ECNU at SemEval-2017 Task 5: An Ensemble of Regression Algorithms with Effective Features for Fine-Grained Sentiment Analysis in Financial Domain
Mengxiao Jiang | Man Lan | Yuanbin Wu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our systems submitted to the Fine-Grained Sentiment Analysis on Financial Microblogs and News task (i.e., Task 5) in SemEval-2017. This task includes two subtasks, in the microblog and news headline domains, respectively. To address this problem, we extract four types of effective features: linguistic features, sentiment lexicon features, domain-specific features and word embedding features. We then use these features to construct models with ensemble regression algorithms. Our submissions rank 1st and 5th in subtask 1 and subtask 2, respectively.

pdf bib
Multi-task Attention-based Neural Networks for Implicit Discourse Relationship Representation and Identification
Man Lan | Jianxiang Wang | Yuanbin Wu | Zheng-Yu Niu | Haifeng Wang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a novel multi-task attention-based neural network model to address implicit discourse relationship representation and identification through two types of representation learning: an attention-based neural network for learning discourse relationship representation with two arguments, and a multi-task framework for learning knowledge from annotated and unannotated corpora. Extensive experiments were performed on two benchmark corpora (i.e., the PDTB and CoNLL-2016 datasets). Experimental results show that our proposed model outperforms the state-of-the-art systems on these benchmark corpora.

pdf bib
Large-scale Opinion Relation Extraction with Distantly Supervised Neural Network
Changzhi Sun | Yuanbin Wu | Man Lan | Shiliang Sun | Qi Zhang
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We investigate the task of open-domain opinion relation extraction. In contrast to work on manually labeled corpora, we propose an efficient distantly supervised framework based on pattern matching and neural network classifiers. The patterns are designed to automatically generate training data, and the deep learning model is designed to capture various lexical and syntactic features. The resulting algorithm is fast and scalable on large-scale corpora. We test the system on the Amazon online review dataset. The results show that our model is able to achieve promising performance without any human annotations.

2016

pdf bib
ECNU at SemEval-2016 Task 4: An Empirical Investigation of Traditional NLP Features and Word Embedding Features for Sentence-level and Topic-level Sentiment Analysis in Twitter
Yunxiao Zhou | Zhihua Zhang | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
ECNU at SemEval-2016 Task 5: Extracting Effective Features from Relevant Fragments in Sentence for Aspect-Based Sentiment Analysis in Reviews
Mengxiao Jiang | Zhihua Zhang | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
ECNU at SemEval 2016 Task 6: Relevant or Not? Supportive or Not? A Two-step Learning System for Automatic Detecting Stance in Tweets
Zhihua Zhang | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
ECNU at SemEval-2016 Task 7: An Enhanced Supervised Learning Method for Lexicon Sentiment Intensity Ranking
Feixiang Wang | Zhihua Zhang | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
ECNU at SemEval-2016 Task 1: Leveraging Word Embedding From Macro and Micro Views to Boost Performance for Semantic Textual Similarity
Junfeng Tian | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
ECNU at SemEval-2016 Task 3: Exploring Traditional Method and Deep Learning Method for Question Retrieval and Answer Ranking in Community Question Answering
Guoshun Wu | Man Lan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Two End-to-end Shallow Discourse Parsers for English and Chinese in CoNLL-2016 Shared Task
Jianxiang Wang | Man Lan
Proceedings of the CoNLL-16 shared task

2015

pdf bib
ECNU: Leveraging Word Embeddings to Boost Performance for Paraphrase in Twitter
Jiang Zhao | Man Lan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
ECNU: Using Traditional Similarity Measurements and Word Embedding for Semantic Textual Similarity Estimation
Jiang Zhao | Man Lan | Jun Feng Tian
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
ECNU: Using Multiple Sources of CQA-based Information for Answers Selection and YES/NO Response Inference
Liang Yi | JianXiang Wang | Man Lan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
ECNU: Multi-level Sentiment Analysis on Twitter Using Traditional Linguistic Features and Word Embedding Features
Zhihua Zhang | Guoshun Wu | Man Lan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
ECNU: Extracting Effective Features from Multiple Sequential Sentences for Target-dependent Sentiment Analysis in Reviews
Zhihua Zhang | Man Lan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
A Refined End-to-End Discourse Parser
Jianxiang Wang | Man Lan
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

2014

pdf bib
ECNU: A Combination Method and Multiple Features for Aspect Extraction and Sentiment Polarity Classification
Fangxi Zhang | Zhihua Zhang | Man Lan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
ECNU: Expression- and Message-level Sentiment Orientation Classification in Twitter Using Multiple Effective Features
Jiang Zhao | Man Lan | Tiantian Zhu
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
ECNU: Leveraging on Ensemble of Heterogeneous Features and Information Enrichment for Cross Level Semantic Similarity Estimation
Tiantian Zhu | Man Lan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
ECNU: One Stone Two Birds: Ensemble of Heterogenous Measures for Semantic Relatedness and Textual Entailment
Jiang Zhao | Tiantian Zhu | Man Lan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

pdf bib
Leveraging Synthetic Discourse Data via Multi-task Learning for Implicit Discourse Relation Recognition
Man Lan | Yu Xu | Zhengyu Niu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Probabilistic Sense Sentiment Similarity through Hidden Emotions
Mitra Mohtarami | Man Lan | Chew Lim Tan
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
ECNUCS: Measuring Short Text Semantic Equivalence Using Multiple Similarity Measurements
Tiantian Zhu | Lan Man
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

pdf bib
ECNUCS: Recognizing Cross-lingual Textual Entailment Using Multiple Text Similarity and Text Difference Measures
Jiang Zhao | Man Lan | Zheng-Yu Niu
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
ECNUCS: A Surface Information Based System Description of Sentiment Analysis in Twitter in the SemEval-2013 (Task 2)
Tiantian Zhu | Fangxi Zhang | Lan Man
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf bib
Tiantianzhu7:System Description of Semantic Textual Similarity (STS) in the SemEval-2012 (Task 6)
Tiantian Zhu | Man Lan
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2010

pdf bib
ECNU: Effective Semantic Relations Classification without Complicated Features or Multiple External Corpora
Yuan Chen | Man Lan | Jian Su | Zhi Min Zhou | Yu Xu
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
The Effects of Discourse Connectives Prediction on Implicit Discourse Relation Recognition
Zhi Min Zhou | Man Lan | Zheng Yu Niu | Yu Xu | Jian Su
Proceedings of the SIGDIAL 2010 Conference

pdf bib
Predicting Discourse Connectives for Implicit Discourse Relation Recognition
Zhi-Min Zhou | Yu Xu | Zheng-Yu Niu | Man Lan | Jian Su | Chew Lim Tan
Coling 2010: Posters