Boda Lin

2024

Visual Enhanced Entity-Level Interaction Network for Multimodal Summarization
Haolong Yan | Binghao Tang | Boda Lin | Gang Zhao | Si Li
Findings of the Association for Computational Linguistics: NAACL 2024

MultiModal Summarization (MMS) aims to generate a concise summary based on multimodal data like texts and images and has wide application in multimodal fields. Previous works mainly focus on the coarse-level textual and visual features in which the overall features of the image interact with the whole sentence. However, the entities of the input text and the objects of the image may be underutilized, limiting the performance of current MMS models. In this paper, we propose a novel Visual Enhanced Entity-Level Interaction Network (VE-ELIN) to address the problem of underutilization of multimodal inputs at a fine-grained level in two ways. We first design a cross-modal entity interaction module to better fuse the entity information in text and the object information in vision. Then, we design an object-guided visual enhancement module to fully extract the visual features and enhance the focus of the image on the object area. We evaluate VE-ELIN on two MMS datasets and propose new metrics to measure the factual consistency of entities in the output. Finally, experimental results demonstrate that VE-ELIN is effective and outperforms previous methods under both traditional metrics and ours. The source code is available at https://github.com/summoneryhl/VE-ELIN.
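The abstract does not define the proposed entity-consistency metrics. As a rough illustration only, one simple way to score entity-level factual consistency is set overlap between the entities in a generated summary and those in the source. The function name `entity_consistency`, the lowercasing, and the precision/recall/F1 form are all assumptions, and entity extraction (for example with an NER tagger) is assumed to happen beforehand.

```python
from typing import Dict, Iterable


def entity_consistency(summary_entities: Iterable[str],
                       source_entities: Iterable[str]) -> Dict[str, float]:
    """Hypothetical set-overlap scores, not VE-ELIN's published metrics.

    precision: share of summary entities that also occur in the source
               (a low value suggests hallucinated entities);
    recall:    share of source entities that the summary mentions.
    """
    summary = {e.lower() for e in summary_entities}
    source = {e.lower() for e in source_entities}
    overlap = summary & source
    precision = len(overlap) / len(summary) if summary else 0.0
    recall = len(overlap) / len(source) if source else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Two of the three summary entities appear among the four source entities.
scores = entity_consistency(["Paris", "NASA", "Obama"],
                            ["Paris", "NASA", "Berlin", "UN"])
```

Here precision is 2/3 and recall is 1/2; precision is the side that penalizes entities the summary introduces without support in the input.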

Leveraging Generative Large Language Models with Visual Instruction and Demonstration Retrieval for Multimodal Sarcasm Detection
Binghao Tang | Boda Lin | Haolong Yan | Si Li
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Multimodal sarcasm detection aims to identify sarcasm in the given image-text pairs and has wide applications in the multimodal domains. Previous works primarily design complex network structures to fuse the image-text modality features for classification. However, such complicated structures may risk overfitting on in-domain data, reducing the performance in out-of-distribution (OOD) scenarios. Additionally, existing methods typically do not fully utilize cross-modal features, limiting their performance on in-domain datasets. Therefore, to build a more reliable multimodal sarcasm detection model, we propose a generative multimodal sarcasm model consisting of a designed instruction template and a demonstration retrieval module based on the large language model. Moreover, to assess the generalization of current methods, we introduce an OOD test set, RedEval. Experimental results demonstrate that our method is effective and achieves state-of-the-art (SOTA) performance on the in-domain MMSD2.0 and OOD RedEval datasets.
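The abstract names a demonstration retrieval module without giving its details. A common way to realize such a module, sketched here under stated assumptions, is to embed every labelled training pair once, pick the k nearest neighbours of the test pair by cosine similarity, and place them as in-context examples ahead of the query. The precomputed `vec` embeddings, the prompt layout, and all function names are hypothetical, and the paper's visual instruction is not reproduced.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_demonstrations(query_vec: Sequence[float],
                            pool: List[Dict], k: int = 2) -> List[Dict]:
    """Return the k pool examples whose embeddings are most similar to the query."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex["vec"]),
                    reverse=True)
    return ranked[:k]


def build_prompt(query_text: str, demos: List[Dict]) -> str:
    """Hypothetical prompt layout: retrieved demonstrations first, then the query."""
    blocks = [f"Text: {d['text']}\nAnswer: {d['label']}" for d in demos]
    blocks.append(f"Text: {query_text}\nAnswer:")
    return "\n\n".join(blocks)


# Toy pool with made-up 2-d embeddings standing in for image-text features.
pool = [
    {"text": "Great, another Monday.", "label": "sarcastic", "vec": [1.0, 0.0]},
    {"text": "Lovely sunset today.", "label": "not sarcastic", "vec": [0.0, 1.0]},
    {"text": "Wow, traffic again. Perfect.", "label": "sarcastic", "vec": [0.9, 0.1]},
]
demos = retrieve_demonstrations([1.0, 0.0], pool, k=2)
prompt = build_prompt("Nice weather for a picnic, said no one.", demos)
```

In a real multimodal setting the embeddings would come from an image-text encoder; the sketch only shows the retrieve-then-prompt pattern.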

Linguistic Guidance for Sequence-to-Sequence AMR Parsing
Binghao Tang | Boda Lin | Si Li
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

Abstract Meaning Representation (AMR) parsing aims to capture the meaning of a sentence in the form of an AMR graph. Sequence-to-sequence (seq2seq) methods built on powerful encoder-decoder pre-trained language models (PLMs) have shown promising performance, and subsequent work has further improved how seq2seq models use AMR graph information. However, seq2seq models generate the output sequence incrementally, so an inaccurate subsequence at the beginning can degrade the final output. In addition, the interconnection between AMR and other linguistic representation formats remains underexplored in existing research. To mitigate error propagation and to investigate how other representation formats can guide PLMs, we propose a novel approach, Linguistic Guidance for Seq2seq AMR parsing (LGSA). LGSA incorporates very limited information from various linguistic representation formats as guidance on the encoder side, which can effectively help PLMs realize more of their potential and boost AMR parsing. Results on the widely used AMR2.0 and AMR3.0 benchmarks demonstrate the efficacy of LGSA, which can improve seq2seq AMR parsers without silver AMR data or alignment information. Moreover, we evaluate the generalization of LGSA by conducting experiments on out-of-domain datasets, and the results indicate that LGSA remains effective even in these challenging scenarios.

2022

Simple Tagging System with RoBERTa for Ancient Chinese
Binghao Tang | Boda Lin | Si Li
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This paper describes the system submitted for the EvaHan 2022 Shared Task on word segmentation and part-of-speech tagging for Ancient Chinese. Our system is based on the pre-trained language model SIKU-RoBERTa and simple tagging layers. It significantly outperforms the official baselines on the released test sets, demonstrating its effectiveness.
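The abstract does not spell out the tag scheme. One common way to cast joint word segmentation and POS tagging as a single per-character labelling problem, which a tagging layer over SIKU-RoBERTa character representations could predict, is to attach the POS to a boundary prefix. The B-/I- scheme, the function names, and the POS labels below are illustrative assumptions, not the submitted system.

```python
from typing import List, Tuple


def encode(words: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Turn (word, POS) pairs into per-character tags.

    B-<pos> marks the first character of a word, I-<pos> a continuation.
    """
    chars, tags = [], []
    for word, pos in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            tags.append(("B-" if i == 0 else "I-") + pos)
    return chars, tags


def decode(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Rebuild (word, POS) pairs from predicted per-character tags.

    An I- tag that does not continue a word with the same POS starts a new
    word, so inconsistent predictions still decode to a valid segmentation.
    """
    words: List[List[str]] = []
    for ch, tag in zip(chars, tags):
        prefix, pos = tag.split("-", 1)
        if prefix == "B" or not words or words[-1][1] != pos:
            words.append([ch, pos])
        else:
            words[-1][0] += ch
    return [(w, p) for w, p in words]


chars, tags = encode([("孔子", "nr"), ("曰", "v")])
```

Because segmentation and POS share one label per character, a single softmax layer per token is enough, which fits the "simple tagging layers" the abstract describes.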

Dependency parsing aims to extract the syntactic or semantic dependency structure of a sentence. Existing methods for dependency parsing include transition-based, graph-based, and sequence-to-sequence methods. These methods achieve excellent performance, and we note that they all belong to labeling methods. It is therefore valuable and interesting to explore whether a generative method can implement dependency parsing. In this paper, we propose to achieve Dependency Parsing (DP) via Sequence Generation (SG) by utilizing only a pre-trained language model, without any auxiliary structures. We first explore different serialization strategies for converting parsing structures into sequences. We then design dependency units and concatenate these units into the target sequence for DPSG. We verify that DPSG is capable of parsing on widely used DP benchmarks, i.e., PTB, UD2.2, SDP15 and SemEval16. In addition, we investigate the surprisingly strong low-resource applicability of DPSG, including unsupervised cross-domain parsing on CODT and few-shot cross-task parsing on SDP15. Our research demonstrates that sequence generation is an effective way to achieve dependency parsing. Our code is available.
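To make the idea of "dependency units" concrete, here is a minimal sketch of serializing a dependency tree into a generation target and parsing it back. The unit format `<index> <word> <head> <relation>`, the `" ; "` separator, and the function names are hypothetical; the abstract says DPSG explores several serialization strategies, and its actual design may differ.

```python
from typing import List, Tuple

SEP = " ; "  # hypothetical separator between dependency units


def serialize(words: List[str], heads: List[int], rels: List[str]) -> str:
    """Linearize a dependency tree as units "<index> <word> <head> <relation>".

    heads are 1-based word indices; 0 marks the root.
    """
    units = [f"{i} {w} {h} {r}"
             for i, (w, h, r) in enumerate(zip(words, heads, rels), start=1)]
    return SEP.join(units)


def deserialize(seq: str) -> Tuple[List[str], List[int], List[str]]:
    """Recover words, heads and relations from a generated target sequence."""
    words, heads, rels = [], [], []
    for unit in seq.split(SEP):
        _, word, head, rel = unit.split(" ")
        words.append(word)
        heads.append(int(head))
        rels.append(rel)
    return words, heads, rels


target = serialize(["She", "reads", "books"], [2, 0, 2], ["nsubj", "root", "obj"])
# target == "1 She 2 nsubj ; 2 reads 0 root ; 3 books 2 obj"
```

This sketch assumes tokens contain no spaces and each word has exactly one head. Semantic dependency graphs such as those in SDP15 and SemEval16 allow several heads per word, which would need one unit per arc instead.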

2021

Unsupervised Domain Adaptation Method with Semantic-Structural Alignment for Dependency Parsing
Boda Lin | Mingzheng Li | Si Li | Yong Luo
Findings of the Association for Computational Linguistics: EMNLP 2021

Unsupervised cross-domain dependency parsing aims to accomplish domain adaptation for dependency parsing without using labeled data in the target domain. Existing methods are often of the pseudo-annotation type: they generate data through self-annotation by the base model and perform iterative training. However, these methods do not consider changing the model structure for domain adaptation. In addition, the structural information contained in the text cannot be fully exploited. To remedy these drawbacks, we propose a Semantics-Structure Adaptative Dependency Parser (SSADP), which accomplishes unsupervised cross-domain dependency parsing without relying on pseudo-annotation or data selection. In particular, we design two feature extractors to extract semantic and structural features, respectively. For each type of feature, a corresponding feature adaptation method aligns the source and target domain distributions, which effectively enhances the unsupervised cross-domain transfer capability of the model. We validate the effectiveness of our model through experiments on CODT1 and CTB9, and the results show consistent performance improvements. Besides, we verify the structure transfer ability of the proposed model by introducing the Weisfeiler-Lehman test.
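The abstract does not say how the Weisfeiler-Lehman test is applied to SSADP. For reference, the standard 1-dimensional WL (colour refinement) test it refers to can be sketched as follows. Treating dependency trees as undirected graphs with, for example, POS tags as initial labels is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists


def wl_histogram(graph: Graph, labels: Dict[int, str], iterations: int,
                 table: Dict[Hashable, int]) -> Counter:
    """1-WL colour refinement; returns the multiset of colours from all rounds.

    `table` maps colour signatures to integer ids and must be shared by the
    graphs being compared so that equal signatures get equal ids.
    """
    colours = {v: table.setdefault(("init", labels[v]), len(table)) for v in graph}
    hist = Counter(colours.values())
    for _ in range(iterations):
        # New colour = old colour plus the sorted multiset of neighbour colours.
        colours = {
            v: table.setdefault(
                (colours[v], tuple(sorted(colours[u] for u in graph[v]))),
                len(table))
            for v in graph
        }
        hist.update(colours.values())
    return hist


def wl_test(g1: Graph, l1: Dict[int, str], g2: Graph, l2: Dict[int, str],
            iterations: int = 3) -> bool:
    """False: the graphs are certainly non-isomorphic.
    True: 1-WL cannot tell them apart (they may still differ)."""
    table: Dict[Hashable, int] = {}
    return (wl_histogram(g1, l1, iterations, table)
            == wl_histogram(g2, l2, iterations, table))


path = {1: [2], 2: [1, 3], 3: [2]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
same = {1: "x", 2: "x", 3: "x"}
```

The test is one-sided: two disjoint triangles and a six-cycle are both 2-regular, so refinement never separates them even though they are not isomorphic.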

Co-authors

Lei Hou 1
