Binghao Tang

2024

Linguistic Guidance for Sequence-to-Sequence AMR Parsing
Binghao Tang | Boda Lin | Si Li
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

The Abstract Meaning Representation (AMR) parsing aims at capturing the meaning of a sentence in the form of an AMR graph. Sequence-to-sequence (seq2seq)-based methods, utilizing powerful Encoder-Decoder pre-trained language models (PLMs), have shown promising performance. Subsequent works have further improved the utilization of AMR graph information for seq2seq models. However, seq2seq models generate output sequence incrementally, and inaccurate subsequence at the beginning can negatively impact final outputs, also the interconnection between other linguistic representation formats and AMR remains an underexplored domain in existing research. To mitigate the issue of error propagation and to investigate the guiding influence of other representation formats on PLMs, we propose a novel approach of Linguistic Guidance for Seq2seq AMR parsing (LGSA). Our proposed LGSA incorporates the very limited information of various linguistic representation formats as guidance on the Encoder side, which can effectively enhance PLMs to their further potential, and boost AMR parsing. The results on proverbial benchmark AMR2.0 and AMR3.0 demonstrate the efficacy of LGSA, which can improve seq2seq AMR parsers without silver AMR data or alignment information. Moreover, we evaluate the generalization of LGSA by conducting experiments on out-of-domain datasets, and the results indicate that LGSA is even effective in such challenging scenarios.

Visual Enhanced Entity-Level Interaction Network for Multimodal Summarization
Haolong Yan | Binghao Tang | Boda Lin | Gang Zhao | Si Li
Findings of the Association for Computational Linguistics: NAACL 2024

MultiModal Summarization (MMS) aims to generate a concise summary based on multimodal data like texts and images and has wide application in multimodal fields. Previous works mainly focus on the coarse-level textual and visual features in which the overall features of the image interact with the whole sentence. However, the entities of the input text and the objects of the image may be underutilized, limiting the performance of current MMS models. In this paper, we propose a novel Visual Enhanced Entity-Level Interaction Network (VE-ELIN) to address the problem of underutilization of multimodal inputs at a fine-grained level in two ways. We first design a cross-modal entity interaction module to better fuse the entity information in text and the object information in vision. Then, we design an object-guided visual enhancement module to fully extract the visual features and enhance the focus of the image on the object area. We evaluate VE-ELIN on two MMS datasets and propose new metrics to measure the factual consistency of entities in the output. Finally, experimental results demonstrate that VE-ELIN is effective and outperforms previous methods under both traditional metrics and ours. The source code is available at https://github.com/summoneryhl/VE-ELIN.

Leveraging Generative Large Language Models with Visual Instruction and Demonstration Retrieval for Multimodal Sarcasm Detection
Binghao Tang | Boda Lin | Haolong Yan | Si Li
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Multimodal sarcasm detection aims to identify sarcasm in the given image-text pairs and has wide applications in the multimodal domains. Previous works primarily design complex network structures to fuse the image-text modality features for classification. However, such complicated structures may risk overfitting on in-domain data, reducing the performance in out-of-distribution (OOD) scenarios. Additionally, existing methods typically do not fully utilize cross-modal features, limiting their performance on in-domain datasets. Therefore, to build a more reliable multimodal sarcasm detection model, we propose a generative multimodal sarcasm model consisting of a designed instruction template and a demonstration retrieval module based on the large language model. Moreover, to assess the generalization of current methods, we introduce an OOD test set, RedEval. Experimental results demonstrate that our method is effective and achieves state-of-the-art (SOTA) performance on the in-domain MMSD2.0 and OOD RedEval datasets.

2022

Dependency parsing aims to extract the syntactic or semantic dependency structure of sentences. Existing methods for dependency parsing include transition-based, graph-based, and sequence-to-sequence methods. These methods obtain excellent performance, and we notice that they all belong to the labeling approach. Therefore, it may be valuable and interesting to explore the possibility of using a generative method to implement dependency parsing. In this paper, we propose to achieve Dependency Parsing (DP) via Sequence Generation (SG) by utilizing only the pre-trained language model without any auxiliary structures. We first explore different serialization design strategies for converting parsing structures into sequences. Then we design dependency units and concatenate these units into the sequence for DPSG. We verify that DPSG is capable of parsing on widely used DP benchmarks, i.e., PTB, UD2.2, SDP15 and SemEval16. In addition, we investigate the low-resource applicability of DPSG, which includes unsupervised cross-domain parsing conducted on CODT and few-shot cross-task parsing conducted on SDP15. Our research demonstrates that sequence generation is one of the effective methods to achieve dependency parsing. Our code is available now.

Simple Tagging System with RoBERTa for Ancient Chinese
Binghao Tang | Boda Lin | Si Li
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This paper describes the system submitted for the EvaHan 2022 Shared Task on word segmentation and part-of-speech tagging for Ancient Chinese. Our system is based on the pre-trained language model SIKU-RoBERTa with simple tagging layers. It significantly outperforms the official baselines on the released test sets, demonstrating its effectiveness.
