2024
pdf
bib
abs
Exploring the Capability of Multimodal LLMs with Yonkoma Manga: The YManga Dataset and Its Challenging Tasks
Qi Yang
|
Jingjie Zeng
|
Liang Yang
|
Zhihao Yang
|
Hongfei Lin
Findings of the Association for Computational Linguistics: EMNLP 2024
Yonkoma Manga, characterized by its four-panel structure, presents unique challenges due to its rich contextual information and strong sequential features. To address the limitations of current multimodal large language models (MLLMs) in understanding this type of data, we create a novel dataset named YManga from the Internet. After filtering out low-quality content, we collect a dataset of 1,015 yonkoma strips, containing 10,150 human annotations. We then define three challenging tasks for this dataset: panel sequence detection, generation of the author’s creative intention, and description generation for masked panels. These tasks progressively introduce the complexity of understanding and utilizing such image-text data. To the best of our knowledge, YManga is the first dataset specifically designed for yonkoma manga strips understanding. Extensive experiments conducted on this dataset reveal significant challenges faced by current multimodal large language models. Our results show a substantial performance gap between models and humans across all three tasks.
pdf
bib
abs
“Barking up the Right Tree”, a GAN-Based Pun Generation Model through Semantic Pruning
JingJie Zeng
|
Liang Yang
|
Jiahao Kang
|
Yufeng Diao
|
Zhihao Yang
|
Hongfei Lin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
In the realm of artificial intelligence and linguistics, the automatic generation of humor, particularly puns, remains a complex task. This paper introduces an innovative approach that employs a Generative Adversarial Network (GAN) and semantic pruning techniques to generate humorous puns. We initiate our process by identifying potential pun candidates via semantic pruning. This is followed by the use of contrastive learning to decode the unique characteristics of puns, emphasizing both correct and incorrect interpretations. The learned features from contrastive learning are utilized within our GAN model to better capture the semantic nuances of puns. Specifically, the generator exploits the pruned semantic tree to generate pun texts, while the discriminator evaluates the generated puns, ensuring both linguistic correctness and humor. Evaluation results highlight our model’s capacity to produce semantically coherent and humorous puns, demonstrating an enhancement over prior methods and approach human-level performance. This work contributes significantly to the field of computational humor, advancing the capabilities of automatic pun generation.
pdf
bib
abs
yangqi at SemEval-2024 Task 9: Simulate Human Thinking by Large Language Model for Lateral Thinking Challenges
Qi Yang
|
Jingjie Zeng
|
Liang Yang
|
Hongfei Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper describes our system used in the SemEval-2024 Task 9 on two sub-tasks, BRAINTEASER: A Novel Task Defying Common Sense. In this work, we developed a system SHTL, which means simulate human thinking capabilities by Large Language Model (LLM). Our approach bifurcates into two main components: Common Sense Reasoning and Rationalize Defying Common Sense. To mitigate the hallucinations of LLM, we implemented a strategy that combines Retrieval-augmented Generation (RAG) with the the Self-Adaptive In-Context Learning (SAICL), thereby sufficiently leveraging the powerful language ability of LLM. The effectiveness of our method has been validated by its performance on the test set, with an average performance on two subtasks that is 30.1 higher than ChatGPT setting zero-shot and only 0.8 lower than that of humans.
2021
pdf
bib
abs
Label-Enhanced Hierarchical Contextualized Representation for Sequential Metaphor Identification
Shuqun Li
|
Liang Yang
|
Weidong He
|
Shiqi Zhang
|
Jingjie Zeng
|
Hongfei Lin
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Recent metaphor identification approaches mainly consider the contextual text features within a sentence or introduce external linguistic features to the model. But they usually ignore the extra information that the data can provide, such as the contextual metaphor information and broader discourse information. In this paper, we propose a model augmented with hierarchical contextualized representation to extract more information from both sentence-level and discourse-level. At the sentence level, we leverage the metaphor information of words that except the target word in the sentence to strengthen the reasoning ability of our model via a novel label-enhanced contextualized representation. At the discourse level, the position-aware global memory network is adopted to learn long-range dependency among the same words within a discourse. Finally, our model combines the representations obtained from these two parts. The experiment results on two tasks of the VUA dataset show that our model outperforms every other state-of-the-art method that also does not use any external knowledge except what the pre-trained language model contains.
2020
pdf
bib
abs
ALBERT-BiLSTM for Sequential Metaphor Detection
Shuqun Li
|
Jingjie Zeng
|
Jinhui Zhang
|
Tao Peng
|
Liang Yang
|
Hongfei Lin
Proceedings of the Second Workshop on Figurative Language Processing
In our daily life, metaphor is a common way of expression. To understand the meaning of a metaphor, we should recognize the metaphor words which play important roles. In the metaphor detection task, we design a sequence labeling model based on ALBERT-LSTM-softmax. By applying this model, we carry out a lot of experiments and compare the experimental results with different processing methods, such as with different input sentences and tokens, or the methods with CRF and softmax. Then, some tricks are adopted to improve the experimental results. Finally, our model achieves a 0.707 F1-score for the all POS subtask and a 0.728 F1-score for the verb subtask on the TOEFL dataset.