Le Qiu

2025

StockGenChaR: A Study on the Evaluation of Large Vision-Language Models on Stock Chart Captioning
Le Qiu | Emmanuele Chersoni
Proceedings of The 10th Workshop on Financial Technology and Natural Language Processing

ChengyuSTS: An Intrinsic Perspective on Mandarin Idiom Representation
Le Qiu | Emmanuele Chersoni | Aline Villavicencio
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)

Chengyu, or four-character idioms, are ubiquitous in both spoken and written Chinese. Despite their importance, chengyu are underexplored in NLP tasks, and existing evaluation frameworks are primarily based on extrinsic measures. In this paper, we introduce an intrinsic evaluation task for Chinese idiomatic understanding: idiomatic semantic textual similarity (iSTS), which evaluates how well models capture the semantic similarity of sentences containing idioms. To this end, we present a curated dataset, ChengyuSTS. Our experiments show that current pre-trained sentence Transformer models generally fail to capture the idiomaticity of chengyu in a zero-shot setting. We then report results for models fine-tuned with the SimCSE contrastive learning framework, which are promising for handling idiomatic expressions. We also present the results of DeepSeek for reference.

2024

Probing Numerical Concepts in Financial Text with BERT Models
Shanyue Guo | Le Qiu | Emmanuele Chersoni
Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning

CompLex-ZH: A New Dataset for Lexical Complexity Prediction in Mandarin and Cantonese
Le Qiu | Shanyue Guo | Tak-Sum Wong | Emmanuele Chersoni | John Lee | Chu-Ren Huang
Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)

The prediction of lexical complexity in context is becoming increasingly relevant in Natural Language Processing research, since identifying complex words is often the first step of text simplification pipelines. To the best of our knowledge, though, datasets annotated with complex words are available only for English and for a limited number of Western languages. In our paper, we introduce CompLex-ZH, a dataset including words annotated with complexity scores in sentential contexts for Chinese. Our data include sentences in Mandarin and Cantonese, which were selected from a variety of sources and textual genres. We provide a first evaluation with baselines combining hand-crafted and language model-based features.

2023

Identifying ESG Impact with Key Information
Le Qiu | Bo Peng | Jinghang Gu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

The paper presents a concise summary of our work for the ML-ESG-2 shared task, focusing exclusively on the Chinese and English datasets. ML-ESG-2 aims to ascertain the influence of news articles on corporations, specifically from an ESG perspective. To this end, we explored the usefulness of key information for impact identification and experimented with various techniques at different levels. For instance, we attempted to incorporate important information at the word level with TF-IDF, at the sentence level with TextRank, and at the document level with summarization. The final results reveal that the system using GPT-4 for summarization yields the best predictions.

Collecting and Predicting Neurocognitive Norms for Mandarin Chinese
Le Qiu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the 15th International Conference on Computational Semantics

Language researchers have long assumed that concepts can be represented by sets of semantic features, and have traditionally encountered challenges in identifying a feature set that could be sufficiently general to describe the human conceptual experience in its entirety. In the dataset of English norms presented by Binder et al. (2016), also known as the Binder norms, the authors introduced a new set of neurobiologically motivated semantic features in which conceptual primitives were defined in terms of modalities of neural information processing. However, no comparable norms are currently available for other languages. In our work, we built Mandarin Chinese norms by translating the stimuli used in the original study and collecting a comparable set of human ratings for Mandarin Chinese. We also conducted experiments on the automatic prediction of the Chinese Binder norms from the word embeddings of the corresponding words, to assess the feasibility of modeling experiential semantic features via corpus-based representations.
