2024
Relevance-aware Diverse Query Generation for Out-of-domain Text Ranking
Jia-Huei Ju | Chao-Han Yang | Szu-Wei Fu | Ming-Feng Tsai | Chuan-Ju Wang
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)
Domain adaptation presents significant challenges for out-of-domain text ranking, especially when supervised data is limited. In this paper, we present ReadQG (Relevance-Aware Diverse Query Generation), a method to generate informative synthetic queries to facilitate the adaptation process of text ranking models. Unlike previous approaches focusing solely on relevant query generation, our ReadQG generates diverse queries with continuous relevance scores. Specifically, we propose leveraging soft-prompt tuning and diverse generation objectives to control query generation according to the given relevance. Our experiments show that integrating negative queries into the learning process enhances the effectiveness of text ranking models in out-of-domain information retrieval (IR) benchmarks. Furthermore, we measure the quality of query generation, highlighting the underlying beneficial characteristics of negative queries. Our empirical results and analysis also shed light on potential directions for more advanced data augmentation in IR. The data and code have been released.
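A minimal sketch of the relevance-conditioning idea described above, not the authors' released code: a continuous relevance score interpolates between two learned soft-prompt banks (one for relevant, one for non-relevant queries) that are prepended to the document embeddings before generation. All names and dimensions are illustrative.

```python
# Hypothetical sketch: relevance-conditioned soft prompts for query generation.
import torch
import torch.nn as nn

class RelevanceSoftPrompt(nn.Module):
    def __init__(self, prompt_len: int = 10, hidden_dim: int = 512):
        super().__init__()
        # One learned prompt bank for fully relevant queries, one for non-relevant ones.
        self.pos_prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)
        self.neg_prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)

    def forward(self, doc_embeds: torch.Tensor, relevance: torch.Tensor) -> torch.Tensor:
        # doc_embeds: (batch, seq_len, hidden_dim); relevance: (batch,) in [0, 1].
        r = relevance.view(-1, 1, 1)
        prompt = r * self.pos_prompt + (1.0 - r) * self.neg_prompt  # (batch, prompt_len, hidden_dim)
        return torch.cat([prompt, doc_embeds], dim=1)

# Usage: prepend the conditioned prompt, then pass the result to a seq2seq
# generator (e.g., a T5-style encoder) in place of its token embeddings.
conditioner = RelevanceSoftPrompt()
doc = torch.randn(2, 128, 512)
rel = torch.tensor([1.0, 0.2])   # high- vs. low-relevance target queries
inputs = conditioner(doc, rel)   # shape (2, 138, 512)
```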
2023
A Compare-and-contrast Multistage Pipeline for Uncovering Financial Signals in Financial Reports
Jia-Huei Ju | Yu-Shiang Huang | Cheng-Wei Lin | Che Lin | Chuan-Ju Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In this paper, we address the challenge of discovering financial signals in narrative financial reports. As these documents are often lengthy and tend to blend routine information with new information, it is challenging for professionals to discern critical financial signals. To this end, we leverage the inherent nature of the year-to-year structure of reports to define a novel signal-highlighting task; more importantly, we propose a compare-and-contrast multistage pipeline that recognizes different relationships between the reports and locates relevant rationales for these relationships. We also create and publicly release a human-annotated dataset for our task. Our experiments on the dataset validate the effectiveness of our pipeline, and we provide detailed analyses and ablation studies to support our findings.
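A schematic, hypothetical sketch of the compare-and-contrast flow over year-to-year report sections: pair each current-year paragraph with its closest prior-year paragraph, label the pair by similarity, and keep low-similarity pairs as candidate signals. The paper's pipeline uses learned models; difflib here is only a stand-in scorer.

```python
# Hypothetical sketch: pair year-to-year paragraphs and flag candidate signals.
from difflib import SequenceMatcher

def pair_and_label(current: list[str], previous: list[str], thresh: float = 0.8):
    labeled = []
    for para in current:
        best_prev, best_sim = None, 0.0
        for prev in previous:
            sim = SequenceMatcher(None, para, prev).ratio()
            if sim > best_sim:
                best_prev, best_sim = prev, sim
        # High similarity to last year's report suggests routine text;
        # low similarity marks a candidate new-information signal.
        label = "routine" if best_sim >= thresh else "candidate_signal"
        labeled.append({"paragraph": para, "matched": best_prev,
                        "similarity": best_sim, "label": label})
    return labeled
```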
FISH: A Financial Interactive System for Signal Highlighting
Ta-wei Huang | Jia-huei Ju | Yu-shiang Huang | Cheng-wei Lin | Yi-shyuan Chiang | Chuan-ju Wang
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
In this system demonstration, we seek to streamline the process of reviewing financial statements and provide insightful information for practitioners. We develop FISH, an interactive system that extracts and highlights crucial textual signals from financial statements efficiently and precisely. To achieve our goal, we integrate pre-trained BERT representations and a fine-tuned BERT highlighting model with a newly proposed two-stage classify-then-highlight pipeline. We also conduct a human evaluation, showing that FISH can provide accurate financial signals. FISH overcomes the limitations of existing research and, more importantly, benefits both academics and practitioners in finance, as they can leverage state-of-the-art contextualized language models alongside their newly gained insights. The system is available online at
https://fish-web-fish.de.r.appspot.com/, and a short introductory video is at
https://youtu.be/ZbvZQ09i6aw.
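A minimal sketch of the two-stage classify-then-highlight pass described above, under the assumption of two placeholder callables standing in for the fine-tuned BERT models; this is not the deployed FISH code.

```python
# Hypothetical sketch: stage 1 filters sentences likely to carry a signal,
# stage 2 highlights tokens only within the retained sentences.
from typing import Callable

def classify_then_highlight(
    sentences: list[str],
    classify: Callable[[str], float],       # stand-in: P(sentence carries a signal)
    highlight: Callable[[str], list[str]],  # stand-in: token-level highlights
    threshold: float = 0.5,
):
    results = []
    for sent in sentences:
        score = classify(sent)
        if score >= threshold:
            results.append({"sentence": sent, "score": score,
                            "highlights": highlight(sent)})
    return results
```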
2020
Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models
Jheng-Hong Yang | Sheng-Chieh Lin | Rodrigo Nogueira | Ming-Feng Tsai | Chuan-Ju Wang | Jimmy Lin
Proceedings of the 28th International Conference on Computational Linguistics
While internalized “implicit knowledge” in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question. Based on the text-to-text transfer transformer (T5) model, this work explores a template-based approach to extract implicit knowledge for commonsense reasoning on multiple-choice (MC) question answering tasks. Experiments on three representative MC datasets show the surprisingly good performance of our simple template, coupled with a logit normalization technique for disambiguation. Furthermore, we verify that our proposed template can be easily extended to other MC tasks with contexts such as supporting facts in open-book question answering settings. Starting from the MC task, this work initiates further research to find generic natural language templates that can effectively leverage stored knowledge in pretrained models.
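A minimal sketch of template-based scoring for multiple-choice QA, assuming a sequence scorer log_likelihood(text) (e.g., a T5-style model's target log-probability). Each option is slotted into a natural-language template and per-option scores are softmax-normalized before picking the answer; the template string and the normalization shown here are illustrative and may differ from the paper's exact technique.

```python
# Hypothetical sketch: fill a template per option, score, normalize, and pick.
import math

def answer_mc(question: str, options: list[str], log_likelihood):
    # Template is illustrative, not the one proposed in the paper.
    scores = [log_likelihood(f"{question} The answer is: {opt}.") for opt in options]
    # Softmax over per-option scores yields a distribution over choices.
    m = max(scores)
    probs = [math.exp(s - m) for s in scores]
    total = sum(probs)
    probs = [p / total for p in probs]
    best = max(range(len(options)), key=lambda i: probs[i])
    return best, probs
```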
2018
RiskFinder: A Sentence-level Risk Detector for Financial Reports
Yu-Wen Liu | Liang-Chih Liu | Chuan-Ju Wang | Ming-Feng Tsai
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
This paper presents a web-based information system, RiskFinder, for facilitating the analyses of soft and hard information in financial reports. In particular, the system broadens the analyses from the word level to the sentence level, which makes the system useful for practitioner communities and unprecedented among financial academics. The proposed system has four main components: 1) a Form 10-K risk-sentiment dataset, consisting of a set of risk-labeled financial sentences and pre-trained sentence embeddings; 2) metadata, including basic information on each company that published the Form 10-K financial report as well as several relevant financial measures; 3) an interface that highlights risk-related sentences in the financial reports based on the latest sentence embedding techniques; 4) a visualization of financial time-series data for a corresponding company. This paper also conducts some case studies to showcase that the system can be of great help in capturing valuable insight within large amounts of textual information. The system is now available online at
https://cfda.csie.org/RiskFinder/.
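A hypothetical sketch of embedding-based risk highlighting in the spirit of the component described above: score each report sentence by cosine similarity to the centroid of risk-labeled sentence embeddings and flag the closest ones. The embedding function is assumed to be given; RiskFinder's actual models and thresholds may differ.

```python
# Hypothetical sketch: rank report sentences by similarity to a risk centroid.
import numpy as np

def highlight_risk(sentence_vecs: np.ndarray, risk_vecs: np.ndarray, top_k: int = 10):
    # sentence_vecs: (n, d) embeddings of report sentences;
    # risk_vecs: (m, d) embeddings of known risk-labeled sentences.
    centroid = risk_vecs.mean(axis=0)
    sims = sentence_vecs @ centroid / (
        np.linalg.norm(sentence_vecs, axis=1) * np.linalg.norm(centroid) + 1e-9
    )
    return np.argsort(-sims)[:top_k]  # indices of the most risk-like sentences
```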
2014
Financial Keyword Expansion via Continuous Word Vector Representations
Ming-Feng Tsai | Chuan-Ju Wang
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2013
Financial Sentiment Analysis for Risk Prediction
Chuan-Ju Wang | Ming-Feng Tsai | Tse Liu | Chin-Ting Chang
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
Visualization on Financial Terms via Risk Ranking from Financial Reports
Ming-Feng Tsai | Chuan-Ju Wang
Proceedings of COLING 2012: Demonstration Papers