Yusheng Huang


2024

pdf bib
Leveraging Grammar Induction for Language Understanding and Generation
Jushi Kai | Shengyuan Hou | Yusheng Huang | Zhouhan Lin
Findings of the Association for Computational Linguistics: EMNLP 2024

Grammar induction has made significant progress in recent years. However, it is not clear how the application of induced grammar could enhance practical performance in downstream tasks. In this work, we introduce an unsupervised grammar induction method for language understanding and generation. We construct a grammar parser to induce constituency structures and dependency relations, which is simultaneously trained on downstream tasks without additional syntax annotations. The induced grammar features are subsequently incorporated into Transformer as a syntactic mask to guide self-attention. We evaluate and apply our method to multiple machine translation tasks and natural language understanding tasks. Our method demonstrates superior performance compared to the original Transformer and other models enhanced with external parsers. Experimental results indicate that our method is effective in both from-scratch and pre-trained scenarios. Additionally, our research highlights the contribution of explicitly modeling the grammatical structure of texts to neural network models.

pdf bib
Extracting Financial Events from Raw Texts via Matrix Chunking
Yusheng Huang | Ning Hu | Kunping Li | Nan Wang | Zhouhan Lin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Event Extraction (EE) is widely used in the Chinese financial field to provide valuable structured information. However, there are two key challenges for Chinese financial EE in application scenarios. First, events need to be extracted from raw texts, which sets it apart from previous works like the Automatic Content Extraction (ACE) EE task, where EE is treated as a classification problem given the entity spans. Second, recognizing financial entities can be laborious, as they may involve multiple elements. In this paper, we introduce CFTE, a novel task for Chinese Financial Text-to-Event extraction, which directly extracts financial events from raw texts. We further present FINEED, a Chinese FINancial Event Extraction Dataset, and an efficient MAtrix-ChunKing method called MACK, designed for the extraction of financial events from raw texts. Specifically, FINEED is manually annotated with rich linguistic features. We propose a novel two-dimensional annotation method for FINEED, which can visualize the interactions among text components. Our MACK method is fault-tolerant by preserving the tag frequency distribution when identifying financial entities. We conduct extensive experiments and the results verify the effectiveness of our MACK method.

2021

pdf bib
Exploring Sentence Community for Document-Level Event Extraction
Yusheng Huang | Weijia Jia
Findings of the Association for Computational Linguistics: EMNLP 2021

Document-level event extraction is critical to various natural language processing tasks for providing structured information. Existing approaches by sequential modeling neglect the complex logic structures for long texts. In this paper, we leverage the entity interactions and sentence interactions within long documents and transform each document into an undirected unweighted graph by exploiting the relationship between sentences. We introduce the Sentence Community to represent each event as a subgraph. Furthermore, our framework SCDEE maintains the ability to extract multiple events by sentence community detection using graph attention networks and alleviate the role overlapping issue by predicting arguments in terms of roles. Experiments demonstrate that our framework achieves competitive results over state-of-the-art methods on the large-scale document-level event extraction dataset.