Tianyong Hao


TAM of SCNU at SemEval-2023 Task 1: FCLL: A Fine-grained Contrastive Language-Image Learning Model for Cross-language Visual Word Sense Disambiguation
Qihao Yang | Yong Li | Xuelin Wang | Shunhao Li | Tianyong Hao
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Visual Word Sense Disambiguation (WSD), as a fine-grained image-text retrieval task, aims to identify the images that are relevant to ambiguous target words or phrases. However, limited contextual information and the need for cross-linguistic background knowledge in text processing make this task challenging. To address these difficulties, we propose a Fine-grained Contrastive Language-Image Learning (FCLL) model, which learns fine-grained image-text knowledge through a new fine-grained contrastive learning mechanism and enriches contextual information by establishing relationships between concepts and sentences. In addition, a new multimodal-multilingual knowledge base involving ambiguous target words is constructed for visual WSD. Experimental results on the benchmark datasets from SemEval-2023 Task 1 show that FCLL ranks first in the overall evaluation, with an average H@1 of 72.56% and an average MRR of 82.22%. The results demonstrate that FCLL is effective at inference over fine-grained language-vision knowledge. Source code and the knowledge base are publicly available at https://github.com/CharlesYang030/FCLL.
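
The two metrics reported in the abstract above, H@1 (hit rate at rank 1) and MRR (mean reciprocal rank), can be illustrated with a minimal sketch. It assumes each test instance is summarized by the 1-based rank at which the correct image appears; the function names and the toy ranks are hypothetical and are not taken from the FCLL code or the task data.

```python
def hit_at_1(ranks):
    """Fraction of instances whose gold image is ranked first."""
    return sum(1 for r in ranks if r == 1) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the gold image over all instances."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy ranks for four instances (assumed values, for illustration only).
ranks = [1, 2, 1, 4]
print(hit_at_1(ranks))              # 0.5
print(mean_reciprocal_rank(ranks))  # 0.6875
```

MRR gives partial credit when the correct image is ranked near the top, which is why it is typically higher than H@1 on the same predictions.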


中文糖尿病问题分类体系及标注语料库构建研究(The Construction of Question Taxonomy and An Annotated Chinese Corpus for Diabetes Question Classification)
Xiaobo Qian (钱晓波) | Wenxiu Xie (谢文秀) | Shaopei Long (龙绍沛) | Murong Lan (兰牧融) | Yuanyuan Mu (慕媛媛) | Tianyong Hao (郝天永)
Proceedings of the 21st Chinese National Conference on Computational Linguistics


A Self-supervised Joint Training Framework for Document Reranking
Xiaozhi Zhu | Tianyong Hao | Sijie Cheng | Fu Lee Wang | Hai Liu
Findings of the Association for Computational Linguistics: NAACL 2022

Pretrained language models such as BERT have been successfully applied to a wide range of natural language processing tasks and have also achieved impressive performance in document reranking. Recent work indicates that further pretraining the language models on task-specific datasets before fine-tuning helps improve reranking performance. However, pretraining tasks such as masked language modeling and next sentence prediction are based on the context of documents and do not encourage the model to understand the content of queries in the document reranking task. In this paper, we propose a new self-supervised joint training framework (SJTF) with a self-supervised method called Masked Query Prediction (MQP) to establish semantic relations between given queries and positive documents. The framework randomly masks a token of the query, encodes the masked query paired with positive documents, and uses a linear layer as a decoder to predict the masked token. In addition, MQP is used to jointly optimize the models with the supervised ranking objective during the fine-tuning stage, without an extra further pretraining stage. Extensive experiments on the MS MARCO passage ranking and TREC Robust datasets show that models trained with our framework obtain significant improvements over the original models.
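
The input construction for Masked Query Prediction described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: whitespace tokenization, BERT-style `[CLS]`/`[SEP]`/`[MASK]` markers, and the helper name `build_mqp_example` are all hypothetical, and the encoder and linear decoder are omitted.

```python
import random

# BERT-style special tokens (assumed; the abstract does not name them).
MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"


def build_mqp_example(query_tokens, doc_tokens, rng):
    """Mask one random query token and pair the query with a positive document.

    Returns the input token sequence, the index of the masked position in
    that sequence, and the original token the decoder should predict.
    """
    if not query_tokens:
        raise ValueError("query must contain at least one token")
    pos = rng.randrange(len(query_tokens))
    target = query_tokens[pos]
    masked_query = list(query_tokens)
    masked_query[pos] = MASK
    input_tokens = [CLS] + masked_query + [SEP] + list(doc_tokens) + [SEP]
    return input_tokens, pos + 1, target  # +1 accounts for the leading [CLS]


rng = random.Random(0)
tokens, mask_index, target = build_mqp_example(
    "what is document reranking".split(),
    "document reranking reorders retrieved passages".split(),
    rng,
)
print(tokens, mask_index, target)
```

In a full model, the encoder output at `mask_index` would be passed to the linear decoder, and the resulting prediction loss would be added to the supervised ranking loss during fine-tuning.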


T-Know: a Knowledge Graph-based Question Answering and Information Retrieval System for Traditional Chinese Medicine
Ziqing Liu | Enwei Peng | Shixing Yan | Guozheng Li | Tianyong Hao
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

T-Know is a knowledge service system based on a constructed knowledge graph of Traditional Chinese Medicine (TCM). Using authorized and anonymized clinical records, clinical medicine guidelines, teaching materials, classic medical books, academic publications, etc., as data resources, the system extracts triples from free text to build a TCM knowledge graph using natural language processing methods that we developed. On the basis of the knowledge graph, a deep learning algorithm is implemented for single-round question understanding and multiple-round dialogue. In addition, the TCM knowledge graph is also used to support human-computer interactive knowledge retrieval by normalizing search keywords to medical terminology.

Annotating Measurable Quantitative Information in Language: for an ISO Standard
Tianyong Hao | Haotai Wang | Xinyu Cao | Kiyong Lee
Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation


The representation and extraction of quantitative information
Tianyong Hao | Yunyan We | Jiaqi Qiang | Haitao Wang | Kiyong Lee
Proceedings of the 13th Joint ISO-ACL Workshop on Interoperable Semantic Annotation (ISA-13)


Towards Automatic Question Answering over Social Media by Learning Question Equivalence Patterns
Tianyong Hao | Wenyin Liu | Eugene Agichtein
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media