Tianyong Hao


2023

pdf bib
TAM of SCNU at SemEval-2023 Task 1: FCLL: A Fine-grained Contrastive Language-Image Learning Model for Cross-language Visual Word Sense Disambiguation
Qihao Yang | Yong Li | Xuelin Wang | Shunhao Li | Tianyong Hao
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Visual Word Sense Disambiguation (WSD), as a fine-grained image-text retrieval task, aims to identify the images that are relevant to ambiguous target words or phrases. However, the difficulties of limited contextual information and cross-linguistic background knowledge in text processing make this task challenging. To alleviate this issue, we propose a Fine-grained Contrastive Language-Image Learning (FCLL) model, which learns fine-grained image-text knowledge by employing a new fine-grained contrastive learning mechanism and enriches contextual information by establishing relationship between concepts and sentences. In addition, a new multimodal-multilingual knowledge base involving ambiguous target words is constructed for visual WSD. Experiment results on the benchmark datasets from SemEval-2023 Task 1 show that our FCLL ranks at the first in overall evaluation with an average H@1 of 72.56\% and an average MRR of 82.22\%. The results demonstrate that FCLL is effective in inference on fine-grained language-vision knowledge. Source codes and the knowledge base are publicly available at https://github.com/CharlesYang030/FCLL.

2022

pdf bib
中文糖尿病问题分类体系及标注语料库构建研究(The Construction of Question Taxonomy and An Annotated Chinese Corpus for Diabetes Question Classification)
Xiaobo Qian (钱晓波) | Wenxiu Xie (谢文秀) | Shaopei Long (龙绍沛) | Murong Lan (兰牧融) | Yuanyuan Mu (慕媛媛) | Tianyong Hao (郝天永)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“糖尿病作为一种典型慢性疾病已成为全球重大公共卫生挑战之一。随着互联网的快速发展,庞大的二型糖尿病患者和高危人群对糖尿病专业信息获取的需求日益突出,糖尿病自动问答服务对患者和高危人群的日常健康服务也发挥着越来越重要的作用,然而存在缺乏细粒度分类等突出问题。本文设计了一个表示用户意图的新型糖尿病问题分类体系,包括6个大类和23个细类。基于该体系,本文从两个专业医疗问答网站爬取并构建了一个包含122732个问答对的中文糖尿病问答语料库DaCorp,同时对其中的8000个糖尿病问题进行人工标注,形成一个细粒度的糖尿病标注数据集。此外,为评估该标注数据集的质量,本文实现了8个主流基线分类模型。实验结果表明,最佳分类模型的准确率达到88.7%,验证了糖尿病标注数据集及所提分类体系的有效性。Dacorp、糖尿病标注数据集和标注指南已在线发布,可以免费用于学术研究。”

pdf bib
A Self-supervised Joint Training Framework for Document Reranking
Xiaozhi Zhu | Tianyong Hao | Sijie Cheng | Fu Lee Wang | Hai Liu
Findings of the Association for Computational Linguistics: NAACL 2022

Pretrained language models such as BERT have been successfully applied to a wide range of natural language processing tasks and also achieved impressive performance in document reranking tasks. Recent works indicate that further pretraining the language models on the task-specific datasets before fine-tuning helps improve reranking performance. However, the pre-training tasks like masked language model and next sentence prediction were based on the context of documents instead of encouraging the model to understand the content of queries in document reranking task. In this paper, we propose a new self-supervised joint training framework (SJTF) with a self-supervised method called Masked Query Prediction (MQP) to establish semantic relations between given queries and positive documents. The framework randomly masks a token of query and encodes the masked query paired with positive documents, and uses a linear layer as a decoder to predict the masked token. In addition, the MQP is used to jointly optimize the models with supervised ranking objective during fine-tuning stage without an extra further pre-training stage. Extensive experiments on the MS MARCO passage ranking and TREC Robust datasets show that models trained with our framework obtain significant improvements compared to original models.

2018

pdf bib
T-Know: a Knowledge Graph-based Question Answering and Infor-mation Retrieval System for Traditional Chinese Medicine
Ziqing Liu | Enwei Peng | Shixing Yan | Guozheng Li | Tianyong Hao
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

T-Know is a knowledge service system based on the constructed knowledge graph of Traditional Chinese Medicine (TCM). Using authorized and anonymized clinical records, medicine clinical guidelines, teaching materials, classic medical books, academic publications, etc., as data resources, the system extracts triples from free texts to build a TCM knowledge graph by our developed natural language processing methods. On the basis of the knowledge graph, a deep learning algorithm is implemented for single-round question understanding and multiple-round dialogue. In addition, the TCM knowledge graph also is used to support human-computer interactive knowledge retrieval by normalizing search keywords to medical terminology.

pdf bib
Annotating Measurable Quantitative Informationin Language: for an ISO Standard
Tianyong Hao | Haotai Wang | Xinyu Cao | Kiyong Lee
Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

2017

pdf bib
The representation and extraction of qunatitative information
Tianyong Hao | Yunyan We | Jiaqi Qiang | Haitao Wang | Kiyong Lee
Proceedings of the 13th Joint ISO-ACL Workshop on Interoperable Semantic Annotation (ISA-13)

2010

pdf bib
Towards Automatic Question Answering over Social Media by Learning Question Equivalence Patterns
Tianyong Hao | Wenyin Liu | Eugene Agichtein
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media