Bo Huang


2021

pdf bib
hub at SemEval-2021 Task 1: Fusion of Sentence and Word Frequency to Predict Lexical Complexity
Bo Huang | Yang Bai | Xiaobing Zhou
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

In this paper, we propose a method of fusing sentence information and word frequency information for the SemEval 2021 Task 1-Lexical Complexity Prediction (LCP) shared task. In our system, the sentence information comes from the RoBERTa model, and the word frequency information comes from the Tf-Idf algorithm. Use Inception block as a shared layer to learn sentence and word frequency information We described the implementation of our best system and discussed our methods and experiments in the task. The shared task is divided into two sub-tasks. The goal of the two sub-tasks is to predict the complexity of a predetermined word. The shared task is divided into two subtasks. The goal of the two subtasks is to predict the complexity of a predetermined word. The evaluation index of the task is the Pearson correlation coefficient. Our best performance system has Pearson correlation coefficients of 0.7434 and 0.8000 in the single-token subtask test set and the multi-token subtask test set, respectively.

pdf bib
hub at SemEval-2021 Task 2: Word Meaning Similarity Prediction Model Based on RoBERTa and Word Frequency
Bo Huang | Yang Bai | Xiaobing Zhou
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper introduces the system description of the hub team, which explains the related work and experimental results of our team’s participation in SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC). The data of this shared task is mainly some cross-language or multi-language sentence pair corpus. The languages covered in the corpus include English, Chinese, French, Russian, and Arabic. The task goal is to judge whether the same words in these sentence pairs have the same meaning in the sentence. This can be seen as a task of binary classification of sentence pairs. What we need to do is to use our method to determine as accurately as possible the meaning of the words in a sentence pair are the same or different. The model used by our team is mainly composed of RoBERTa and Tf-Idf algorithms. The result evaluation index of task submission is the F1 score. We only participated in the English language task. The final score of the test set prediction results submitted by our team was 84.60.

pdf bib
hub at SemEval-2021 Task 5: Toxic Span Detection Based on Word-Level Classification
Bo Huang | Yang Bai | Xiaobing Zhou
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This article introduces the system description of the hub team, which explains the related work and experimental results of our team’s participation in SemEval 2021 Task 5: Toxic Spans Detection. The data for this shared task comes from some posts on the Internet. The task goal is to identify the toxic content contained in these text data. We need to find the span of the toxic text in the text data as accurately as possible. In the same post, the toxic text may be one paragraph or multiple paragraphs. Our team uses a classification scheme based on word-level to accomplish this task. The system we used to submit the results is ALBERT+BILSTM+CRF. The result evaluation index of the task submission is the F1 score, and the final score of the prediction result of the test set submitted by our team is 0.6640226029.

pdf bib
hub at SemEval-2021 Task 7: Fusion of ALBERT and Word Frequency Information Detecting and Rating Humor and Offense
Bo Huang | Yang Bai
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper introduces the system description of the hub team, which explains the related work and experimental results of our team’s participation in SemEval 2021 Task 7: HaHackathon: Detecting and Rating Humor and Offense. We successfully submitted the test set prediction results of the two subtasks in the task. The goal of the task is to perform humor detection, grade evaluation, and offensive evaluation on each English text data in the data set. Tasks can be divided into two types of subtasks. One is a text classification task, and the other is a text regression task. What we need to do is to use our method to detect the humor and offensive information of the sentence as accurately as possible. The methods used in the results submitted by our team are mainly composed of ALBERT, CNN, and Tf-Idf algorithms. The result evaluation indicators submitted by the classification task are F1 score and Accuracy. The result evaluation index of the regression task submission is the RMSE. The final scores of the prediction results of the two subtask test sets submitted by our team are task1a 0.921 (F1), task1a 0.9364 (Accuracy), task1b 0.6288 (RMSE), task1c 0.5333 (F1), task1c 0.0.5591 (Accuracy), and task2 0.5027 (RMSE) respectively.

pdf bib
TEAM HUB@LT-EDI-EACL2021: Hope Speech Detection Based On Pre-trained Language Model
Bo Huang | Yang Bai
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

This article introduces the system description of TEAM_HUB team participating in LT-EDI 2021: Hope Speech Detection. This shared task is the first task related to the desired voice detection. The data set in the shared task consists of three different languages (English, Tamil, and Malayalam). The task type is text classification. Based on the analysis and understanding of the task description and data set, we designed a system based on a pre-trained language model to complete this shared task. In this system, we use methods and models that combine the XLM-RoBERTa pre-trained language model and the Tf-Idf algorithm. In the final result ranking announced by the task organizer, our system obtained F1 scores of 0.93, 0.84, 0.59 on the English dataset, Malayalam dataset, and Tamil dataset. Our submission results are ranked 1, 2, and 3 respectively.

pdf bib
HUB@DravidianLangTech-EACL2021: Identify and Classify Offensive Text in Multilingual Code Mixing in Social Media
Bo Huang | Yang Bai
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

This paper introduces the system description of the HUB team participating in DravidianLangTech - EACL2021: Offensive Language Identification in Dravidian Languages. The theme of this shared task is the detection of offensive content in social media. Among the known tasks related to offensive speech detection, this is the first task to detect offensive comments posted in social media comments in the Dravidian language. The task organizer team provided us with the code-mixing task data set mainly composed of three different languages: Malayalam, Kannada, and Tamil. The tasks on the code mixed data in these three different languages can be seen as three different comment/post-level classification tasks. The task on the Malayalam data set is a five-category classification task, and the Kannada and Tamil language data sets are two six-category classification tasks. Based on our analysis of the task description and task data set, we chose to use the multilingual BERT model to complete this task. In this paper, we will discuss our fine-tuning methods, models, experiments, and results.

pdf bib
HUB@DravidianLangTech-EACL2021: Meme Classification for Tamil Text-Image Fusion
Bo Huang | Yang Bai
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

This article describes our system for task DravidianLangTech - EACL2021: Meme classification for Tamil. In recent years, we have witnessed the rapid development of the Internet and social media. Compared with traditional TV and radio media platforms, there are not so many restrictions on the use of online social media for individuals and many functions of online social media platforms are free. Based on this feature of social media, it is difficult for people’s posts/comments on social media to be strictly and effectively controlled like TV and radio content. Therefore, the detection of negative information in social media has attracted attention from academic and industrial fields in recent years. The task of classifying memes is also driven by offensive posts/comments prevalent on social media. The data of the meme classification task is the fusion data of text and image information. To identify the content expressed by the meme, we develop a system that combines BiGRU and CNN. It can fuse visual features and text features to achieve the purpose of using multi-modal information from memetic data. In this article, we discuss our methods, models, experiments, and results.

2019

pdf bib
Asking the Crowd: Question Analysis, Evaluation and Generation for Open Discussion on Online Forums
Zi Chai | Xinyu Xing | Xiaojun Wan | Bo Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Teaching machines to ask questions is an important yet challenging task. Most prior work focused on generating questions with fixed answers. As contents are highly limited by given answers, these questions are often not worth discussing. In this paper, we take the first step on teaching machines to ask open-answered questions from real-world news for open discussion (openQG). To generate high-qualified questions, effective ways for question evaluation are required. We take the perspective that the more answers a question receives, the better it is for open discussion, and analyze how language use affects the number of answers. Compared with other factors, e.g. topic and post time, linguistic factors keep our evaluation from being domain-specific. We carefully perform variable control on 11.5M questions from online forums to get a dataset, OQRanD, and further perform question analysis. Based on these conclusions, several models are built for question evaluation. For openQG task, we construct OQGenD, the first dataset as far as we know, and propose a model based on conditional generative adversarial networks and our question evaluation model. Experiments show that our model can generate questions with higher quality compared with commonly-used text generation methods.