MOBA-E2C: Generating MOBA Game Commentaries via Capturing Highlight Events from the Meta-Data



Introduction
With the development of live-streaming services and the e-sports industry, a growing number of game fans are hooked on watching online live-streams (Yang et al., 2022). In a live-streaming channel, besides the game video, a professional streamer makes commentaries to vividly narrate the game's progress (Ishigaki et al., 2021); thus, audiences can enjoy a game more easily and with more fun by following a live-stream, especially when they lack sufficient background knowledge. Nonetheless, not all e-sports competitions can be narrated by live-streaming channels, because 1) streamers tend to select only famous competitions for their channels to attract more audiences; 2) unlike traditional sports such as basketball and football, which have a high threshold for hosting and live-streaming, an e-sports game only requires several computers connected to the Internet, so the number of e-sports competitions always exceeds the number of live-streaming channels; 3) the game rules are complex (OpenAI, 2019) and can change at any time, so only a few experienced people are qualified for this job.
User preferences are diverse, so it is not trivial to provide professional commentaries for every game competition. A feasible alternative is to deploy machine commentators, which can work whenever and wherever needed at low cost. However, building a machine commentator is challenging (Ponomarenko and Sirotkin, 2020). First, it is unrealistic for machines to generate comments by watching videos like human beings, because understanding a game video requires a great deal of technology and manual effort. Second, narrating e-sports games requires tremendous knowledge (both domain and peripheral). Third, a machine commentator must be able to capture the highlight events of a game. Consequently, such products can hardly be found on the market. This paper explores the research field of game commentary generation and mainly focuses on MOBA (Multiplayer Online Battle Arena) games such as Dota2 and League of Legends. With these challenges in mind, we propose a data-driven MOBA game commentary generation framework, MOBA-E2C, along with a MOBA-FuseGPT generator. Instead of making commentaries based on visual features (i.e., video), the commentaries are narrated based on the meta-data of a game. In short, MOBA-E2C first uses several types of event handlers to capture the highlight events that need to be narrated from the game's meta-data. Each event uses a table of key-value attributes to record the necessary content, so the game commentary generation problem can subsequently be regarded as a data-to-text generation task. Since collecting and constructing supervised data is thorny, to generate high-quality commentaries while reducing the requirement for supervised data, our data-to-text generator MOBA-FuseGPT takes advantage of both a rule-based method and a pre-trained language model (i.e., GPT2).
In experiments, Dota2 (Yu et al., 2018), a popular MOBA game, is adopted as the case study. We first designed 34 different event handlers for MOBA-E2C to capture highlight game events. Subsequently, we collected and constructed a Chinese Dota2 commentary generation dataset, Dota2-Commentary, which includes 234 recent Dota2 game sessions and 7,473 commentaries written by professional human annotators. Experimental results show that our approach yields substantial improvements in both the default scenario and the few/zero-shot scenario. To the best of our knowledge, this work is the first Dota2 machine commentator, and Dota2-Commentary is the first dataset for this task.
The contribution of this work is four-fold:
• We propose a MOBA game commentary generation framework, MOBA-E2C, which generates commentaries based on meta-data.
• The proposed generator MOBA-FuseGPT can take advantage of both the rule-based method and the pre-trained language model.
• We construct a MOBA game commentary generation dataset Dota2-Commentary.
• Extensive experiments and analyses demonstrate the effectiveness of our approach.

General Paradigm and Overview
Although there are various MOBA games on the market, such as Dota2, LOL, and Honor of Kings, they follow a similar paradigm. Generally, two teams of players compete against each other on a predefined map, where each player controls a hero with a set of abilities and items. The objective is to destroy the opponent's buildings. As shown in Figure 2, the workflow of MOBA-E2C can be summarized as 1) constructing a live-streaming sequence of game states by collecting meta-data from the corresponding MOBA game; 2) using event handlers to capture highlight events that should be narrated from the live-streaming sequence, where each captured highlight event is formulated as a table of key-value attributes; 3) employing a table-to-text generator to generate commentaries based on the identified event tables.

Live-Streaming States
MOBA-E2C first regards each game session as a live-streaming sequence of game states S = (s_1, s_2, ..., s_n). At each time t, the corresponding game state s_t = {o_{i,t}} records the current game progress using the MOBA objects {o_{i,t}}.
Based on the empirical knowledge of game experts, we have designed several universal MOBA objects. As shown in Table 1, MOBA-E2C uses 7 different kinds of objects to cover the objects in a MOBA game. Specifically, for each game session, we use a map object to record the meta-information, such as the time, game state, etc. Next, we use two team objects to represent the two opponents, and each team object includes n player objects. Each player object has a hero object, some item objects, and some ability objects. Finally, each team also includes some building objects.
For each MOBA object, we use a set of key-value pairs to describe its attributes, where the key is the attribute name and the value is the corresponding attribute value. For example, in a hero object, we use a health attribute to track the current health and a level attribute to track the current level.
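The object model above can be sketched in a few lines of Python. This is a hypothetical illustration of the described data structures; the class and attribute names (`MobaObject`, `health`, `level`) are illustrative, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MobaObject:
    """A generic MOBA object: a name plus a table of key-value attributes."""
    name: str
    attributes: dict = field(default_factory=dict)

# A hero object tracks attributes such as health and level.
hero = MobaObject("hero_1_1", {"health": 560, "level": 3})

# A game state s_t is simply the collection of all objects at time t.
state_t = {obj.name: obj for obj in [hero]}
assert state_t["hero_1_1"].attributes["level"] == 3
```

A full game session would then be a list of such states, one per sampled timestamp.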

Object          | Description
map             | the meta information (time, etc.)
team_i          | each game has two opponents
building_i      | buildings to be defended/attacked
player_{i,j}    | each team_i has n players
hero_{i,j}      | each player_{i,j} controls a hero_{i,j}
item_{i,j,k}    | each hero_{i,j} has its own items
ability_{i,j,k} | each hero_{i,j} has its own abilities

Table 1: The universal MOBA objects.

Highlight Event Identification
Intuitively, commentaries are needed when highlight events happen at time t. Hence, the next job is to capture highlight events based on the current live-streaming states s_{1:t}. MOBA-E2C uses several event handlers to capture events that should be commentated on by monitoring the game state s_t and tracking the change between the current state s_t and the previous states s_{1:t-1}. Specifically, based on the knowledge of experienced human players and commentators, we propose four different types of event handlers to capture highlight events:
1. State-Change: By checking the difference between the current object o_{i,t} and the previous o_{i,t-1}, we can determine that an event has just happened. For example, by comparing the health of hero_i, we can find that hero_i has just died.

2. Counting: It monitors and counts some aspects of an object. Once the counted value reaches a milestone, it generates an event. For example, once hero_i has killed others 10 times, a corresponding event is generated.
3. Tracking: It continuously tracks some aspects. Once the current situation meets certain rules, a new event is identified. For example, if hero_i has died, a tracking handler is created to track and report the progress of its revival.
4. Summary: It generates an event by periodically analyzing some aspects; for example, summarizing the net worth of the two teams.
Each identified event is subsequently represented as an event table e = {(k_i, v_i)}_l. Each attribute (k_i, v_i) is a key-value pair that describes one attribute of the event. For example, the HeroKill event table e_hk = {(Killer, player1), (Dead, player2)} means player1 has killed player2.
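A State-Change handler of the kind described above can be sketched as follows. This is an illustrative example, not the authors' implementation: the function compares an object's attributes in two consecutive states and, following the paper's notation, emits an event table as a dict of key-value pairs. The event and key names (`HeroDeath`, `Hero`) are assumptions.

```python
def state_change_handler(prev_obj: dict, curr_obj: dict) -> list:
    """Compare o_{i,t-1} and o_{i,t}; return event tables for detected changes."""
    events = []
    # A hero whose health just dropped to zero has died.
    if prev_obj.get("health", 0) > 0 and curr_obj.get("health", 0) == 0:
        events.append({"Event": "HeroDeath", "Hero": curr_obj["name"]})
    return events

prev = {"name": "hero_1_1", "health": 120}
curr = {"name": "hero_1_1", "health": 0}
assert state_change_handler(prev, curr) == [
    {"Event": "HeroDeath", "Hero": "hero_1_1"}
]
```

Counting, Tracking, and Summary handlers would follow the same shape, differing only in whether they keep internal counters, persistent trackers, or run on a timer.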

Decoupled Commentary Generation
The last step is to generate a commentary for each identified event table. MOBA-E2C decouples the commentary generation module because game rules are frequently updated; thus, MOBA-E2C can quickly adapt to the newest game version by updating only the training-free upper-layer event handlers. Consequently, this job can be regarded as a data-to-text task that refers only to the given event table during generation. To adapt to different scenarios, this work proposes three different generators: a rule-based MOBA-RC, a generative MOBA-GPT, and a fused MOBA-FuseGPT.

MOBA-RC
Rule-based methods have been widely used in building machine text generators because 1) they are easy to develop and do not require training; 2) they can generate accurate commentaries with predefined rules; 3) they can quickly adapt to changes in the upper logic (i.e., the rules of the game).
Considering these advantages, we first design a rule-based generator, MOBA-RC. Specifically, for each type of event, we pre-define a set of commentary patterns, where each {{k_i}} represents a placeholder that can be filled with the corresponding event-table key-value attribute (k_i, v_i). Subsequently, given an event table e, MOBA-RC generates a commentary by selecting a pattern and filling its placeholders.
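The pattern-filling procedure can be sketched as below. The patterns themselves are invented for illustration; only the {{k_i}} placeholder convention comes from the paper.

```python
import random
import re

# Hypothetical MOBA-RC-style pattern bank: each event type maps to several
# patterns whose {{key}} placeholders are filled from the event table.
PATTERNS = {
    "HeroKill": [
        "{{Killer}} has just taken down {{Dead}}!",
        "What a play! {{Killer}} eliminates {{Dead}}.",
    ],
}

def rule_based_generate(event_type: str, table: dict) -> str:
    """Pick a random pattern and substitute each {{key}} with table[key]."""
    pattern = random.choice(PATTERNS[event_type])
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(table[m.group(1)]), pattern)

table = {"Killer": "player1", "Dead": "player2"}
out = rule_based_generate("HeroKill", table)
assert "player1" in out and "player2" in out
```

Random pattern selection is what lets the same event table yield different surface forms, which the paper later exploits to build pseudo-supervised data.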

MOBA-GPT
To break the limitation of fixed patterns, we subsequently propose a generative model that learns to generate commentaries. Considering the shortage of supervised event-commentary training data, the proposed MOBA-GPT uses the pre-trained GPT2 (Radford et al., 2019) as its generative backbone. The pre-trained model can transfer knowledge learned from large-scale unsupervised training data to small-scale downstream applications, alleviating the data shortage (Li et al., 2021b). Like other common pre-trained language models (Zhang et al., 2021), GPT2 can only operate on plain text. Hence, in training, given a supervised training instance (e, Y), where e = {(k_i, v_i)}_l is the input event table and Y = (y_1, ..., y_m) is the target commentary, we first construct a plain-text input sequence I:

I = [T_s; T_g; θ(e[1]); ...; θ(e[l]); T_c; Y; T_e],

where T_s and T_e are two special symbols indicating the start and the end of the sequence, and T_g and T_c are two further special symbols indicating the start of the linearized table input and the start of the commentary. The function θ(e[i]) linearizes the i-th key-value attribute (k_i, v_i) of e into plain text with two special symbols T_Key and T_Value:

θ(e[i]) = [T_Key; k_i; T_Value; v_i].

Afterward, the training (fine-tuning) objective can be formulated as minimizing the negative log-likelihood

L = - Σ_{t=t_Y}^{|I|} log P(I_t | I_{<t}),

where t_Y corresponds to the start position of Y in I.
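The linearization θ and the assembly of I can be sketched as follows. The literal token strings ([BOS], [TABLE], etc.) are placeholders standing in for T_s, T_g, and the other special symbols; the paper does not specify their actual surface forms.

```python
# Assumed surface forms for the special symbols T_s, T_e, T_g, T_c, T_Key, T_Value.
T_S, T_E, T_G, T_C = "[BOS]", "[EOS]", "[TABLE]", "[COMMENT]"
T_KEY, T_VALUE = "[KEY]", "[VALUE]"

def linearize(table: dict, commentary: str) -> str:
    """Build the plain-text training sequence I from an event table and target Y."""
    parts = [T_S, T_G]
    for k, v in table.items():
        parts += [T_KEY, str(k), T_VALUE, str(v)]  # theta(e[i])
    parts += [T_C, commentary, T_E]
    return " ".join(parts)

seq = linearize({"Killer": "player1", "Dead": "player2"}, "player1 kills player2!")
assert seq.startswith("[BOS] [TABLE] [KEY] Killer [VALUE] player1")
```

At inference time, the model would be fed everything up to and including T_c and asked to continue generating until T_e.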

MOBA-FuseGPT
The rule-based MOBA-RC generates commentaries based on predefined rules; hence, it can generate commentaries accurately and quickly adapt to new types of events by adding rules. However, limited by the fixed patterns, the generated commentaries lack diversity. On the other hand, the generative MOBA-GPT is no longer limited by fixed patterns, but its training requires a large amount of supervised data, which is a thorny challenge in the context of generating game commentaries. Although using pre-trained language models can alleviate this issue to some extent, domain/task-specific knowledge is still in short supply. Consequently, neither method is satisfactory on its own.
As illustrated in Figure 3, to reduce the impact of the shortage of supervised data and improve the performance in the few-shot scenarios, MOBA-FuseGPT further augments the MOBA-GPT by infusing the power of the rule-based MOBA-RC.
Adaptive Training Although a pre-trained GPT2 model can transfer the implicit knowledge learned from large-scale unsupervised data to game commentary generation, a gap remains because the task and the domain are totally different. Consequently, we propose to conduct pseudo-supervised adaptive training before fine-tuning. Given a set of game sessions, we first identify the corresponding event tables and then use the rule-based MOBA-RC to generate a set of commentaries. Thus, we obtain a set of event-commentary pairs {(e, Y′)}. Such pairs can be regarded as pseudo-supervised training data to adapt the GPT2 before fine-tuning on the human-annotated {(e, Y)}.
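The construction of the pseudo-supervised pairs {(e, Y′)} amounts to labeling each identified event table with a rule-generated commentary. A minimal sketch, with a stand-in generator in place of the real MOBA-RC:

```python
def build_pseudo_pairs(event_tables: list, generate) -> list:
    """Label each event table e with a pseudo commentary Y' = generate(e)."""
    return [(e, generate(e)) for e in event_tables]

# A toy stand-in for MOBA-RC; the real generator uses pattern filling.
tables = [{"Killer": "player1", "Dead": "player2"}]
pairs = build_pseudo_pairs(tables, lambda e: f"{e['Killer']} kills {e['Dead']}!")
assert pairs[0][1] == "player1 kills player2!"
```

These pairs would then be linearized exactly like the human-annotated data and used for the adaptive-training pass before fine-tuning.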
Prototype-Augmented Generation Generating texts with prototypes has shown great potential in text generation. Inspired by this line of work, we propose prototype-augmented generation, regarding the commentary generated by MOBA-RC as the prototype commentary. Specifically, given an event table e, we first adopt MOBA-RC to generate a commentary Y′; then, we construct a new event table

e_p = e ∪ {(Prototype, Y′)},

where Prototype is a special key indicating the usage of Y′. Thus, by training on the augmented {(e_p, Y)}, the model can generate the target Y more effectively by referring to Y′.
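The augmentation step is a one-line table extension, sketched below. The key name `Prototype` follows the paper; the helper function itself is illustrative.

```python
def augment_with_prototype(table: dict, prototype_commentary: str) -> dict:
    """Build e_p = e ∪ {(Prototype, Y')} without mutating the original table."""
    augmented = dict(table)  # copy so e stays unchanged
    augmented["Prototype"] = prototype_commentary
    return augmented

e = {"Killer": "player1", "Dead": "player2"}
e_p = augment_with_prototype(e, "player1 has just taken down player2!")
assert e_p["Prototype"].startswith("player1")
assert "Prototype" not in e  # original event table is untouched
```

The augmented table e_p is then linearized like any other event table, so the prototype reaches the model as just another key-value attribute in the input sequence.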

Experiment
This paper takes Dota2, one of the most popular MOBA games, as our case study. This section describes the Dota2 implementation of MOBA-E2C and the construction of the dataset Dota2-Commentary.

Dota2 Event Handlers
We implement MOBA-E2C for Dota2. As listed in Appendix A, we have designed 34 event handlers for capturing different Dota2 highlight events, 8 of which are regarded as the zero-shot event types in the dataset partition.

Supervised Instances
We employed two annotators to generate high-quality event-commentary data. Both annotators have 3+ years of gaming experience. The annotation process can be summarized as follows:
1. We designed a tool for the annotation (see Appendix B). Annotators were required to write a commentary based on the given event table and the corresponding game replay video.

2. To ensure the quality of human commentaries, the tool automatically checks each submitted commentary. If a submission is too similar to existing commentaries, it is rejected and the corresponding annotator is required to write a new one.
The whole process lasted about one month, and we finally obtained 7,473 human commentaries.

Dataset Partition
We divided the selected 70 Dota2 game sessions into two parts: Seen contains 60 sessions and Unseen contains 10 sessions. Similarly, the event types have been divided into two parts: Default contains events generated by 26 types of event handlers and ZeroShot contains events generated by the remaining 8 event handlers.
As reported in Table 2, the model is trained/validated on the Seen+Default data. In the test, to thoroughly demonstrate the performance in different scenarios, there are four different test sets, namely Seen+Default, Seen+ZeroShot, Unseen+Default, and Unseen+ZeroShot.

Pseudo-Supervised Instances
For each session, we employ the rule-based MOBA-RC to generate commentaries and then construct pseudo-supervised training instances. MOBA-RC generates a commentary by randomly selecting a pattern; thus, to increase the number of pseudo-supervised instances, we repeat the generation three times (chosen from {1, 3, 5}) for each session.

Comparison Models
We first evaluated the following data-to-text models, which generate commentaries from the linearized event tables: 1) S2S: the Seq2Seq model (Sutskever et al., 2014) has been widely used in the field of text generation. In this implementation, the encoder is a 2-layer 768d bi-GRU network and the decoder is another 2-layer 768d GRU network; the tokenization and vocabulary follow the BERT solution below. 2) BERT: based on S2S, it replaces the encoder with the pre-trained Chinese BERT encoder hfl/chinese-bert-wwm-ext (102M parameters, 768d, 12L, 8H, 21,128 subwords) (Cui et al., 2021). 3) MOBA-RC: the rule-based Dota2 commentary generator proposed in this paper. 4) MOBA-GPT: the Chinese MOBA GPT2 pre-trained/trained by us. 5) MOBA-FuseGPT: the proposed method.
Besides, we also evaluated several re-writing models. Specifically, given an event table e, we first employ the rule-based MOBA-RC to generate a commentary Y′, and then force the model to learn to generate the ground-truth commentary Y, i.e., P(Y|Y′). In line with the data-to-text models, the re-writing models include S2S+RW, BERT+RW, and GPT2+RW.
Implementation Code was implemented with PyTorch and the Hugging Face Transformers library. In the (fine-tuning) training, the batch size is set to 32; GPT2 and BERT use the AdamW optimizer with a 1e-5 learning rate, and the other modules use the Adam optimizer with a 1e-4 learning rate. In the inference stage, we adopt greedy decoding to generate commentaries. The code runs on an NVIDIA RTX2080Ti/3090.
The Pre-training of GPT2 We find that resources for general-purpose small-size Chinese GPT2 models are rare. Thus, we pre-trained a Chinese GPT2 ourselves on two NVIDIA RTX3090 GPUs. In detail, the GPT2 configuration is 768d, 12L, and 12H. The vocabulary includes 30K subwords and 200 special symbols, and the maximum length is 1,024. We trained this GPT2 on a Chinese corpus of 113M utterances/5.22B tokens. The batch size is 512, the optimizer is AdamW, the learning rate is 1.5e-4, with 4,000 warm-up steps and 640,000 training steps.
Adaptive Training MOBA-FuseGPT has an additional adaptive training process. We find that if we keep using a batch size of 32 and a learning rate of 1e-5, the model tends to overfit. Thus, compared to fine-tuning, the batch size is increased to 1,024 and the learning rate to 1e-4.

Metrics
The evaluations were conducted at the character level because the models use different tokenization solutions. We used F1 (Unigram-F1), RG (Rouge-L) (Lin, 2004), and BLEU (BLEU-4) (Papineni et al., 2002) to evaluate character-overlapping relevance; we also used the embedding-based EM-A (Embedding-Average) and EM-X (Embedding-Extreme) to evaluate semantic relevance (Liu et al., 2016). To evaluate diversity and informativeness, following (Zhang et al., 2020), we report D2 (Distinct-2) and the 4-gram entropy Ent.
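The two diversity metrics have simple character-level formulations, sketched below. This is a common formulation of Distinct-n and n-gram entropy, not necessarily the authors' exact implementation.

```python
import math
from collections import Counter

def ngrams(chars: str, n: int) -> list:
    """All overlapping character n-grams of a string."""
    return [tuple(chars[i:i + n]) for i in range(len(chars) - n + 1)]

def distinct_2(texts: list) -> float:
    """Distinct-2: unique character bigrams / total character bigrams."""
    grams = [g for t in texts for g in ngrams(t, 2)]
    return len(set(grams)) / max(len(grams), 1)

def entropy_4(texts: list) -> float:
    """Entropy of the character 4-gram distribution over all texts."""
    counts = Counter(g for t in texts for g in ngrams(t, 4))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

texts = ["abcd", "abce"]
assert 0.0 < distinct_2(texts) <= 1.0
```

Higher Distinct-2 means less n-gram repetition across outputs; higher 4-gram entropy means the output distribution is less concentrated on a few stock phrases.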

Evaluation of rules and event handlers
Rating     | Unacceptable | Neutral | Acceptable
Proportion | 4.0%         | 23.5%   | 72.5%

Table 3: The evaluation of rules and event handlers.
The rule-based MOBA-RC plays an important role in MOBA-E2C, so it is necessary to check the correctness of the pre-defined rules and patterns. Therefore, we employed two volunteers to validate their effectiveness. We used MOBA-RC to generate commentaries for 6 different Dota2 game sessions; then, we sampled 100 cases from the generated commentaries and asked the volunteers to annotate them. As reported in Table 3, the commentaries generated by MOBA-RC are highly usable; only 4% are unacceptable. This demonstrates that the rules and patterns are well defined and can perform the job accurately.

Automatic Results
We report the results in Table 4. Comparing the geomean scores of the models, we can see that MOBA-FuseGPT achieves the best overall performance in every group, demonstrating the effectiveness of our approach. Among the data-to-text models, the naive S2S learns to generate commentaries from scratch and thus has the weakest performance in every group. After introducing a pre-trained language model, BERT/MOBA-GPT achieve significantly better performance. Moving to the rule-based MOBA-RC, the results are quite interesting: its performance on the zero-shot test sets (Seen/Unseen-ZS) is stronger than on the normal test sets (Seen/Unseen). For example, MOBA-RC is worse than BERT on the normal test sets, but significantly better than BERT on the zero-shot test sets. This indicates the necessity of using rule-based methods in real scenarios because of their ability to adapt to new requirements. Based on MOBA-RC, we also evaluated several re-writing models. Such re-writing models work well only if the data distribution is similar in the training stage and the test stage. The proposed MOBA-FuseGPT combines the advantages of all previous methods and thus has the best overall performance.
The performance of the rule-based MOBA-RC is not affected by the test set, so we can regard it as a constant baseline. Intuitively, if a model's performance relative to MOBA-RC is greater than 1.0 (i.e., the Ratio), we can say this model is better than MOBA-RC. Across the four test groups, only our MOBA-GPT and MOBA-FuseGPT satisfy this.

Figure 4: The geomean score in few-shot scenarios. Details can be found in Appendix C.

Ablation Study in Few-Shot Scenarios
Generative methods have great potential for developing machine commentators, but they also require supervised training data. To reduce the burden of collecting supervised data, besides using a pre-trained language model, MOBA-FuseGPT infuses the power of the rule-based method via Prototype-Augmented Generation and Adaptive Training.
To evaluate them, we tested ablated models in few-shot scenarios. As shown in Figure 4, both techniques are effective. Even when using only 1/8 of the training data (about 533 instances), the geomean score of MOBA-FuseGPT still outperforms or is on par with the baselines trained on the full data (see Table 4). This means our approach can be quickly migrated to other MOBA games.

Human commentary (from Table 5): 夜魇已经失守了3个建筑，但却仅仅拿下了天辉的1个建筑，双方差距在持续扩大中。(Dire has lost three buildings but has taken down only one of Radiant's buildings; the gap between the two teams continues to widen.)

Case Study
Two cases are shown in Table 5. In the first case, the commentary generated by MOBA-RC is correct but not attractive. MOBA-GPT tried to make the commentary more attractive, but it generated some weird words. The commentary generated by MOBA-FuseGPT is not only correct but also more attractive than the human-written commentary. The second case is sampled from the zero-shot Unseen-ZS set. The situation of MOBA-RC is similar to the first case because it only requires pre-defined rules. Both generative models, MOBA-GPT and MOBA-FuseGPT, output acceptable commentaries, demonstrating their applicability in the zero-shot scenario; however, neither is as informative as the human-written commentary.

Related Work
The development of e-sports has greatly enriched people's leisure time and brought great commercial value. Subsequently, many AI-based tools/models have been developed for e-sports to meet the growing demands. The most popular applications include: 1) predicting the outcome/death (Yu et al., 2018; Akhmedov and Phan, 2021; Wang et al., 2018; Qi et al., 2018; Katona et al., 2019); 2) identifying the player/role (Yuen et al., 2020); 3) recommending items/heroes/characters (Looi et al., 2019; Porokhnenko et al., 2019; Aznin et al., 2019); 4) AI players (OpenAI, 2019); and many others (Ponomarenko and Sirotkin, 2020; Marchenko and Suschevskiy, 2018; Block et al., 2018). Compared to such works, we study machine commentary generation, which involves not only the knowledge of e-sports but also NLP techniques.
Text generation is an important task in both academia and industry (Li et al., 2021b). Generally, it learns to generate texts based on given data (Li et al., 2021a), such as generating biographies from infoboxes (Bai et al., 2020), generating descriptions, and many others. Only recently has there been work on generating commentaries for traditional sports games (Ishigaki et al., 2021). However, no previous work has tried to generate MOBA commentaries, because e-sports games are quite complex and it is not easy to collect supervised training data. Compared to such works, this work focuses on a more specific task, namely generating MOBA game commentaries.

Conclusion
This paper proposes constructing machine game commentators that can work at any time and place. This paper focuses on the MOBA games and proposes a novel data-driven commentary generation framework MOBA-E2C. Instead of using visual features, MOBA-E2C generates commentaries by using the meta-data of a game. MOBA-E2C regards each game session as a live-streaming sequence of game states, and then employs several event handlers to capture events. Subsequently, we use MOBA-FuseGPT to generate commentaries based on the identified events. It infuses the advantages of rule-based methods and the generative pre-trained language model. In the experiments, we take Dota2 as the case study and construct a dataset Dota2-Commentary. Extensive experiments have demonstrated the effectiveness of our approach.
Limitation and Future Work This work focuses on exploring the research field of MOBA game machine commentators and setting a baseline; thus, we propose a data-driven framework MOBA-E2C, a generator MOBA-FuseGPT, and a dataset Dota2-Commentary. However, this work did not explore new network architectures. In the future, building on this foundation, we will continue to investigate the potential of MOBA game machine commentators and bring forward stronger models.

Ethical Considerations
This work constructed a new dataset by employing two human annotators. The two annotators are employees of a commercial company (GamesMind Technology) and were paid corresponding salaries. The salary level is higher than the local minimum wage.
From a technical perspective, we have filtered out irrational rules and data; thus, the commentaries generated by our approach raise no ethical issues. Meanwhile, compared to human commentators (streamers), our machine commentator can better avoid ethical issues.

A Event Handlers
This work selects Dota2 as the case to evaluate the proposed MOBA-E2C. Thus, the first job is to design a set of event handlers to capture highlight event tables for Dota2.
As listed in Table 6, we have designed 34 event handlers for capturing different Dota2 highlight events, 8 of which are regarded as the zero-shot event types in the dataset partition.

B Annotation Tool
As shown in Figure 5, we have designed a visual tool for the annotation.
It is worth noting that volunteers are required to generate a commentary based on the given event table and the corresponding game replay video at the same time.
To ensure the quality of human commentaries, the tool automatically checks each submitted commentary. If a submission is too similar to existing commentaries, it is rejected and the corresponding volunteer is required to write a new one.

C Ablation Study
Although generative methods have more potential for developing machine commentary generation, they also require supervised training data. Hence, MOBA-FuseGPT infuses the power of the rule-based method via Prototype-Augmented Generation and Adaptive Training. We test ablated models in few-shot scenarios. As illustrated in Table 7, both techniques are effective. Meanwhile, even when using only 1/8 of the training data (about 533 instances), the geomean score of MOBA-FuseGPT still outperforms or is on par with the baselines. This means our approach can be quickly migrated to other MOBA games.

Figure 5: The adopted visual annotation tool.