Event Extraction from Historical Texts: A New Dataset for Black Rebellions

Understanding historical events is necessary for the study of contemporary society, culture, and politics. In this work, we focus on the event extraction (EE) task, which detects event trigger words and their arguments, in the novel domain of historical texts. In particular, we introduce a new EE dataset for a corpus of nineteenth-century African American newspapers. Our goal is to study the discourse of slave and non-slave African diaspora rebellions published in the periodical press of this period. Our dataset features 5 entity types, 12 event types, and 6 argument roles that concern slavery and Black movements between the eighteenth and nineteenth centuries. Historical newspapers present many challenges for existing EE systems, including the evolution of word meanings and the extensive use of religious discourse in newspapers from this era. Our experiments with current state-of-the-art EE systems and BERT models demonstrate their poor performance on historical texts and call for more robust research efforts in this area.


Introduction
In the last two decades, the emergence of digital humanities has transformed scholarship in the humanities. Historical documents are now massively digitized into images and texts that allow researchers to query across collections and languages (Piotrowski, 2012). Despite the convenience of these applications (Yang and Eisenstein, 2016), a gap still exists between datasets and research methods. Humanities scholars do not solely interpret historical facts from statistical figures derived from massive data. Rather, they prefer reading texts and interpreting words in historical and cultural context, or by associating texts with the circumstances surrounding their publication. This working methodology emphasizes the quality of the data over its quantity. Recent advances in natural language processing (NLP) aim to bridge the gap between qualitative and quantitative analyses by identifying, extracting, and counting contextual data (Won et al., 2018; Wadden et al., 2019; Lin et al., 2020). This approach provides contextual information about real-life entities (e.g., individuals, locations, times, documents) which can later be integrated into knowledge bases (Won et al., 2018) to aid historical research and discourse analysis.
In this work, we explore Information Extraction (IE) in NLP for humanities research in support of the important and complicated process of knowledge extraction from historical texts. In particular, we investigate the Event Extraction (EE) task, which identifies event trigger words of pre-determined event types (the words/phrases that most clearly evoke events) (Li et al., 2013), together with their arguments (e.g., participants, locations). For example, in the following sentence, an EE system should be able to detect the word "proclaimed" as a trigger word of the event type "LAW Approve" and associate it with the arguments, i.e., the agent (Capitol), the beneficiary (the slave), and the time (now).
Freedom to the slave should now be proclaimed from the Capitol, and should be seen above the smoke and fire of every battle field.
To enable the development and evaluation of EE models for historical text, benchmark datasets play an important role. However, most of the current EE datasets, i.e., ACE 2005 (Walker et al., 2005) and TAC KBP (Mitamura et al., 2015), are not suitable for this domain for several reasons. First, these datasets are collected from various sources without a target topic (Walker et al., 2005; Mitamura et al., 2015). Therefore, tracking the evolution of specific movements, which is of great interest to literary scholars and historians, is not feasible. Second, documents in these datasets are derived from recent articles and documents, in which the use of words differs from their uses in the past. For example, some words acquire new meanings over time, and the dominance of religion in the past led to extensive use of religion-related words and figurative language in historical publications. Last but not least, existing EE datasets mostly concern events in everyday life, such as giving birth, transportation, and crimes. These events might not relate to the subjects literary scholars and historians want to study.
To redress this problem, we introduce a novel EE dataset for historical texts, called BRAD, focusing on Black Rebellions in the African Diaspora (i.e., the African American population). BRAD's documents are selected by a humanities expert and annotated by EE experts for 5 entity types, 12 event types, and 6 argument roles. Finally, we evaluate state-of-the-art EE models on BRAD. Our experiments show that the performance of current EE models on historical texts is significantly poorer than on modern texts, necessitating further research in this area. We will also release our dataset and code to facilitate future research.

Data Collection and Annotation
In this project, we use documents from the African American newspaper corpus. These documents involve news articles derived from nineteenth-century African American periodicals published from 1827 to 1909.
To create an EE dataset, we first designed a set of event types and annotation guidelines, consulting our humanities expert who specializes in nineteenth-century literature. In particular, we focus on the four most important events for Black rebellions presented in our corpus: Humanity (a humanity event concerns a violation or facilitation of basic human rights, e.g., living, freedom, property); Law (a law event characterizes an introduction, approval, or repeal of a law); Conflict (a conflict event represents an act of violence, including the initialization, development, and consequences of a violent act); and Justice (a justice event captures an act of punishment by the government of people who violate a law). These four events are further expanded into 12 event sub-types. Tables 7 and 8 present the event types along with their descriptions and examples in BRAD. To capture arguments for such events, we introduce five entity types (i.e., Person, Organization, Geographical-Political Entity, Time, and Document). The first four entity types follow the definitions in the ACE 2005 guideline (Walker et al., 2005), while the Document type represents government documents (e.g., the Slavery Act) used in events. Finally, we define six argument roles that such entity types can play in our events: Time, Location, Agent, Patient, Object, and Beneficiary. Tables 9 and 10 provide more descriptions and examples of these argument roles for each event type.
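To make the schema concrete, the running example from the introduction ("Freedom to the slave should now be proclaimed from the Capitol ...") could be annotated as follows. This is an illustrative sketch only: BRAD's actual release format is not described here, so the record layout and field names are our own.

```python
# Illustrative annotation record; the field names and layout are
# hypothetical, not BRAD's released format.
example = {
    "sentence": ("Freedom to the slave should now be proclaimed from the "
                 "Capitol, and should be seen above the smoke and fire of "
                 "every battle field."),
    # Entity mentions with their entity types from the schema.
    "entities": [
        {"text": "the slave", "type": "Person"},
        {"text": "Capitol", "type": "Geographical-Political Entity"},
        {"text": "now", "type": "Time"},
    ],
    # One event: the trigger word, its event type, and argument roles.
    "events": [
        {
            "trigger": "proclaimed",
            "type": "LAW Approve",
            "arguments": [
                {"text": "Capitol", "role": "Agent"},
                {"text": "the slave", "role": "Beneficiary"},
                {"text": "now", "role": "Time"},
            ],
        }
    ],
}
```
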
The African American corpus is a large corpus of 177,582 articles. We thus select documents that are relevant to our focus topic of Black diaspora rebellions. First, automatic selection is done by keyword matching to identify documents related to slavery and insurrection. To this end, our humanities expert defined a set of keywords for the topic of rebellion; in the nineteenth century, this cluster of words (e.g., "rebel", "revolt", "strike", "insurrection") was used interchangeably to describe African diaspora rebellion events. We used the Stanford CoreNLP toolkit to split documents into sentences and tokenize them into words. Next, for each document in the corpus, we counted the number of words in the document that appear in the designated keyword set (called the matching rate). The top 1000 documents with the highest matching rates were selected for further consideration. In the second step, the humanities expert examined these 1000 documents to identify relevant documents for Black rebellions, leading to the selection of 151 documents used for the EE annotation.
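The first-pass keyword selection can be sketched in a few lines. This is a minimal illustration rather than the actual pipeline: the keyword set below is a small hypothetical subset of the expert-curated list, documents are represented as pre-tokenized word lists (the paper uses Stanford CoreNLP for tokenization), and the function names are ours.

```python
# Hypothetical subset of the expert-curated rebellion keyword set.
REBELLION_KEYWORDS = {"rebel", "revolt", "strike", "insurrection"}

def matching_rate(tokens):
    """Count the tokens of a document that appear in the keyword set."""
    return sum(1 for tok in tokens if tok.lower() in REBELLION_KEYWORDS)

def select_top_documents(documents, k=1000):
    """Rank documents by matching rate and keep the top k for expert review."""
    return sorted(documents, key=lambda d: matching_rate(d["tokens"]),
                  reverse=True)[:k]

# Toy corpus of pre-tokenized documents (CoreNLP tokenization assumed upstream).
corpus = [
    {"id": "doc1", "tokens": ["the", "insurrection", "was", "a", "revolt"]},
    {"id": "doc2", "tokens": ["market", "prices", "rose", "again"]},
]
selected = select_top_documents(corpus, k=1)
```

The retained documents then go to the humanities expert for the second, manual filtering step.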
In the next step, we recruited two graduate students to annotate the selected documents for EE. Each student was independently trained on the annotation guideline and performed a set of exercises to better recognize events and entities. The students annotated the 151 documents for entity mentions and event triggers, achieving Cohen's Kappa scores of 0.81 and 0.82, respectively. Note that these scores fall within the near-perfect agreement range of [0.81, 0.99]. To further improve the quality of the dataset, our humanities expert resolved the annotation conflicts between the two students, leading to the final annotation of entity mentions and event triggers in the 151 documents. Next, given the reconciled entity mention and event trigger annotation, the two students annotated event arguments for the event triggers. Our evaluation shows a Cohen's Kappa score of 0.75, which indicates a strong agreement between the two annotators. The lower agreement score also suggests that event argument annotation is more ambiguous than that for entity mentions and event triggers. Finally, our domain expert was consulted to resolve any conflicts in event argument annotation, producing the final version of our BRAD dataset with the 151 documents. To facilitate the development of EE models, we then split BRAD into training, development, and test portions with 101, 25, and 25 documents, respectively. Table 1 presents the statistics of the dataset, while Tables 2 and 3 present the frequencies of event and entity types in BRAD.

Annotation Challenges: During the EE annotation process for historical texts, we found several noteworthy challenges regarding the ability to achieve interpretive consensus on the texts.
First, regarding domain expertise, we find that the use and meaning of words evolve over time and across geographical regions, potentially introducing new meanings or making one meaning more popular than the others. Language is in perpetual flux. As such, understanding texts from the past requires analysis of the context in which they were written, and effective annotations must be attentive to these contexts. For example, consider the sentence: "The Congress was visited and received the shots and shells in all part of her wooden hull". Here, "Congress" and "her" are two mentions of the USS Congress, a battleship launched by the United States Navy in 1841. Without historical knowledge, a modern reader might interpret "Congress" as the legislative branch of the United States. In fact, the second clause mentions the wooden hull, which helps to clarify it as the battleship that sank in 1862 during the US Civil War. Such misinterpretation might lead to incorrect annotations and analyses.
Second, we find that annotation disagreements are more likely to occur in the interpretation of event triggers. In BRAD, we allow event triggers to involve multiple words, which causes span mismatches between annotations in some confusing cases (e.g., annotating the whole phrase "make the black man equal" as one event trigger versus annotating "make" and "equal" as two separate triggers). Another common form of disagreement involves mismatches on event types. Consider the following sentence as an example: "Believing his life to be in danger, Patmon stepped back, drew his revolver, and told the fellow to surrender, or he would shoot him." Two annotators agree that the word "shoot" is an event trigger. However, one annotator considers this an event of type Conflict Attack, as it is part of the conflict between the overseer ("Patmon") and the slave ("fellow", "him"); the other annotator, on the other hand, treats "shoot" as a Humanity Deprive event, as the overseer is threatening to kill the slave (i.e., taking away the right to life).

Data Analysis: To illustrate the ambiguity in BRAD, Table 4 shows the five words with the highest frequency as event triggers (i.e., Event Count), along with the percentage of times these words are labeled as event triggers in the dataset (i.e., Event Rate) (Sims et al., 2019). This table demonstrates that even the words with the highest event counts are often not annotated as event triggers in BRAD, thereby requiring EE models to effectively capture context in order to make correct predictions. Moreover, we find extensive use of religion-related words in BRAD compared to existing EE datasets. For example, considering the words "lord", "heaven", and "christian", the percentages of documents in ACE 2005 containing these words are only 0.3%, 0.7%, and 1.7%, while those percentages for BRAD are 8.7%, 8.7%, and 18.3%, respectively.
Such language differences suggest the potential need to adapt existing language models to better capture the nature of historical texts, which, in turn, will facilitate more accurate EE performance.
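The Event Count and Event Rate statistics discussed above can be computed with a single pass over the token-level annotations. A minimal sketch, assuming the annotated corpus is available as (word, is_trigger) pairs (this representation and the function name are ours, not the paper's code):

```python
from collections import Counter

def trigger_statistics(annotated_tokens):
    """Compute, per word, its Event Count (times it is labeled as an event
    trigger) and Event Rate (that count divided by the word's total number
    of occurrences)."""
    total, triggered = Counter(), Counter()
    for word, is_trigger in annotated_tokens:
        w = word.lower()
        total[w] += 1
        if is_trigger:
            triggered[w] += 1
    return {w: (triggered[w], triggered[w] / total[w]) for w in total}

# Toy annotations: "war" is a trigger in 2 of its 3 mentions.
stats = trigger_statistics([("war", True), ("war", False),
                            ("war", True), ("peace", False)])
```

A word with a high Event Count but an Event Rate well below 1.0 is exactly the ambiguous case described above: frequency alone cannot decide whether a mention is a trigger.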

Experiment
There are three major EE tasks that BRAD supports for historical texts: entity mention detection (EMD), event trigger detection (ED), and event argument extraction (EAE). This section aims to reveal the complexity of the EE tasks in BRAD by evaluating the performance of existing state-of-the-art models for EE on this dataset. In particular, we focus on the following state-of-the-art models for EE that leverage the pre-trained language model BERT (Devlin et al., 2019) for text encoding and jointly perform predictions for all EE tasks in an end-to-end fashion (i.e., joint inference):

DyGIE++ (Wadden et al., 2019): This model utilizes dynamic span graphs to exploit long-range cross-sentence relationships for span representation propagation for joint IE.
OneIE (Lin et al., 2020): This model first identifies spans of entity mentions and event triggers. The detected spans are then paired to jointly predict entity types, event types, relations, and argument roles for IE. Global features are used to capture cross-task and cross-instance dependencies and are employed in the decoding phase with beam search to improve extraction performance.
We adapt the official implementations of these models from their original papers to our EE task in BRAD by ignoring the relation extraction task and re-tuning them on the BRAD development set. For both models, we employ the pre-trained BERT model (i.e., the bert-base-cased version) to encode input texts. In addition, motivated by the language difference between historical and modern texts, we further explore a variant of the BERT model obtained by fine-tuning it on the African American corpus via the masked language modeling task (Devlin et al., 2019). Note that we exclude the 151 documents of BRAD from this fine-tuning process. This fine-tuned BERT model is also fed into DyGIE++ and OneIE to perform EE in BRAD.
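The masked language modeling objective used for this domain-adaptive fine-tuning corrupts the input as in Devlin et al. (2019): roughly 15% of token positions are selected as prediction targets; of these, 80% are replaced by [MASK], 10% by a random vocabulary token, and 10% are left unchanged, and the model is trained to recover the originals. A minimal sketch of the corruption step only (the toy vocabulary and function name here are illustrative, not tied to bert-base-cased or the paper's training code):

```python
import random

MASK_TOKEN = "[MASK]"
# Tiny illustrative vocabulary used for the 10% random replacements.
VOCAB = ["lord", "heaven", "freedom", "slave", "law", "rebellion"]

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption: pick ~mask_prob of positions as targets;
    replace 80% of them with [MASK], 10% with a random vocabulary token,
    and leave 10% unchanged. Returns (corrupted tokens, target positions)."""
    rng = rng or random.Random(0)
    corrupted, targets = list(tokens), []
    for i in range(len(tokens)):
        if rng.random() < mask_prob:
            targets.append(i)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_TOKEN
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)
            # else: keep the original token at this position
    return corrupted, targets
```

Running this corruption over the African American corpus and training BERT to predict the original tokens at the target positions is what exposes the model to the era's religion-heavy, evolving vocabulary before the EE models consume its representations.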
Result: Table 6 reports the performance of the models on the test set of BRAD over five subtasks: Entity Mention Detection (Entity), Event Trigger Identification, i.e., not concerning event types (Trig-I), Event Trigger Classification (Trig-C), Event Argument Identification, i.e., not concerning argument roles (Arg-I), and Event Argument Classification (Arg-C). For comparison, we also include the original performance of the models on the popular EE dataset ACE 2005. There are several major observations from the table. First, the performance of current EE models on BRAD is substantially worse than on ACE 2005 across different tasks. This suggests that EE for historical texts in BRAD is a challenging task and that more research effort is necessary to boost EE performance for this domain. Second, comparing the performance of the models with different versions of BERT (i.e., original vs. fine-tuned), it is clear that fine-tuning BERT on historical texts is beneficial for improving the performance of EE models on BRAD (especially for OneIE, where the improvement is consistent across different EE subtasks with large margins). This observation suggests that pre-training BERT on modern texts alone is unable to capture the nuances of language use in historical texts.

Related Work
EE has been an active research area in NLP (Ahn, 2006; Li et al., 2013; Nguyen and Grishman, 2015; Chen et al., 2015; Nguyen et al., 2016; Yang et al., 2019; Wadden et al., 2019; Lai et al., 2020c; Nguyen et al., 2021). Some recent studies have also addressed extensible learning settings for EE with new event types, e.g., zero-shot learning (Huang et al., 2018), few-shot learning (Lai et al., 2020a,b), or new domains (Naik and Rosé, 2020). The closest works to ours involve recent efforts to create new datasets for EE (Satyapanich et al., 2020; Ebner et al., 2020; Wang et al., 2020; Trong et al., 2020; Le and Nguyen, 2021). However, these works do not consider historical texts as we do.

Conclusion
We present BRAD, a new dataset for EE on historical texts that focuses on Black rebellions in the African American newspaper corpus. Our experiments demonstrate the poor performance of current EE models on BRAD compared to modern texts, leaving substantial room for future research on EE for historical texts. We also illustrate one approach to improving current EE systems for historical texts by fine-tuning existing pre-trained language models. In the future, we plan to enlarge our dataset with more annotated documents and event types.

LAW Propose
A PROPOSE event occurs when an actor (Agent) introduces a bill, proposition, or treaty which benefits a group of people (Beneficiary).
Examples:
- Below we give the salient points of the bill of an entertainment recently given in the interest of a certain church about to be organized in a certain town in New Jersey.
- The bill introduced in Congress last week by the congressman from North Carolina, to abolish the 15th amendment.
- It's only effect will be to create support for the bill of congressman Crumpacker which proposes a reduction of representation in those States.

LAW Approve
An APPROVE event occurs when a bill or order (Object) is passed by either the head of the government or a representative committee (Agent).
Examples:
- Be it enacted by the General Assembly of Maryland.
- Vermont has passed her Liberty Bill, New York has under discussion, and Massachusetts will soon report and pass her Act.
- But it is said that for the Government to adopt the abolition policy, would involve the loss of the support of the Union men of the Border Slave States.

LAW Repeal
A REPEAL event occurs when an active law (Object) is completely repealed by a state actor (Agent).
Examples:
- ... that I determined to revoke the act of the Federal Constituent Assembly, whereby Slavery was abolished.
- Even the New York Tribune protests against making this war for the destruction of slavery, and insists that such a war would alienate a large body of the Northern people at present who adhere to the Government in the prosecution of the war.
- They want to se the Government march a powerful array into the traitorous States, proclaim liberty to every slave, and wipe out the last vestige of that barbarous system from the land ...

CONFLICT Protest
A PROTEST event occurs when people (Agent) come into a public area to demand some action. PROTEST events include, but are not limited to, protests, sit-ins, and riots as the result of a previous protest.
Examples:
- Almost simultaneously with the appearance of the minstrels there arose from every kennel in the neighborhood timely protest barked forth vigorously by a hundred curs, who, in common with their masters, cursed their common luck.
- It has attempted to supplant Government with anarchy, and the fury of a brutal mob for the beneficent operation of law, and the legally appointed law-makers.
- ...while the majority of the men were absent at a public demonstration at Myrtle-avenue Park, in another part of the city.

CONFLICT Attack
An ATTACK event occurs when a person or an organization (Agent) performs a violent act causing harm or damage to another person or organization (Patient).
Examples:
- Make the slave first, midst, and last. Follow no longer the partial and side issues; strike for the abolition of slavery.
- ... and hovering about Williamsport in an unaccountable manner - while the rebel troops are burning, destroying, pressing loyal men into service, or driving them from the houses they hoped to possess, and the wheat-fields they expected to reap, under the protecting folds of the Federal flag.
- The States which rebelled, after having been most thoroughly whipped in a great war, came back into the Union upon their promises to abide by the Constitution and Laws of the same.

CONFLICT Injure-Die
An INJURE-DIE event is defined as a death or wound of a person (Patient) which is the result of a violent act by another person (Agent).
Examples:
- The life of loyal men are being sacrificed by scores, and will, by and by, be sacrificed by thousands.
- Why should the nation pour out its blood and lavish its treasure by the million, consent to protect and preserve the guilty cause of all its troubles?
- Our loss is estimated at two hundred killed and wounded.

CONFLICT Other
CONFLICT OTHER events are reserved for events that are related to conflicts but not classified as one of the conflict event types above, including declaring war, threatening someone, forming an army, a movement, and a march.
Examples:
- Let the slaves and free colored people be called into service, and formed into a liberating army, to march into the South ...
- Their efforts in this direction have been crowned by entire success.
- He had called loud and earnestly upon the Government for reinforcements; but the Government was practically deaf to the call, and left him and his brave companions either to perform a miracle, or to be completely overwhelmed by superior numbers.

JUSTICE Arrest-Jail
An ARREST-JAIL event occurs when the movement of a person (Patient) is constrained by a state actor (Agent).
Examples:
- It appears that he obtained his information direct from German where a supposed agent of the company had been arrested, having in his possession incriminating documents.
- It is said to be possible to imprison a man for debt in Massachusetts.
- He put her in jail at Eastville and she stayed there for some time.

JUSTICE Sentence
A SENTENCE event takes place when a punishment for a person or an organization (Patient) is issued by a state actor (Agent).
Examples:
- ... and any person so offending shall be guilty of a felony, and shall, on conviction, be sentenced to confinement in the penitentiary of this State, for a period not less than ten nor more than twenty years from the time of sentence pronounced on such person.
- If any slave or servant be convicted or any crime the punishment whereof may be death or confinement in the penitentiary ...
- ..., but that it has been promptly put down and the guilty parties summarily punished.

JUSTICE Execute
An EXECUTE event occurs when the life of a person (Patient) is taken by a state actor (Agent).
Examples:
- Hector Grant, James Horney, and Esther Anderson, white servants, were executed at Chester, Kent county.
- All these, if the demand of the Administration and its friends is gratified, are to be hanged; for the punishment of treason by our law is death, ...
- He made some confessions, and managed finally to escape, but was arrested, taken to El Dorado, tried, and shot - not, however, by regular process.

HUMANITY Deprive
A DEPRIVE event occurs when someone's right (Patient) is taken away, disrespected, or discouraged in any form of expression, including but not limited to law, action, and statement.
Examples:
- We thank Dr. CROFTS for the assurance of his sympathy, and hope often to receive his earnest words in behalf of our enslaved people.
- Before the slaved is freed, this and a hundred other plans will be critically canvassed, and the discussion of each will elicit some truth.
- ... shall the four millions slaves, now robbed of all their rights, and degraded to a level with brute beast...

HUMANITY Endow
An ENDOW event occurs when someone's right is enriched or appreciated in any form of expression, including but not limited to law, action, and statement.
Examples:
- And as for lynching - let all the officers of the law, with all the powers of the law, defend the rights and life of every prisoner.
- Before the slaved is freed, this and a hundred other plans will be critically canvassed.
- They are going into every community which offers freedom and protection to their citizens, where law is justly administered and where the rights of man are respected; and there are many such sections in this country; there will be the future homes of the Negroes.

Argument roles (fragment of Tables 9 and 10):
Agent (PER, ORG): The person or organization who attempts to attack or kill. Examples: "They cut men in half, and pieces from exploded shells, killed and wounded several." / "Most of the negroes, we regret to hear, are said to have been massacred."
Patient (PER, ORG): The person or organization who is injured or dead.
Time (TIME): When the injury/death takes place.
Location (GPE): Where the injury/death takes place.