MILIE: Modular & Iterative Multilingual Open Information Extraction

Open Information Extraction (OpenIE) is the task of extracting (subject, predicate, object) triples from natural language sentences. Current OpenIE systems extract all triple slots independently. In contrast, we explore the hypothesis that it may be beneficial to extract triple slots iteratively: first extract the easy slots, then extract the difficult ones by conditioning on the easy slots, and thereby achieve a better overall extraction. Based on this hypothesis, we propose a neural OpenIE system, MILIE, that operates in an iterative fashion. Due to its iterative nature, the system is also modular: it is possible to seamlessly integrate rule-based extraction systems with a neural end-to-end system, thereby allowing rule-based systems to supply extraction slots which MILIE can leverage for extracting the remaining slots. We confirm our hypothesis empirically: MILIE outperforms SOTA systems on multiple languages ranging from Chinese to Arabic. Additionally, we are the first to provide an OpenIE test dataset for Arabic and Galician.

case of n-ary extractions, where more than 3 slots need to be extracted (Section 2.5).

Figure 1: MILIE system architecture. An input sequence is tokenized and, optionally, dependency parsed. This is given to a BERT-based transformer, which outputs a hidden state for each token. The hidden states are given to each of the extraction heads, here to the predicate head. This head marks the location of the predicate in the sequence. The system then proceeds to extract the other slots; see Figure 2.
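As a rough illustration of how an already extracted slot could be exposed to the next iteration, the sketch below wraps slot spans in the `<P>`, `<S>` and `<O>` markers that appear in the paper's Taj Mahal example. The function name `render_with_markers` and the hard-coded token spans are assumptions made for illustration only; in MILIE the spans would be predicted by the extraction heads, and this is not the authors' implementation.

```python
def render_with_markers(tokens, spans):
    """Wrap each slot span in its marker, e.g. <P>built by<P>.

    tokens: list of whitespace tokens of the input sentence.
    spans:  dict mapping a slot tag ("P", "S", "O") to a (start, end)
            pair of token indices, end exclusive. Spans must not overlap.
    """
    out = list(tokens)
    for tag, (start, end) in spans.items():
        marker = f"<{tag}>"
        out[start] = marker + out[start]      # opening marker on first token
        out[end - 1] = out[end - 1] + marker  # closing marker on last token
    return " ".join(out)


tokens = "The Taj Mahal was built by Shah Jahan in 1643".split()

# Hypothetical head outputs, listed in the order predicate, subject, object.
predicted = [("P", (4, 6)), ("S", (1, 3)), ("O", (6, 8))]

spans = {}
steps = []
for tag, span in predicted:
    spans[tag] = span                                # slot found in this iteration
    steps.append(render_with_markers(tokens, spans))  # input for the next iteration

for step in steps:
    print(step)
```

Keeping the spans as indices over the original tokens, and re-rendering the marked sentence at each step, avoids having to shift indices after every inserted marker.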

To implement the iterative nature of our system,

Fixing the n-ary argument extraction in the final iteration, we obtain the following six decoding pathways: $P_{spoa}$, $P_{sopa}$, $P_{psoa}$, $P_{posa}$, $P_{ospa}$, $P_{opsa}$.
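The count of six follows from permuting the three slots s, p and o while keeping the n-ary argument slot a in last position. A minimal sketch that enumerates the pathway names (the helper `decoding_pathways` is a hypothetical name, not part of MILIE):

```python
from itertools import permutations


def decoding_pathways(slots="spo", final="a"):
    """List every ordering of `slots`, with the `final` slot always last."""
    return ["".join(order) + final for order in permutations(slots)]


pathways = decoding_pathways()
print(pathways)
```

Each string reads left to right as the extraction order, so "psoa" means predicate first, then subject, then object, then n-ary arguments.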

Let's assume the decoding pathway $P_{psoa}$: predicates are extracted first, then for each predicate, subjects are extracted, then for each (predicate, sub-

We evaluate MILIE on both n-ary and binary triple extraction datasets. One simple way to convert n-ary extractions to binary extractions is to ignore the n-ary arguments. However, this will lead to a decrease in recall because the n-ary arguments may not be part of other extracted triples due to the initial n-ary extraction. Another method is to treat the extracted n-ary arguments as objects to

[Figure: step-by-step extraction example]
The Taj Mahal was built by Shah Jahan in 1643
Predicate: built by
The Taj Mahal was <P>built by<P> Shah Jahan in 1643
The <S>Taj Mahal<S> was <P>built by<P> Shah Jahan in 1643
Object: Shah Jahan
The <S>Taj Mahal<S> was <P>built by<P> <O>Shah Jahan<O> in 1643
Argument: in 1643

[Figure: further marked inputs and extracted slots]
The <S>Taj Mahal<S> was built by <O>Shah Jahan<O> in 1643.
Predicate: built by
The Taj Mahal was built by <O>Shah Jahan<O> in 1643.
The Taj Mahal was built by Shah Jahan in 1643.
Object: Shah Jahan

The stock pot should be chilled and the solid lump of dripping which settles when chilled should be scraped clean and re-chilled for future use.
However, StatesWest isn't abandoning its pursuit of the much-larger Mesa.
<tan -Mesa grande>: syntactically and semantically incorrect.

The rest of the group reach a small shop, where Brady attempts to phone the Sheriff, but the crocodile breaks through a wall and devours Annabelle.
El resto del grupo llega a una pequeña tienda, donde Brady intentos de teléfono del Sheriff, pero los saltos de cocodrilo a través de una pared, y devora a Annabelle.
"intentos": the number and gender don't match the noun. "de teléfono del Sheriff": teléfono cannot be used as a verb. "los saltos de cocodrilo a través de una pared": semantically incorrect.

Results shown in Table 7 suggest that the linguistic complexity of objects is higher than that of predicates and subjects.

This is also confirmed in Fig. 3

Figure 4: Distribution of the Part of Speech tags in subject, predicate, and object tokens of triples in BenchIE English test data.