FabKG: A Knowledge graph of Manufacturing Science domain utilizing structured and unconventional unstructured knowledge source

As the demands for large-scale information processing have grown, knowledge graph-based approaches have gained prominence for representing general and domain knowledge. The development of such general representations is essential, particularly in domains such as manufacturing which intelligent processes and adaptive education can enhance. Despite the continuous accumulation of text in these domains, the lack of structured data has created information extraction and knowledge transfer barriers. In this paper, we report on work towards developing robust knowledge graphs based upon entity and relation data for both commercial and educational uses. To create the FabKG (Manufacturing knowledge graph), we have utilized textbook index words, research paper keywords, FabNER (manufacturing NER), to extract a sub knowledge base contained within Wikidata. Moreover, we propose a novel crowdsourcing method for KG creation by leveraging student notes, which contain invaluable information but are not captured as meaningful information, excluding their use in personal preparation for learning and written exams. We have created a knowledge graph containing 65000+ triples using all data sources. We have also shown the use case of domain-specific question answering and expression/formula-based question answering for educational purposes.


Introduction
In recent years, artificial intelligence applications have advanced rapidly. Areas such as natural language processing, digital twins (Liu et al., 2021), and chatbots (Chen et al., 2021) have become popular for their ability to record and use information from unstructured sources efficiently. One such application is the Knowledge Graph (KG), which has gained popularity in various domains due to its potential applications. A Knowledge Graph is a data graph meant to accumulate and impart real-world knowledge, with nodes representing entities of interest and edges representing potentially diverse relations between the entities. A KG has varied applications in recommendation, search, question answering, and many more. Most importantly, a KG can be used to make decisions based on inferences.
The use of a knowledge graph is of high value in making design and manufacturing-related decisions. Knowledge about design considerations and manufacturing decisions has grown explosively, and much of it resides with small and medium-sized enterprises (SMEs). Decision-making in design and production could be significantly improved using knowledge graphs (Buchgeher et al., 2021). This can benefit not only small and medium manufacturers (Li et al., 2021) but also hardware-based entrepreneurs, and it can help boost self-sustaining product development (Li et al., 2020).
A number of prior researchers have started developing manufacturing-related knowledge graphs based on specific problem areas such as machining process planning (Yang et al., 2019; Ye et al., 2018), workshop resource KG (Zhou et al., 2021; Sun and Wang, 2019), intelligent manufacturing (Yan et al., 2020), faults (Liang et al., 2022; Wang and Yang, 2019), maintenance (Hossayni et al., 2020), and Industry 4.0 (Garofalo et al., 2018; Bader et al., 2020; Kraft and Eibeck, 2020). However, none of these graphs represent fundamental knowledge of manufacturing concepts, processes, process parameters, characterization, materials, applications, and various other basic aspects of manufacturing domain education. A large amount of such fragmented knowledge can be integrated to help learners connect with the knowledge system intuitively and easily by leveraging the nodes and relationships. Such knowledge integration will also assist in intelligent question answering that can accelerate knowledge discovery and search.
Google bases part of its Knowledge Vault on the well-known Wikidata knowledge base (Ringler and Paulheim, 2017). Even though Wikidata has a large amount of information from Wikipedia, there is a dearth of standardized knowledge regarding many important entities related to the manufacturing domain. For instance, the term 'additive manufacturing' is present as '3d printing'; while there have been substantial developments in the field of 'metal additive manufacturing' (metal AM) over the last decade, it is not present as a subclass of '3d printing' in Wikidata. Moreover, within metal additive manufacturing (Frazier, 2014), sub-classifications such as DMLS, EBAM, and PBF are not present in Wikidata. One reason for this is the volunteer-driven nature of Wikidata as a knowledge base; this has led to a limited amount of specialist terminology and information regarding the manufacturing domain. Therefore, Wikidata cannot provide direct answers to questions that are very specific to this domain. To understand the basic concepts in the context of manufacturing, we focus on formulating answers to basic questions such as 'What are some precision finishing manufacturing processes?' and 'What are some tools for machining copper?'. The purpose of creating such a knowledge graph of manufacturing using Wikidata is to provide a starting point for a structured manufacturing knowledge base, which can be amalgamated with knowledge from other sources such as textbooks (Rahdari et al., 2020) and research articles (Wang et al., 2020b).
To tackle the challenges in developing a knowledge graph for manufacturing science from scratch, we consider various methodologies for creating accurate graphs. We propose a merged knowledge graph that combines the existing structured Wikidata knowledge graph with a novel semi-supervised knowledge graph extracted from textbook data. For extracting graph triples, as shown in Figure 1, we have adopted two methods: (1) vocabulary-based and (2) based on unstructured text. The former fetches Wikidata items using a collection of manufacturing vocabulary terms gathered from textbook index words, keywords from research papers, and named entity recognition using FabNER (Kumar and Starly, 2021), followed by the use of DBpedia (Mendes et al., 2011) to find Wikidata items. The latter is a semi-supervised approach that utilizes students' notes, considering standard textbooks as the reference. The most significant purpose of the latter method is to make use of textbook knowledge structured by humans, thereby increasing the quality of the knowledge base. The following sections elaborate on the details of the methodology and implementation.

Manufacturing Knowledge Graph Construction

2.1 KG construction using Wikidata
Wikidata is a knowledge base maintained collaboratively by the community to represent information in machine-readable format. Since no such knowledge base exists for the manufacturing domain, we decided first to extract existing Wikidata knowledge and then merge this with the knowledge contained within manufacturing textbooks. Wikidata's knowledge graph has Q and P identifiers, where Q represents entities and P represents relations (Hernández et al., 2015). Currently, Wikidata offers very few relevant relations between manufacturing domain-specific entities. We selected about 10 unique relations from the P identifiers attached to the relevant Q identifiers that we identified. The relations include 'Instance of', 'Subclass of', 'Use', 'Color', 'Part of', 'Uses', 'Has quality', 'Has cause', 'Has part', 'Facet of', 'Different from'.
In order to find manufacturing-specific entities in Wikidata, we used the following methods:

Entities extraction from index words of textbooks
Index words located at the end of textbooks are a list of all topics and entities, provided to help readers locate the relevant text. These important terms are often overlooked but form a good collection of domain-specific entities. We utilized 5 diverse, easily accessible ebooks related to manufacturing (Groover, 2020), digital manufacturing (Zhou et al., 2012), manufacturing processes (El-Hofy, 2005), welding technology (Kou, 2003), and additive manufacturing (Gibson et al., 2021), and extracted the index entities listed at the end of each book to expand the list of relevant entities. We found about 3500 relevant entities from the various books and added those to our vocabulary.

Keywords from research papers
We used 500k+ abstracts to create the corpus for manufacturing, as mentioned in FabNER. While extracting the abstracts, we accumulated the keywords mentioned in each abstract, removed duplicates, and normalized many of the words (using Levenshtein distance). Many words are written with some variation in spelling. For example, Landau-Ginzburg-Devonshire, Landau-Ginsburg-Devonshire, and Landau-Ginsberg-Devonshire are the same entity, written differently in the abstract keywords of various authors. Overall, we found about 4500 relevant entities from a sample of 5000 abstracts.
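The normalization step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the greedy grouping strategy, and the distance threshold of 2 are all assumptions made here for demonstration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalize_keywords(keywords, max_distance=2):
    """Map each keyword to the first earlier keyword within max_distance.

    max_distance is a hypothetical threshold; the source does not state one.
    """
    canonical = []
    mapping = {}
    for kw in keywords:
        key = kw.strip().lower()
        match = next((c for c in canonical
                      if levenshtein(key, c) <= max_distance), None)
        if match is None:
            canonical.append(key)
            match = key
        mapping[kw] = match
    return mapping


keywords = [
    "Landau-Ginzburg-Devonshire",
    "Landau-Ginsburg-Devonshire",
    "Landau-Ginsberg-Devonshire",
    "laser cladding",
]
mapping = normalize_keywords(keywords)
```

With this threshold, the three spelling variants collapse onto one canonical form while the unrelated keyword stays separate. A fixed threshold can wrongly merge short, distinct terms, so a length-relative cutoff may be preferable in practice.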

Named entity recognition on unstructured text
We utilized review articles related to manufacturing to find the most frequent and diverse terms, since such articles generally cover most of the past work and technologies in succinct text. Ten full review articles (Wong and Hernandez, 2012; ElMaraghy et al., 2012; Zhu et al., 2013; Frazier, 2014; Oztemel and Gursev, 2020; Yan et al., 2018; Stuart et al., 2010; Rajurkar et al., 2017; Wang et al., 2020a; Kaur and Singh, 2019) were selected for this part and processed using a trained neural network model consisting of BERT (Devlin et al., 2018) and GloVe (Pennington et al., 2014) stacked embeddings through the Flair framework (Akbik et al., 2019). Next, we employed a BiLSTM and CRF (Consoli and Vieira, 2019) architecture to identify entities of 12 categories in the review articles with an F-score of 83%. Overall, we found about 2000 entities from diverse review articles related to manufacturing.
Using the text and vocabulary of entities from all the above sources, i.e., index words, research paper keywords, and NER on review articles, we further employed two methods for finding existing Wikidata items. As depicted in Fig. 1, in the first method, we used the DBpedia Spotlight API to find Wikidata items associated with the unstructured text directly, based on a 0.5 confidence value. In the second, we provided manufacturing vocabulary terms as the input to the wptools Python library to fetch Wikidata items as the output. We find all manufacturing-relevant Wikidata items to extract a subgraph from Wikidata and later merge this relatively bigger knowledge graph with textbook knowledge (explained in the next section). Once some Wikidata items were available, we further used SPARQLWrapper (which uses the Wikidata SPARQL endpoint) and the relations list (P identifiers) to fetch forward (head from the primary entity) as well as backward (tail from the primary entity) entities associated with each item. We performed the same for two linked steps forward and two linked steps backward to find most of the nodes that are connected with each other.
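The two-step forward and backward expansion can be illustrated offline on an in-memory list of triples. This is only a sketch under stated assumptions: in the pipeline described above each step is a SPARQL query against Wikidata, whereas here the `expand` function, the single-letter node labels, and the toy triples are hypothetical stand-ins.

```python
def expand(seeds, triples, steps=2):
    """Collect triples reachable within `steps` hops of the seed nodes.

    A triple is followed forward (frontier node is the head) and
    backward (frontier node is the tail) at every step.
    """
    seen = set(seeds)
    frontier = set(seeds)
    collected = set()
    for _ in range(steps):
        next_frontier = set()
        for head, rel, tail in triples:
            if head in frontier or tail in frontier:
                collected.add((head, rel, tail))
                for node in (head, tail):
                    if node not in seen:
                        seen.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return collected


# Toy chain A -> B -> C -> D -> E -> F; labels are placeholders, not Wikidata items.
toy_triples = [
    ("A", "subclass of", "B"),
    ("B", "subclass of", "C"),
    ("C", "part of", "D"),
    ("D", "part of", "E"),
    ("E", "part of", "F"),
]
subgraph = expand({"C"}, toy_triples, steps=2)
```

Starting from `C`, two steps reach `A` and `B` backward and `D` and `E` forward, while the edge to `F` lies three hops away and is left out.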

KG construction using Exam cheatsheet/notes for Manufacturing
We propose a novel approach for creating triples utilizing human knowledge. Qualifying exams (or course exams) are part of any doctoral degree program. In some schools, written exams are conducted for a few courses. In some specific courses, students are allowed to bring cheatsheets or concise notes into the exam to help them remember important points. In most cases, the cheatsheet (or notes) developed for the exam is of no further use once the exam is over. This also means that the verified knowledge written by a student to remember the essential facts is lost or left unutilized for future reference. We devised a strategy for making these short, concise notes a useful input for building connected entities within FabKG. We created optional advice on cheatsheet generation for students at our institution to follow prior to the exam, so that they could participate in the task of knowledge base enhancement in the manufacturing area. Because the number of pages allowed in the exam is limited, students can only write crucial details from the various textbook chapters. Each chapter has a title, followed by several subtitles, each of which contains some entities and context, which is potentially a good knowledge source. The guidelines were kept simple so that students would not have to spend much time referring to them. They described the title, subtitle, and content hierarchy and a precise technique for separating them.
The following guidelines are provided for example purposes only:
a) The chapter name is preserved as the top title, followed by a distinctive symbol, making it easier to distinguish between chapters.
b) Within a chapter, sub-topics are separated by another unique symbol, such as a double semicolon ';;'. Two sub-topics are shown in Figure 2, for example: (1) Defects and (2) Crystal structure.
c) If there is a further subtopic within a subtopic, it is separated by a symbol such as ':' followed by some relevant points. A single semicolon separates multiple subtopics.
d) Explanations or additional information about any term are retained in brackets as an attribute of a relational entity. For example, displaced ion (Frenkel defect) denotes that a point defect with a displaced ion is also known as a Frenkel defect.
Using such symbol patterns when creating the notes aided the design of regex patterns for quickly extracting entities and their obvious relationships. We were able to extract over 1200 distinct entities, 25 unique relations, and 4200 unique triples using this method. Fig. 2 depicts the notes in their raw and structured state. The student notes in both unstructured and structured form were verified under human supervision. Indirect crowdsourcing is the crucial aspect that has made this element of the project possible; the intention, however, was to use the note takers' knowledge. It should be emphasized that even though some previous work has mentioned the use of notes (Denny et al., 2015) for developing a knowledge map, this type of knowledge source has not been studied on a larger scale and for educational applications. This method might be used with little effort for any domain-specific textual material.
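A minimal sketch of how delimiter-structured notes could be turned into triples is given below. It is not the authors' extraction code: the chapter separator '##', the chapter name 'Materials', the function names, and the mapping of each hierarchy level to the 'has' and 'alsoCalled' relations are assumptions chosen for illustration. Only the ';;', ':', ';' and bracket conventions come from the example guidelines.

```python
import re

# Matches "term (extra information)" and captures both parts.
BRACKET = re.compile(r"^(.*?)\s*\((.*?)\)\s*$")


def split_alias(term):
    """Return (name, alias) where alias is the bracketed text, if any."""
    m = BRACKET.match(term)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return term.strip(), None


def notes_to_triples(note, chapter_sep="##"):
    """Parse 'chapter ## sub: a; b (alias) ;; sub2: c' into triples.

    chapter_sep is a hypothetical choice of the 'distinctive symbol'.
    """
    chapter, _, body = note.partition(chapter_sep)
    chapter = chapter.strip()
    triples = []
    for block in body.split(";;"):          # ';;' separates sub-topics
        if not block.strip():
            continue
        subtopic, _, points = block.partition(":")
        subtopic = subtopic.strip()
        triples.append((chapter, "has", subtopic))
        for point in points.split(";"):     # ';' separates points
            if not point.strip():
                continue
            name, alias = split_alias(point)
            triples.append((subtopic, "has", name))
            if alias:
                triples.append((name, "alsoCalled", alias))
    return triples


note = ("Materials ## Defects: vacancy; displaced ion (Frenkel defect) "
        ";; Crystal structure: BCC; FCC")
triples = notes_to_triples(note)
```

The bracketed explanation becomes its own triple, so 'displaced ion' is linked to 'Frenkel defect' in addition to its parent sub-topic. Real notes would need more relation types than the two used here, which is where human verification comes in.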
Despite the small number of entities and relations discovered, this method allows textbook knowledge to be converted into usable knowledge, which aids in developing a knowledge graph for educational purposes. In general, for automatic extraction of directed relations, it is often difficult to determine which entities are related to each other when more than 2 entities are present in a sentence. This is also because, on multiple occasions, no relation exists between the entities. Employing NER and detecting directed relations between entities automatically is therefore a challenge, which we address with this semi-supervised method. Based on the analysis of the notes, some of the crucial relations found include: 'has', 'hasProperty', 'uses', 'usedTo', 'usedIn', 'causes', 'producedBy', 'makes', 'hasExpression', 'hasPart', 'addedWith', 'hasValue', 'includes', 'partOf', 'alsoCalled', 'dueTo', 'instanceOf', 'isAbbrev', 'isAcronym', 'hasComparator'.

Fusion of structured and unstructured knowledge
All triples found with the above-mentioned methods were aggregated together to create a knowledge graph of about 65000 triples. Fig. 1(b) depicts the merger of Wikidata and textbook knowledge. We created a collection of possible synonyms for various entities to enable us to merge Wikidata entities with textbook entities. We found that out of 1200 textbook entities, about 25% were present in Wikidata. We also found some links between entities which otherwise were not present in Wikidata due to limited relations.
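The synonym-based merge can be sketched as below. This is an assumed, simplified form of the fusion step: the function names are hypothetical, the triples are toy data rather than actual Wikidata statements, and only the '3d printing' / 'additive manufacturing' pairing is taken from the discussion in the Introduction.

```python
def canonicalize(label, synonyms):
    """Lower-case a label and replace it with its canonical synonym, if any."""
    key = label.strip().lower()
    return synonyms.get(key, key)


def merge_graphs(wikidata_triples, notes_triples, synonyms):
    """Union two triple collections after mapping node labels to canonical forms."""
    merged = set()
    for head, rel, tail in list(wikidata_triples) + list(notes_triples):
        merged.add((canonicalize(head, synonyms), rel,
                    canonicalize(tail, synonyms)))
    return merged


# Toy inputs for illustration only.
synonyms = {"3d printing": "additive manufacturing"}
wikidata_triples = [("3D printing", "subclass of", "manufacturing process")]
notes_triples = [("additive manufacturing", "has", "metal additive manufacturing")]

merged = merge_graphs(wikidata_triples, notes_triples, synonyms)
nodes = {n for h, _, t in merged for n in (h, t)}
```

After canonicalization both sources share the node 'additive manufacturing', so the Wikidata subgraph and the notes subgraph become connected. Relation names are left untouched in this sketch; aligning them would need a separate mapping.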
3 Knowledge driven QA

Domain specific question answering
The Knowledge Graph for manufacturing (FabKG) is suitable for answering questions and for powering a chatbot. FabKG is a directed graph G = (V, E), where each node v ∈ V denotes a named entity of manufacturing, a numeric literal, or an expression, and each edge e ∈ E denotes a directed relation between nodes. Given a natural language question as input, the entities are categorized into their respective classes. Based on the subject and predicate, the most similar object (highest cosine similarity) to the category is queried in the knowledge base. Some common domain-specific questions could not be answered using general-purpose search engines. Examples of questions that could be answered by FabKG are:
a. Which tool geometry is used for planning?
b. Which material has more hardness, cermet or alumina? Note: We have used a hasComparator relation specifying various comparison values in our KG that could answer the 'more' and 'less' inference question.
c. What is the composition of Tungsten in cast cobalt?
d. Which nontraditional manufacturing process is used for coining operations?
e. What is the length to depth ratio for discontinuous fibers?

Figure 3: A small subgraph showing the links of entities connected with other expressions for ease of calculation, letting the system reason the way humans do.
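The similarity-based lookup described in this section can be illustrated with a small sketch. The source does not specify how the vectors are built, so word-count vectors are used here purely as a stand-in for an embedding; the `cosine` and `answer` functions and the toy triples (with placeholder objects) are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two phrases using word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def answer(subject, predicate_text, triples):
    """Return the object whose relation best matches the question predicate."""
    candidates = [(cosine(predicate_text, rel), obj)
                  for subj, rel, obj in triples if subj == subject]
    if not candidates:
        return None
    score, obj = max(candidates)
    return obj if score > 0 else None


# Toy triples; the objects are placeholders, not manufacturing facts.
toy_kg = [
    ("cermet", "used in", "toy application"),
    ("cermet", "has property", "toy property"),
]
result = answer("cermet", "what property", toy_kg)
```

Here the question phrase shares the word 'property' with one relation only, so that triple's object is returned. Word overlap is brittle compared with learned embeddings, which is why it should be read as a placeholder for the similarity measure rather than a recommendation.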

Expression based question answering
We have included some manufacturing-specific formulas and expressions in the knowledge graph to enable inference-based calculations. Since we have captured some formulae linked with entities using the 'hasExpression' relation, traversing to the formula node in the graph is easy. We have also included a simple rule for calculation-type questions. An example question: Calculate the strain on the cylinder given an area of 1 cm², a force of 10 N, and Young's modulus for steel of 200 GPa.
Given the question above, we have some 'formula entities': area, force, and Young's modulus of steel. These entities are queried in the KG for any available linked expression. Similar to MathGraph (Zhao et al., 2019), we utilize SymPy (Meurer et al., 2017) to convert the queried expression into a mathematical equation with variables, and to perform the calculation, we use some basic rules of precedence to fetch the results. As shown in Fig. 3, we can find strain using Young's modulus and stress; however, since stress is not known, we calculate stress as the first step using force and area. This process mirrors the way a human thinks while answering a question with some inputs and related expressions.
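The chained calculation for the strain example can be sketched in plain Python. This is an illustrative stand-in, not the SymPy-based implementation described above: the `FORMULAS` table and the `resolve` function are hypothetical, and the expressions are the standard relations stress = force / area and strain = stress / Young's modulus, with inputs converted to SI units.

```python
# Each target maps to (required inputs, function); a stand-in for
# 'hasExpression' links between entities and formula nodes.
FORMULAS = {
    "stress": (("force", "area"),
               lambda force, area: force / area),
    "strain": (("stress", "youngs_modulus"),
               lambda stress, youngs_modulus: stress / youngs_modulus),
}


def resolve(target, known, formulas=FORMULAS):
    """Compute `target`, first resolving any unknown inputs recursively."""
    if target in known:
        return known[target]
    if target not in formulas:
        raise KeyError(f"no value or expression for {target!r}")
    inputs, fn = formulas[target]
    args = [resolve(name, known, formulas) for name in inputs]
    known[target] = fn(*args)   # cache the intermediate result
    return known[target]


# 10 N force, 1 cm^2 = 1e-4 m^2 area, 200 GPa = 200e9 Pa modulus.
known = {"force": 10.0, "area": 1e-4, "youngs_modulus": 200e9}
strain = resolve("strain", known)
```

Because stress is not among the given values, `resolve` computes it first from force and area (about 1e5 Pa) and then obtains a strain of roughly 5e-7, matching the two-step reasoning described in the text.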
Some other examples of question forms that are easier than the question above:
a. Calculate material removal rate given feed rate, cutting speed, and depth of cut. HINT: We can calculate the material removal rate using (feed rate)*(cutting speed)*(depth of cut).
b. Calculate measuring length of roughness given cutoff length of 0.8. HINT: measuring length of roughness = 0.5 * cutoff length.
We have developed FabKG, a knowledge graph for product design and manufacturing, which utilizes two critical sources of knowledge, (1) Wikidata and (2) human-constructed notes, and combines structured and unstructured knowledge towards answering questions related to product development and manufacturing. Using this KG, students, product developers, and knowledge seekers can gain good insight into concepts and fundamentals across various topics in this domain. Using all the methods described above, we have found 65000+ triples in 12 entity categories. In the future, we plan to use the heterogeneous knowledge graph for directed relation prediction in a bigger corpus, performing graph embedding and link prediction. Moreover, lecture presentations with succinct text could also be utilized for finding entities and relations. Generally, the title or topic of the presentation symbolizes the subject, with some entities either written directly or placed after another subtopic. Furthermore, 'property/attribute' relations carrying specific values of entities, such as the strength of materials, carbon content, Brinell hardness, etc., currently available in tabular form in books and other resources, can be added to the KG. The same could be represented using a hypergraph by combining multimodal data. The new graph structure would therefore have not only an 'entity-relation-entity' type graph but also an 'entity-attribute-value' graph. Finally, this knowledge graph could help link to global knowledge by contributing to existing Wikidata knowledge with the help of Wikimapper.

Figure 1: (a) Manufacturing Knowledge Graph construction methodology (b) Use of SPARQLWrapper to fetch Wikidata items associated with 'crystal structure' in two steps forward and two steps backward. This image also shows the addition of entities from student notes.

Figure 2: Conversion of concise notes to structured graph

Table 1: Named entity recognition performance for Manufacturing dataset