DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning

We demonstrate a library for the integration of domain knowledge in deep learning architectures. Using this library, the structure of the data is expressed symbolically via graph declarations, and logical constraints over outputs or latent variables can be seamlessly added to the deep models. The domain knowledge can be defined explicitly, which improves the models' explainability in addition to their performance and generalizability in the low-data regime. Several approaches for such an integration of symbolic and sub-symbolic models have been introduced; however, no library facilitates programming for such an integration in a generic way while allowing various underlying algorithms to be used. Our library aims to simplify programming for such an integration in both the training and inference phases while separating the knowledge representation from the learning algorithms. We showcase various NLP benchmark tasks and beyond. The framework is publicly available on GitHub (https://github.com/HLR/DomiKnowS).


Introduction
Current deep learning architectures are known to be data-hungry, with issues mainly in generalizability and explainability (Nguyen et al., 2015). While these issues are hot research topics, one approach to address them is to inject external knowledge directly into the models when possible. While learning from examples revolutionized the way that intelligent systems are designed to gain knowledge, many tasks lack adequate data resources. Generating examples to capture knowledge is an expensive and lengthy process, and it is especially inefficient when such knowledge is available explicitly. Therefore, one main motivation of DomiKnowS is to facilitate the integration of domain knowledge in deep learning architectures, in particular when this knowledge is represented symbolically. In this demonstration paper, we highlight the components of this framework that help to combine learning from data and exploiting knowledge in learning, including: 1) learning problem specification, 2) knowledge representation, and 3) algorithms for the integration of knowledge and learning. Currently, the DomiKnowS implementation relies on PyTorch and off-the-shelf optimization solvers such as Gurobi. However, it can be extended by developing hooks to other solvers and deep learning libraries, since the interface is generic and independent from the underlying computational modules.
In general, the integration of domain knowledge can be done by 1) using pretrained models and transferring knowledge (Devlin et al., 2018; Mirzaee et al., 2021), 2) designing architectures that integrate knowledge expressed in knowledge bases (KB) and knowledge graphs (KG) in a way that the KB/KG context influences the learned representations (Yang and Mitchell, 2019; Sun et al., 2018), or 3) using the knowledge explicitly and logically as a set of constraints or preferences over the inputs or outputs (Li and Srikumar, 2019a; Nandwani et al., 2019b; Muralidhar et al., 2018; Stewart and Ermon, 2017). Our current library aims at facilitating the third approach. While applying constraints on the input is technically trivial and can be done in a data pre-processing step, applying constraints over outputs and considering those structural constraints during training is a research challenge (Nandwani et al., 2019b; Li and Srikumar, 2019a; Guo et al., 2020). This requires encoding the knowledge at the algorithmic level. However, given that the constraints can be expressed logically and symbolically, a language to express such knowledge in a principled way is lacking in current machine learning libraries. Using our DomiKnowS library, the domain knowledge is provided symbolically by the user, utilizing a logical language that we have defined. This knowledge is used in various ways: a) as soft constraints, by considering the violations as part of the loss function; this is done using a primal-dual formulation (Nandwani et al., 2019b) and can be expanded to probabilistic and sampling-based approaches (Xu et al., 2018) or to mapping the constraints to differentiable operations (Li and Srikumar, 2019b); b) by mapping the constraints to an integer linear program and performing inference-based training by masking the loss (Guo et al., 2020).
Independent of the training paradigm, the constraints can always be used as hard constraints during inference, or not used at all.
An interactive online demo of the framework is available on Google Colab, and the framework is accessible on GitHub.

Related Research
Integration of domain knowledge in learning relates to tools that try to express prior or posterior information about variables beyond what is in the data. This relates to probabilistic programming languages such as Figaro (Pfeffer, 2016), Venture (Mansinghka et al., 2014), Stan (Carpenter et al., 2017), and Infer.NET (Minka et al., 2012). The logical expression of domain knowledge is used in probabilistic logical programming languages such as ProbLog (De Raedt et al., 2007), PRISM (Sato and Kameya, 1997), and the recent deep extension of ProbLog, DeepProbLog (Manhaeve et al., 2018), as well as in Statistical Relational Learning tools such as Markov logic networks (Domingos and Richardson, 2004), Probabilistic Soft Logic (Broecheler et al., 2010), and Bayesian Logic (BLOG) (Milch et al., 2005); it is also slightly related to learning over graph structures (Zheng et al., 2020). Considering the structure of the output without its explicit declaration is addressed in structured output prediction tools (Rush, 2020). This library is mostly related to previous efforts for learning-based programming and the integration of logical constraints in learning with classical machine learning approaches (Rizzolo and Roth, 2010; Kordjamshidi et al., 2015, 2016). Our framework makes this connection to deep neural network libraries and arbitrarily designed architectures. The unique feature of our library is that the graph structure is defined symbolically based on the concepts in the domain. Unlike Torch-Struct (Rush, 2020), our library is independent from the underlying algorithms, and arbitrary structures can be expressed and used with various underlying algorithms. In contrast to DeepProbLog, we are not limited to probabilistic inference, and any solver can be used for inference depending on the training paradigm used for exploiting the logical constraints.
Probabilistic Soft Logic is another framework that considers logical constraints in learning by mapping the constraint declarations to a hinge-loss Markov random field (Bach et al., 2017). DRaiL is another declarative framework that uses logical constraints on top of deep learning and converts them to an integer linear program at inference time (Zhang et al., 2016). None of the above-mentioned frameworks accommodates working with raw sensory data or helps in putting it into an operational structure that can form the domain predicates and be used by learning modules, while our framework tries to address that challenge. We support training paradigms that use inference as a black box, and in those cases any constraint optimization, logical inference engine, or probabilistic inference tool can be integrated and used based on our abstraction and the provided modularity.

Declarative Learning-based Programming
We use the Entity-Mention-Relation (EMR) extraction task to describe the framework. We discuss more showcases in Section 5.
Given an input text such as "Washington is employed by Associated Press.", the task is to extract the entities and classify their types (e.g., people, organizations, and locations) as well as the relations between them (e.g., works for, lives in). For example, in the above sentence, [Washington] is a person, [Associated Press] is an organization, and the relationship between these two entities is work_for. We choose this task as it includes the prediction of multiple outputs at the sentence level, while there are global constraints over the outputs. In DomiKnowS, first, using our Python-based specification language, the user describes the problem and its logical constraints declaratively, independent from the solutions. Second, the user defines the necessary computational units (here, PyTorch-based architectures) and connects the solution to the problem specification. Third, a program instance is created to execute the model using a background-knowledge integration method with respect to the problem description.

Problem Specification
To model a problem in DomiKnowS, the user should specify the problem domain as a conceptual graph G(V, E). The nodes in V represent concepts and the edges in E are relationships. Each node can take a set of properties P = {P1, P2, ..., Pn}. Later, the logical constraints are expressed using the concepts in the graph. In the EMR task, the graph contains some initial NLP concepts such as sentence, phrase, and pair, and additional domain concepts such as people, organization, and work-for.

Concepts
Each problem definition can contain three main types of concepts (nodes). Basic concepts define the structure of the input of the learning problem. For instance, sentence, phrase, and word are all basic concepts that can be defined in the EMR task. Compositional concepts are used to define many-to-many relationships between basic concepts. Here, the pair concept in the EMR task is a compositional concept; it is used as a basic concept for relation extraction. We discuss this further when describing edges in Section 3.1.2. Decision concepts are derived concepts which are usually the outputs of the problem and subject to prediction. They are derived from the basic or compositional concepts. The people, organization, and work-for concepts are examples of derived concepts in the EMR conceptual graph. Following is a partial snippet showing the definition of basic and compositional concepts for the EMR task.
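The three concept types and the is_a derivation chain can be pictured with a minimal plain-Python sketch. This is an illustrative stand-in, not the DomiKnowS API; the `Concept` class here is hypothetical.

```python
class Concept:
    """A node in the conceptual graph; `parent` records an is_a edge."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

# Basic concepts: the structure of the input.
sentence = Concept("sentence")
phrase = Concept("phrase")

# Compositional concept: a many-to-many relation over basic concepts.
pair = Concept("pair")

# Decision (derived) concepts: prediction targets, linked via is_a.
entity = Concept("entity", parent=phrase)
people = Concept("people", parent=entity)
organization = Concept("organization", parent=entity)
work_for = Concept("work_for", parent=pair)
```

Following the is_a chain upward from any derived concept reaches the basic or compositional concept its predictions attach to.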
The following snippet also shows the definition of some derived concepts in EMR example. The entity, people, organization and location are the derived concepts from the phrase concept and the rest are derived from the pair.

Edges
After defining the concepts, the user should specify the existing relationships between them as edges in the conceptual graph. Edges are used either to map instances from one concept to another, or to generate instances of a concept from another concept. DomiKnowS only supports a set of predefined edge types, namely is_a, has_a, and contains. is_a is automatically defined between a derived concept and its parent. In the EMR example, there is an is_a edge between people and entity. is_a is mostly used to introduce hierarchical constraints and relate the basic and derived concepts.
has_a connects a compositional concept to its components (also referred to as arguments). In the EMR example, the pair concept has two has_a edges to the phrase concept to specify the arg1 and arg2 of the composition. We allow an arbitrary number of arguments in a has_a relationship, see below. A contains edge defines a one-to-many relationship between two concepts to represent a (parent, child) relationship. Here, a concept is not limited to a single parent. Following is a sample snippet defining a contains edge between sentence and phrase:

sentence.contains(phrase)
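The instance-generation role of these edges can be sketched in plain Python (not the library's internals): a contains edge holds the phrases of a sentence, and the two has_a arguments of pair induce candidate instances as ordered combinations of distinct phrases.

```python
from itertools import permutations

# contains: a sentence instance holds an ordered list of phrase instances.
phrases = ["Washington", "Associated Press"]

# has_a(arg1=phrase, arg2=phrase): pair candidates are ordered
# combinations of two distinct phrases, later classified as work_for etc.
pair_candidates = list(permutations(phrases, 2))
```

Each candidate tuple is one instance of the compositional pair concept on which the relation classifiers operate.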

Global Constraints
Constraint definitions capture the prior knowledge of the problem and enable the integration of the domain knowledge. The constraints of each task should be defined on top of the problem specification, using the concepts and relationships declared there.
The constraints can be 1) automatically inferred from the conceptual graph structure, 2) extracted from a standard ontology formalism (here, OWL), or 3) explicitly defined in DomiKnowS's logical constraint language. The framework internally uses the defined constraints in training-time or inference-time optimization, depending on the integration method selected for the task. We discuss the inference phase in more detail in Section 4.1.

Web Ontology Language
Here is an example of a constraint in our constraint language for the EMR task:

ifL(work_for('x'), andL(people(path=('x', arg1)), organization(path=('x', arg2))))

The above constraint indicates that a work_for relationship only holds between people and organization. Other syntactic variations of this constraint are shown in the Appendix.
To process constraints, DomiKnowS maps them to a set of equivalent algebraic inequalities or to their soft-logic interpretation, depending on the integration method. We discuss this further in Section 4.1.
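Both readings of the work_for constraint can be sketched in plain Python. This is a generic illustration of the two mappings, not DomiKnowS code; the probabilities and the product t-norm choice for the soft reading are assumptions for the example.

```python
# Predicted probabilities for one pair candidate x = (arg1, arg2).
p_work_for, p_people_arg1, p_org_arg2 = 0.9, 0.8, 0.7

# Hard (ILP) reading with 0/1 decision variables w, p, o: the implication
# ifL(work_for, andL(people, organization)) becomes w <= p and w <= o.
def implication_holds(w, p, o):
    return w <= p and w <= o

# Soft-logic reading (product t-norm): degree of violation of
# work_for -> (people AND organization) on the soft scores.
violation = max(0.0, p_work_for - p_people_arg1 * p_org_arg2)
```

The hard reading constrains the final 0/1 decisions at inference time, while the soft violation can be added to a training loss.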

Model Declaration
The model declaration phase defines the computational units of the task. The basic building blocks of a model in DomiKnowS are sensors and learners, which are used to define either deterministic or probabilistic functionalities of the model. Sensors and learners interact with the conceptual graph by defining properties on the concepts (nodes). Each sensor/learner receives a set of inputs, either from the raw data or from property values on the graph, and introduces new property values. Sensors are computational units with no trainable parameters, while learners are the ones including the neural models. As stated before, the model declaration phase only defines the connection of the graph properties to the computational units; the execution is done later by the program instances.
The user can use any deep learning architecture compatible with PyTorch modules, alongside the set of pre-designed and commonly used neural architectures currently existing in the framework. To facilitate modeling different architectures and computational algorithms in DomiKnowS, we provide a set of predefined sensors for basic mathematical operations and linguistic feature extraction. Following is a short snippet defining some sensors/learners for the EMR task. In this example, the Word2Vec sensor is used to obtain token representations from the "text" property of each phrase. There is also a very simple linear neural model to classify phrases and pairs into different classes such as people, organization, etc.
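The sensor abstraction can be pictured with a minimal plain-Python sketch (illustrative only, not the DomiKnowS API): a sensor is a callable with no trainable parameters that reads an existing property of a node and writes a new one.

```python
# A DataNode-like record with named properties (hypothetical stand-in).
node = {"text": "Washington"}

def length_sensor(n):
    """Deterministic sensor: derives a new property ("length")
    from an existing one ("text"); nothing is learned."""
    n["length"] = len(n["text"])

length_sensor(node)
```

A learner has the same read/write interface but wraps a trainable model (e.g., a PyTorch module) instead of a fixed function.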

Learning and Evaluation in DomiKnowS
To execute the defined model with respect to the problem graph, DomiKnowS uses program instances. A program instance is responsible for running the model, applying loss functions, optimizing the parameters, connecting the output decisions to the inference algorithms, and generating the final results and metrics. Executing the program instance relies on the problem graph, the model declaration, dataloaders, and a backbone data structure called DataNode.
DataLoader provides an iterable object to loop over the data. DataNode is an instance of the conceptual graph that keeps track of the data instances and stores the computational results of the sensors and learners. For the EMR task, the program definition is as follows: Here, the concepts passed to the poi field specify the training points of the program. This enables the user to train only parts of the defined model. For each program instance, the user should specify the domain knowledge integration method. The available methods for integration are discussed in Section 4.1. After initializing the program, the user can call the train, test, and prediction functionalities to train and evaluate the designed model. The following snippet runs training and evaluation on the EMR task: Here, the user specifies the dataloaders for the different splits of the data and the hyper-parameters required to train the model. Programs can be composed to address different training paradigms, such as end-to-end or pipelined training, by defining different training points for each program. More details are available in the Appendix. Following is an alternative program definition for pre-training phrases and then training pairs:

Inference and Optimization
DomiKnowS provides access to a set of approaches to integrate background knowledge in the form of constraints on the output decisions or latent variables/concepts. Currently, DomiKnowS addresses three different paradigms for integration: 1) learning + prediction-time inference (L+I), 2) training-time integration with hard constraints, and 3) training-time integration with soft constraints. The first method, which we refer to as enforcing global constraints, can also be combined with and applied on top of the second and third approaches at inference time. To handle constraints in ILP (Integer Linear Programming), we create variables for each local decision of instances and transform the logical constraints to algebraic inequalities (Rizzolo and Roth, 2010) in terms of those variables. Auxiliary variables are added to represent the nested constraints. The inference method can be extended to support other approaches, such as probabilistic inference and dynamic programming, in the future.
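The effect of constrained inference can be sketched in plain Python by brute-forcing the small ILP rather than calling a solver; the scores, names, and constraints below are illustrative, and the enumeration stands in for what Gurobi would solve exactly.

```python
from itertools import product

# Local classifier scores for two phrases and one candidate pair.
scores = {"people_1": 0.6, "org_1": 0.55,
          "people_2": 0.3, "org_2": 0.7,
          "work_for_12": 0.8}
names = list(scores)

def feasible(a):
    # Mutual exclusiveness per phrase, plus the domain/range constraint:
    # work_for(1,2) <= people(1) and work_for(1,2) <= organization(2).
    return (a["people_1"] + a["org_1"] <= 1
            and a["people_2"] + a["org_2"] <= 1
            and a["work_for_12"] <= a["people_1"]
            and a["work_for_12"] <= a["org_2"])

def objective(a):
    return sum(scores[n] * a[n] for n in names)

# Brute-force stand-in for the ILP solver over 0/1 assignments.
assignments = (dict(zip(names, bits))
               for bits in product((0, 1), repeat=len(names)))
best = max((a for a in assignments if feasible(a)), key=objective)
```

The globally best assignment keeps work_for, forcing the compatible entity labels even where a local score (org_1) was close.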
Integration of hard constraints in training: Here, we use our proposed inference-masked loss approach (IML) (Guo et al., 2020), which constructs a mask over local predictions based on the global inference results. The main intuition is to avoid updating the model based on local violations when the global inference can recover the true labels from the current predictions. Given the structured prediction F(θ) from a neural network and its global inference F*(θ) subject to the constraints, IML is extended from the negative log likelihood as follows:

L_IML(θ) = −(Y ⊙ (1 − F*(θ))) · log F(θ),

where Y is the structured ground-truth labels and ⊙ indicates the element-wise product. We implemented L_IML(λ), which balances between the negative log likelihood and IML with a factor λ, as in (Guo et al., 2020). IML works best for very low-resource tasks where label disambiguation cannot be learned from the data but is easier to do given available relational constraints between instances of training samples. The required constraint mapping for IML is the same as for the global constraint optimization tool (here, ILP).
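The masking idea can be illustrated with a small plain-Python computation over independent binary decisions; the numbers are made up, and this simplifies the structured formulation of Guo et al. (2020) to scalars per decision.

```python
import math

# Gold labels Y, local probabilities F for the true class, and the
# constrained global inference F_star (illustrative numbers).
Y = [1, 1, 1]
F = [0.9, 0.4, 0.6]
F_star = [1, 1, 0]   # inference recovers the first two labels only

# Mask Y * (1 - F_star): drop the loss wherever inference already
# recovers the gold label; keep it where inference fails.
iml = -sum(y * (1 - fs) * math.log(f) for y, fs, f in zip(Y, F_star, F))

# Plain negative log likelihood on the same decisions, for comparison.
nll = -sum(y * math.log(f) for y, f in zip(Y, F))
```

The second decision is locally weak (0.4) but globally recovered, so IML does not penalize it; only the third decision, which inference fails to fix, contributes loss.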

Integration of soft constraints in training:
We use the primal-dual formulation of constraints proposed in (Nandwani et al., 2019a) to integrate soft constraints in training the models. Primal-dual considers the constraints in neural network training by augmenting the loss function with Lagrangian multipliers Λ for the violations of the constraints by the set of predictions. The constraint violations are penalized through a hinge function [C(F(θ))]+. The problem is formulated as a min-max optimization that maximizes the Lagrangian function with respect to the multipliers to enforce the constraints and minimizes it with respect to the parameters of the neural network. Here, instead of solving the min-max primal, we solve the max-min dual of the original problem.
During training, we alternate between the minimization and maximization steps. With the primal-dual strategy, the model learns to obey the constraints without the help of additional inference. Primal-dual is less time-consuming at prediction time than the previous methods, as it does not need an additional ILP optimization phase. It can also be used in a semi-supervised setting using domain knowledge.
Handling constraints in Primal-Dual is done by mapping them to their respective soft logic interpretation (Nandwani et al., 2019b).
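The alternating updates can be sketched with a one-parameter toy problem in plain Python: a single probability p = sigmoid(theta) that a soft constraint requires to be at least 0.8, with the hinge [0.8 − p]+ as the violation. The threshold, learning rate, and the absence of a supervised loss term are all simplifications for illustration.

```python
import math

theta, lam, lr = 0.0, 0.0, 0.5
for _ in range(200):
    p = 1.0 / (1.0 + math.exp(-theta))
    violation = max(0.0, 0.8 - p)
    # Primal step: gradient descent on lam * [0.8 - p]_+ w.r.t. theta.
    grad_theta = -lam * p * (1.0 - p) if violation > 0 else 0.0
    theta -= lr * grad_theta
    # Dual step: gradient ascent on the multiplier, kept non-negative.
    lam = max(0.0, lam + lr * violation)

p = 1.0 / (1.0 + math.exp(-theta))
```

The multiplier grows while the constraint is violated, pushing theta until p satisfies the constraint, after which both updates vanish.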
It is an open research topic to identify which of the integration methods performs best for different tasks. However, DomiKnowS makes it effortless to use one problem specification and run all the aforementioned methods.

EMR
Our implementation of the EMR task is based on the CoNLL benchmark (Sang and De Meulder, 2003) and follows the same setting as in (Guo et al., 2020). The model uses BERT (Devlin et al., 2019) for token representation and a linear boolean classifier for each derived concept. The constraints used in this experiment are the domain and range constraints of pairs and the mutual exclusiveness of the different derived concepts, which were shown in the previous sections. IML and primal-dual perform the same as the baseline using both 100% and 25% of the data, while ILP inference achieves a 1.3% improvement on 100% of the data and a 0.6% improvement on 25% of the data. More details are available in the Appendix.

Question Answering
We use WIQA (Tandon et al., 2019) benchmark as a sample question answering task in DomiKnowS. The problem graph contains paragraph, question, symmetric, and transitive concepts. Each paragraph comes with a set of questions. As explained by Asai and Hajishirzi, enforcing constraints between different question answers is beneficial to the models' performance. By modeling those constraints in DomiKnowS, ILP improves the model's accuracy from 74.22% to 79.05%, IML reaches 75.49%, Primal-Dual achieves 76.59%, and the combination of Primal-Dual and ILP performs best with 80.35% accuracy. More details are available in the Appendix. Following is a sample constraint defined for the task.
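A plain-Python reading of the symmetry consistency described by Asai and Hajishirzi can be sketched as follows (a simplified illustration, not DomiKnowS constraint syntax): flipping the direction of a cause-effect question flips is more and is less, while no effect is unchanged.

```python
# Opposite answer under question flipping (WIQA label set).
opposite = {"is more": "is less",
            "is less": "is more",
            "no effect": "no effect"}

def symmetric_consistent(answer, flipped_answer):
    """True iff the answers to a question and its flipped
    counterpart satisfy the symmetry constraint."""
    return opposite[answer] == flipped_answer
```

A transitivity constraint composes answers across chained question pairs in the same spirit.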

Image Classification
We use the CIFAR-10 benchmark (Krizhevsky et al.) to show an image classification task in DomiKnowS. CIFAR-10 consists of 60,000 colour images of 10 classes, with 6,000 images per class. To construct the graph, we define the derived concepts airplane, dog, truck, automobile, bird, cat, deer, frog, horse, and ship, and the base concept image. We introduce a disjointness constraint between the labels of an image. For this problem, hierarchical constraints between class labels can also be used easily. Neither the disjointness nor the hierarchical constraints affect the accuracy of the task by a large margin, and ILP achieves only about a 0.5% improvement over the classification baseline.

Inference-Only Example
This example shows that DomiKnowS can solve pure optimization problems as well. The task is similar to the classic graph-coloring problem. A set of cities is given, each of which can have a fire station or not. We want to allocate the fire stations to cities such that the following constraint is met. Constraint: For each city x, either x has a fire station, or there exists a city y which is a neighbor of x and has a fire station.
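This constraint can be checked and solved by enumeration on a toy instance in plain Python; the four-city path graph is made up, and the brute-force search stands in for the ILP solver on this purely inferential problem.

```python
from itertools import product

# Toy neighborhood graph: city -> list of neighboring cities.
neighbors = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
cities = sorted(neighbors)

def covered(stations):
    # Constraint: each city has a station or a neighbor with one.
    return all(c in stations or any(n in stations for n in neighbors[c])
               for c in cities)

# Brute-force stand-in for inference-only ILP: a smallest
# feasible allocation of fire stations.
candidates = ({c for c, bit in zip(cities, bits) if bit}
              for bits in product((0, 1), repeat=len(cities)))
best = min((s for s in candidates if covered(s)), key=len)
```

On this path graph no single station covers all four cities, so the minimum feasible allocation has two stations.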
To implement this, we define the basic concept city, the neighbor relationship between two cities, and the derived concept FirestationCity. We have also included more showcases for sentiment analysis (Go et al., 2009) and email spam detection in our GitHub repository. We will add models for procedural reasoning (Faghihi and Kordjamshidi, 2021) and spatial role labeling (Mirzaee et al., 2021) in the future.

Conclusions and Future Work
DomiKnowS makes it effortless to integrate domain knowledge into deep neural network models using a unified framework. It allows users to switch between different algorithms and benefit from a rich source of abstracted functionalities and computational modules developed for multiple tasks. It allows naming concepts, defining their relationships symbolically, and combining symbolic and sub-symbolic reasoning over the named concepts. DomiKnowS helps the interpretability of neural architectures by providing named layers and access to the neural models' computations at each stage of the training and evaluation process. As a future direction, we are looking to enrich our library with predefined functionalities and neural models, and to further extend the framework's ability to support more methods for integrating domain knowledge with deep neural models as well as seamless model composition. DomiKnowS is publicly available on our website and on GitHub.

A Global Constraints and Mapping
Following is an example of the mapping between an OWL constraint, the graph structure, and the logical Python constraint. The logical Python constraint is:

ifL(work_for('x'), andL(people(path=('x', arg1)), organization(path=('x', arg2))))

All three of the above representations capture the same knowledge: a work_for relationship only holds between people and organization.
In order to map this logical constraint to ILP, the solver collects sets of candidates for each concept used in the constraint.
ILP inequalities are created for each combination of the candidate sets. The internal nested andL logical expression is translated to a set of three algebraic inequalities. A new variable, varAND, is created to transfer the result of the internal expression into the external one.
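For two binary decisions a and b, the auxiliary variable admits the standard linearization of a logical AND; this is a generic sketch of the technique, independent of DomiKnowS internals.

```python
# Linearization of varAND = a AND b over binary variables:
#   varAND <= a,   varAND <= b,   varAND >= a + b - 1
def and_inequalities(a, b, var_and):
    return var_and <= a and var_and <= b and var_and >= a + b - 1

# On 0/1 inputs the three inequalities admit exactly varAND = a * b.
consistent = all(
    [v for v in (0, 1) if and_inequalities(a, b, v)] == [a * b]
    for a in (0, 1) for b in (0, 1)
)
```

Because the inequalities pin varAND to the conjunction's truth value, the outer implication can then constrain varAND directly.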

B Program Composition
Program instances allow the user to define different training tasks without extra effort to change the underlying models. One can define end-to-end training, pipelines, and pre-training and fine-tuning steps just by defining different program instances and calling them one after another. For instance, we can seamlessly switch between the following variations of learning paradigms on the EMR task. End-to-end training:

C Experiments
C.1 EMR Figure 1 shows the prior structural domain knowledge expressed as a graph and used in this example. It contains the basic concepts such as 'sentence', 'phrase', 'word', and 'pair', the existing relationships between them, and the possible output concepts such as 'people' and 'work_for'. Figure 2 also represents a sample DataNode graph populated for a single phrase, alongside its properties and decisions. Tables 1 and 2 summarize the results of the model applied to 25% of the training set and the whole training set of the CoNLL dataset, respectively.

C.2 Question Answering
The WIQA dataset contains 39,705 multiple-choice questions regarding cause and effect in the context of a procedural paragraph. The answer is always either is less, is more, or no effect. To model this task in DomiKnowS, we define paragraph, question, symmetric, and transitive concepts. Each paragraph comes with a set of questions. As explained by Asai and Hajishirzi, enforcing constraints between different question answers is beneficial to the models' performance. The results for the WIQA dataset are shown in Table 3. As shown, IML incorporates the constraints in training and slightly improves the baseline. Primal-dual does a better job than IML at improving the accuracy with the help of the constraints. However, it is with the help of ILP that the baseline and primal-dual achieve the best results.