Natural Language Processing Meets Quantum Physics: A Survey and Categorization

Recent research has investigated quantum NLP, designing algorithms that process natural language on quantum computers, as well as quantum-inspired algorithms that improve NLP performance on classical computers. In this survey, we review representative methods at the intersection of NLP and quantum physics from the past ten years, categorizing them according to the use of quantum theory, the linguistic targets that are modeled, and the downstream application. The literature review ends with a discussion of the key factors behind the success achieved by existing work, as well as the challenges ahead, with the goal of better understanding the promises and future directions of the field.


Introduction
Quantum computing has received much interest in recent years. The basic idea is to make use of the power of quantum mechanics for solving computational problems (Shor, 1999; Nielsen and Chuang, 2002). While particular quantum algorithms can be substantially faster alternatives to their classical counterparts (Biamonte et al., 2017; Arute et al., 2019), the mathematical framework of quantum physics has also been exploited for cognition (Busemeyer and Bruza, 2012), optimization (Soleimanpour et al., 2014), and other disciplines. In the field of natural language processing (NLP), quantum mechanics has seen a surge of recent research interest, addressing problems ranging from lexical semantic ambiguity (Meyer and Lewis, 2020) to semantic composition (Coecke et al., 2020), and from information retrieval (Jiang et al., 2020) to text classification, where different characteristics of quantum physics have inspired novel algorithms.
Despite its growing research literature, no survey has reviewed and categorized the quantum NLP field. The most relevant surveys cover quantum-inspired information retrieval (Uprety et al., 2020; Melucci, 2015), but they did not include many important findings in the quantum NLP field. Abohashima et al. (2020) and Garg and Ramakrishnan (2020) reviewed the broader field of quantum machine learning; they briefly mentioned several quantum algorithms for NLP, but did not discuss them comprehensively or in detail. The goal of our paper is, for the first time, to propose a categorization of quantum NLP over the past ten years, aiming to provide the latest knowledge of developments and achievements in this field.
We categorize existing work on quantum NLP along the following three dimensions:
1. The types of algorithms. Many quantum-inspired NLP algorithms run on classical computers, while some quantum NLP algorithms can potentially be implemented on quantum hardware (Section 3).
2. The modeling target. Quantum physics is used for modeling different features of language (Section 4).
3. The applications. These algorithms have different applications, e.g. information retrieval and question answering (Section 5).
Although quantum NLP is still an emerging field, existing work shows exciting promise: not only better performance but also more efficient computation is possible. In addition, noisy intermediate-scale quantum (NISQ) computers already exist and appear to have potential uses in NLP tasks (Coecke et al., 2020; Lorenz et al., 2021). It has been shown that quantum NLP can be effective in addressing the inherent ambiguity of words, representing lexical semantic correlations, and calculating semantic composition, which is useful for a range of language modeling and information retrieval tasks. On the other hand, success has been achieved only at small scales, and the key reasons for this competitive performance still need further understanding. The theoretical evidence in the literature does not yet support the conclusion that quantum physics can yield substantial computational advantages on wider NLP tasks.

Quantum Physics Preliminaries
The simplest quantum mechanical system is a qubit, which has two possible states: |0⟩ and |1⟩, where '|·⟩' is called the Dirac notation, and a ket |ψ⟩ denotes a unit column vector. Similarly, the row vector |ψ⟩† is expressed as a bra ⟨ψ|, where the dagger (†) denotes the conjugate transpose. A qubit can be in a linear combination of states, often called a superposition:

|ψ⟩ = a|0⟩ + b|1⟩, (1)

where a and b are complex numbers and |a|² + |b|² = 1. Thus the state of a qubit is a unit vector in a two-dimensional complex vector space. When we measure a qubit we obtain either 0, with probability |a|², or 1, with probability |b|². A superposition state can be used for representing the multiple meanings of a word. For example, 'mouse' can denote both a small rodent and a hand-held pointing device. These two independent latent concepts can be denoted |rodent⟩ and |device⟩, and the word 'mouse' can then be modeled as a superposition state, i.e. |mouse⟩ = a|rodent⟩ + b|device⟩.
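As a minimal sketch of this word-as-superposition view (using NumPy; the amplitudes below are hypothetical, chosen only to satisfy the normalization constraint):

```python
import numpy as np

# Basis states |0> and |1> as unit vectors in C^2.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# 'mouse' as a superposition of two latent senses: |rodent> := |0>, |device> := |1>.
a, b = 1 / np.sqrt(2), 1j / np.sqrt(2)       # |a|^2 + |b|^2 = 1
mouse = a * ket0 + b * ket1

# Measuring the qubit collapses it onto one sense with the Born probabilities.
p_rodent = abs(np.vdot(ket0, mouse)) ** 2    # |<0|mouse>|^2
p_device = abs(np.vdot(ket1, mouse)) ** 2    # |<1|mouse>|^2
```

Here both senses are equally likely; in practice the amplitudes would be estimated from data.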
Entanglement is another elementary and unique resource of quantum mechanics, which plays a key role in many interesting applications of quantum computing. Consider the following two-qubit entangled Bell state (Nielsen and Chuang, 2002):

|Φ⟩ = (|00⟩ + |11⟩)/√2. (2)

As discussed earlier, when we measure the first qubit, we obtain two possible results: 0 with probability 1/2 and 1 with probability 1/2. According to Eq. 2, a measurement of the second qubit always gives the same outcome as the measurement of the first qubit, because the measurement results of these two entangled qubits are correlated. Coecke et al. (2020) proposed that if words are encoded as quantum states, then the role of grammatical structure is to entangle these states, because grammar is what correlates meanings between words. We will explain this in Section 4.2.2.
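The perfect correlation in Eq. 2 can be checked numerically; a small NumPy sketch:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Bell state (|00> + |11>)/sqrt(2) as a vector in C^4.
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

# Outcome probabilities for measuring both qubits in the computational basis.
# Indices 0..3 correspond to outcomes 00, 01, 10, 11.
probs = np.abs(bell) ** 2
# Only the correlated outcomes 00 and 11 occur, each with probability 1/2.
```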
Projective measurements are a basic form of measurement in quantum mechanics, in which the measurement operators are projectors P that satisfy P² = P. If the state is |ψ⟩ before a projective measurement, then the probability that result m occurs is given by p(m) = ⟨ψ|P_m|ψ⟩, and the state after measurement is P_m|ψ⟩/√p(m). Projective measurement can be applied to compute cosine similarity in NLP, which measures the similarity between two vectors. Suppose the unit vectors |A⟩ and |B⟩ represent words A and B, respectively. Then ⟨A|P_B|A⟩ = |⟨A|B⟩|², the squared cosine similarity of the two word vectors, where P_B = |B⟩⟨B| is a projective measurement operator.
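A quick NumPy check of this relation, using hypothetical word vectors (any real embeddings would do after normalization):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Hypothetical word embeddings for words A and B, normalized to unit length.
A = normalize(np.array([0.9, 0.1, 0.4]))
B = normalize(np.array([0.7, 0.3, 0.5]))

P_B = np.outer(B, B)                 # projector |B><B|; satisfies P^2 = P
prob = A @ P_B @ A                   # <A|P_B|A> = |<A|B>|^2
cos_sim = A @ B                      # classical cosine similarity (unit vectors)
```

For unit vectors the measurement probability equals the squared cosine similarity.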
In addition to state vectors, quantum mechanics can also be formulated using density matrices, which is mathematically equivalent. Suppose that a quantum system is in one of the states |ψ_i⟩, where i is an index, with probability p_i. The density matrix is defined as

ρ = Σ_i p_i |ψ_i⟩⟨ψ_i|.

More information about quantum computing can be found in Nielsen and Chuang (2002).
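A minimal sketch of this definition, with a hypothetical two-state mixture:

```python
import numpy as np

# The system is in |psi_i> with probability p_i.
psi1 = np.array([1, 0], dtype=complex)                # |0>
psi2 = np.array([1, 1], dtype=complex) / np.sqrt(2)   # (|0> + |1>)/sqrt(2)
p = [0.5, 0.5]

# rho = sum_i p_i |psi_i><psi_i|
rho = sum(pi * np.outer(psi, psi.conj())
          for pi, psi in zip(p, [psi1, psi2]))

# Density matrices are Hermitian with unit trace.
```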

The Types of Algorithms
Algorithms at the intersection of NLP and quantum physics can be implemented either on quantum computers or on classical computers. The former are usually called quantum algorithms, and the latter are usually named quantum-inspired or quantum-like models, which are classical algorithms. We refer to both the design of classical NLP algorithms inspired by quantum physics and the use of quantum algorithms for NLP tasks as quantum NLP.
We organize the main surveyed work in Table 1. This section and the next two sections discuss the categorization with regard to the algorithm type, the modeling target, and the application, respectively.

Quantum Algorithms
In quantum computing, a quantum algorithm is an algorithm that runs on real quantum computers. With regard to representation, Coecke et al. (2010) constructed a graphical framework (DisCoCat) for natural language that combines words and builds the meaning of a sentence, instead of treating a sentence as a bag of words. They devised this framework from previous work that represents quantum mechanics pictorially using lines, triangles, and so on (Coecke and Kissinger, 2018). As an example, in Figure 1 we use this graphical framework to demonstrate the ket, bra, and two-qubit entangled states introduced in Section 2.

Figure 1: Diagrams in the graphical framework (Coecke and Kissinger, 2018; Coecke et al., 2020) demonstrating: (a) a ket |ψ⟩; (b) a bra ⟨ψ|; (c) the Bell state in Eq. 2; (d) (g(|ϕ₁⟩ ⊗ |ϕ₂⟩)) ⊗ I, where matrix multiplication corresponds to connecting up the inputs and outputs of boxes, and the tensor product to placing boxes side by side.
Zeng and Coecke (2016) first discussed whether a quantum computer can be applied to process natural language, showing a quantum algorithm for calculating sentence similarity that, under certain conditions, achieves a quadratic speedup over classical methods (see Table 2). This quadratic speedup, however, requires quantum random access memory (QRAM), which is expensive and remains unrealized (Biamonte et al., 2017). Considering this problem, Meichanetzidis et al. (2020a) and Coecke et al. (2020) proposed quantum algorithms that can potentially be implemented on existing NISQ computers. Wiebe et al. (2019) presented a representation for linguistic structure that can encode NLP problems into small quantum devices. As a proof-of-concept experiment, Meichanetzidis et al. (2020b) performed the first quantum NLP task using a small dataset on NISQ hardware. Scaling up, Lorenz et al. (2021) implemented models that solve sentence classification tasks on NISQ computers for datasets of ≥ 100 sentences. These works pave the way for practical quantum NLP in the NISQ era.

Classical Algorithms
Quantum-inspired or quantum-like NLP algorithms have been designed for classical computers, and some of them achieve performance comparable to state-of-the-art models (Jiang et al., 2020). For the sake of applicability, these classical algorithms borrow mathematical frameworks from quantum mechanics but are not constrained by quantum computing operations when processing data. Van Rijsbergen (2004) first proposed to unify information retrieval models within the mathematical framework of quantum mechanics in Hilbert space. Sordoni et al. (2013) proposed a quantum language model, which models term dependencies using the density matrix; this work indicates that the density matrix may be a more general representation of texts. Based on this, Basile and Tamburini (2017) presented a language model using the evolution of the quantum state, which can be applied to speech recognition. Other work has encoded words as quantum states and sentences as mixed systems.
Recently, in order to improve practicality, several quantum-inspired neural networks for natural language problems have been proposed. One line of work uses a density-matrix-based convolutional network to capture interactions within each utterance, outperforming a number of state-of-the-art sentiment analysis algorithms. Jiang et al. (2020) proposed a quantum-interference-inspired neural matching model with application to ad-hoc retrieval. The main difference between these quantum-inspired neural models and existing neural models is that the former use the mathematical framework of quantum theory to describe language features, which are then used as the input to a neural network. Describing features with quantum mechanical concepts offers better interpretability, because the features have more transparent physical explanations, and it can also help the subsequent neural network extract useful information.
The above quantum-inspired neural networks mainly aim to improve end-to-end performance, but still lack a theoretical foundation for the connection between quantum-inspired language models and neural networks. Tensor networks, which factorize very large tensors into networks of smaller tensors, can help the theoretical understanding of existing neural networks (Levine et al., 2018). Based on tensor decomposition, Zhang et al. (2018b) proposed a quantum many-body wave function (QMWF) inspired language model and gave a mathematical account of the use of convolutional neural networks (CNNs). More recently, a tensor network method (TextTN) was proposed for natural language representation. Tensor networks can not only run on classical computers but can also be transformed into quantum circuits. In addition, the hyper-parameters of TextTN can be well interpreted via entanglement entropy.

The Modeling Target
Both quantum-inspired algorithms and quantum algorithms can model different features of language. We divide these features into word representation (Section 4.1) and composition (Section 4.2).

Word Representation
How words are represented is essential for most NLP tasks and can affect performance. Using quantum physics for word representation has the potential to incorporate more features of words.

Modeling Word Ambiguity
Word ambiguity is a combination of distinct known meanings. Li et al. (2018) and Coecke et al. (2020), among others, adopted superposition states and complex numbers to formulate this combination. The latent concepts of a word form a set of orthonormal pure states {|C_i⟩} of the space. A word t is then modeled as a superposition state |t⟩ = Σ_{i=1}^n a_i|C_i⟩, in which the amplitudes {a_i}_{i=1}^n are complex numbers and Σ_{i=1}^n |a_i|² = 1. As mentioned in Section 2, if the superposition state is measured, it collapses onto one of the basis vectors. This means that when a word is observed within a certain context, it collapses to one of its known meanings. It has also been shown that we can benefit from complex-valued word embeddings, whose phases can be linked to important features such as word positions. Moreover, the computational space increases exponentially with the size of the system (Coecke et al., 2020): a quantum state of n qubits can represent a word with 2^n latent concepts and is specified by 2^n amplitudes. Storing all these complex numbers and vectors can be challenging on classical computers.
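A sketch of such a complex-valued word representation over n latent concepts (the amplitudes here are random placeholders; in the cited work they would be learned):

```python
import numpy as np

n = 4                                    # number of latent concepts
rng = np.random.default_rng(0)

# Random complex amplitudes, normalized so that sum_i |a_i|^2 = 1.
a = rng.normal(size=n) + 1j * rng.normal(size=n)
a /= np.linalg.norm(a)

# |t> = sum_i a_i |C_i>, expressed in coordinates over the basis {|C_i>}.
word = a
probs = np.abs(word) ** 2                # probability of collapsing to each sense
phases = np.angle(word)                  # phases can encode features, e.g. positions
```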
Meyer and Lewis (2020), Bankova et al. (2018) and Piedeleu et al. (2015) adopted density matrices to model lexical ambiguity. Unlike commonly used methods that map words into vectors, they map words into matrices.

Modeling Hyponymy Relations
Hyponymy refers to the relation in which one word's semantic field is included within another's. This relation can be encoded in projectors (Lewis, 2019; Bankova et al., 2018). For example, apple is a kind of fruit, which is a kind of food. This hyponymy relation can be encoded as an ordering of projectors, P_apple ⪯ P_fruit ⪯ P_food, where each concept's projector projects onto the subspace spanned by its instances; normalization factors are ignored here.
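A minimal NumPy sketch of hyponymy as subspace inclusion, using a hypothetical 3-dimensional space with one basis vector per item (the items and the basis assignment are illustrative, not from the cited work):

```python
import numpy as np

# Basis vectors for three hypothetical items: apple, banana, bread.
e = np.eye(3)
P_apple = np.outer(e[0], e[0])             # projects onto span{apple}
P_fruit = P_apple + np.outer(e[1], e[1])   # span{apple, banana}
P_food = P_fruit + np.outer(e[2], e[2])    # span{apple, banana, bread}

# Hyponymy as subspace inclusion: P_apple P_fruit = P_apple means the
# apple subspace lies inside the fruit subspace, and likewise for food.
```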

Composition
Given the meaning of each word, sentences can be understood by the composition of such lexical semantic units. Algorithms based on quantum physics can help to model this process.

Modeling Term Dependencies
Quantum-inspired algorithms have been used to model dependencies between terms in frequently occurring multiword expressions. The quantum language model (QLM) proposed by Sordoni et al. (2013) first applied quantum theory to model term dependencies, arguing that there may be situations in which classical probability fails and a more general probabilistic theory is needed. They map each word w to a projector Π_w = |e_w⟩⟨e_w|, where w ∈ V and |e_w⟩ is the one-hot encoding of the word w. For example, consider V = {natural, language}. Then Π_language = |e_language⟩⟨e_language|. A relationship linking two or more words is represented by a subset of the vocabulary κ = {w_1, w_2, ..., w_n} and encoded into a new projector K_κ = |κ⟩⟨κ|, where |κ⟩ = Σ_{i=1}^n a_i|e_{w_i}⟩, the {a_i}_{i=1}^n are real numbers, and Σ_{i=1}^n a_i² = 1. For example, we can model the dependency between natural and language, κ_nl = {natural, language}, by K_nl = |κ_nl⟩⟨κ_nl|, where |κ_nl⟩ = √(2/5)|e_natural⟩ + √(3/5)|e_language⟩. Then |κ_nl⟩ is a superposition state and K_nl is a density matrix. In quantum mechanics, the elements of the density matrix K_nl contain the correlation between the quantum states |e_natural⟩ and |e_language⟩; thus the dependency between natural and language is modeled. This way of modeling term dependency is interpretable and has a physical meaning.

Figure 2: Diagrammatic form of the reduction n(n^r s n^l)n → (n n^r)s(n^l n) → 1s1 → s, where n is a noun type, s is a declarative statement type, and cups denote grammar reductions. According to pregroup grammar (Lorenz et al., 2021; Lambek, 2008), 'Jack likes Rose' is grammatical because of the above reduction.
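The QLM construction for this two-word example can be sketched in NumPy as follows (amplitudes √(2/5) and √(3/5) as in the running example):

```python
import numpy as np

vocab = ["natural", "language"]
e = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# Single-word projector Pi_w = |e_w><e_w|.
Pi_language = np.outer(e["language"], e["language"])

# Dependency between 'natural' and 'language' as a superposition state.
kappa = np.sqrt(2 / 5) * e["natural"] + np.sqrt(3 / 5) * e["language"]
K = np.outer(kappa, kappa)               # K_nl = |kappa><kappa|

# The off-diagonal entries of K encode the correlation between the two words.
```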
Some algorithms have been proposed based on above QLM. Xie et al. (2015) took entanglement into consideration which is not considered in original QLM,  adopted word embedding instead of one-hot encoding, and so on. The basic and important idea behind these algorithms is to treat word vectors as quantum states from which we can obtain the density matrix of the sentence or document. Then this density matrix naturally contains the correlation of these quantum states, which means the dependence between words is modeled.

Table 3: Results of quantum NLP algorithms, organized by task, dataset, models, and metrics.

Modeling Grammar
Pregroup grammar (Lambek, 1997) is used for analyzing the structure of natural languages. As an algebraic gadget, pregroup grammar can be depicted using cup-shaped wires (Lambek, 2008). We show an example sentence in Figure 2. From Figure 1 and Figure 2, we can see that the diagrammatic frameworks used for quantum mechanics and for pregroup grammar are partially similar. Coecke et al. (2010) introduced a model based on tensor product composition, which uses pregroup grammar to compute the meaning of sentences and phrases. Coecke et al. (2020) recast this model in quantum computational terms and showed that pregroup reductions can always be implemented using only Bell effects and identities.
Here is an example of how Bell effects and identities represent the application of an adjective to a noun. Assume the meaning of story is a 1-qubit state |ψ_story⟩ ∈ C² and the meaning of the adjective happy is a 2-qubit state |ψ_happy⟩ ∈ C² ⊗ C². In happy story, happy modifies the noun story. Coecke et al. (2020) model this modification as

(I ⊗ ⟨Bell|)(|ψ_happy⟩ ⊗ |ψ_story⟩),

where ⟨Bell| = ⟨00| + ⟨11| and I is the identity. The mapping (I ⊗ ⟨Bell|) captures the interaction between the meanings of the words. Using diagrammatic notation (Coecke and Kissinger, 2018; Coecke et al., 2020), this example is illustrated in Figure 3: the pentagon represents the quantum state, the straight line the identity matrix, and the cup-shaped wire the Bell effect. Coecke et al. (2020) also showed that this type of wire structure and pregroup grammar are equivalent, and thus, to some extent, NLP is quantum native.
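A minimal NumPy sketch of this contraction, with hypothetical amplitudes for the two word states:

```python
import numpy as np

# Toy meanings (assumed): story is a 1-qubit state, happy a 2-qubit state.
psi_story = np.array([0.6, 0.8])                  # unit vector in C^2
psi_happy = np.array([0.5, 0.5, 0.5, 0.5])        # unit vector in C^2 (x) C^2

bra_bell = np.array([[1, 0, 0, 1]])               # <Bell| = <00| + <11|
I = np.eye(2)

# (I (x) <Bell|) maps C^2 (x) C^2 (x) C^2 -> C^2: the Bell effect contracts
# the second qubit of 'happy' with the 'story' qubit.
op = np.kron(I, bra_bell)                         # 2x8 matrix
happy_story = op @ np.kron(psi_happy, psi_story)  # meaning of 'happy story'
```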

Applications
Quantum NLP shows comparable or better performance than strong baselines on some tasks. We summarize the results of these algorithms in Table 3.
Information retrieval (IR). Sordoni et al. (2013) first proposed a quantum language model for IR, representing terms in queries and documents as superposition events attached with quantum probability, which has no classical analog. Extensions of the quantum language model have also been proposed for IR. Xie et al. (2015) advanced the QLM framework by taking into account quantum entanglement, which has a significant cognitive implication. Li et al. (2018) proposed an algorithm to help improve convergence. Jiang et al. (2020) took interference into account, which produces additional contributions to the total probability beyond classical cases; based on these new contributions, they proposed a matching model for ad-hoc retrieval. These quantum matching models outperform some traditional models.
Question answering (QA). Zhang et al. (2018a) used density matrices to represent questions and answers and introduced a joint representation to model the similarities between the question and answers; this joint representation is then used as an input to a neural network. A complex-valued network for QA has also been proposed, which is interpretable and shows performance comparable to strong CNN and RNN baselines. Coecke et al. (2020) noted that QA tasks can be executed on quantum computers: after mapping a question to a vector, the task becomes finding the closest vector in the pool of answer vectors. They exploited quantum algorithms for finding the closest vector (Wiebe et al., 2015) and showed a quantum speedup. Meichanetzidis et al. (2020b) presented the first-ever quantum NLP experiment on quantum hardware through a QA task. Although this is a proof-of-concept experiment, it paves the way for the future use of quantum computers on practical NLP problems.

Speech recognition.
Basile and Tamburini (2017) introduced a quantum language model with application to speech recognition, where words are encoded into measurement operators and the sequence of words is modeled as the evolution of a quantum system.
Text classification. Zhang et al. (2018c) explored the possibility of applying quantum physics to sentiment classification. They constructed two sentiment dictionaries, generated density matrices for the dictionaries and documents, and used quantum relative entropy as a characterization of the similarity between dictionaries and documents to determine sentiment. Quantum-inspired interactive networks have also been introduced, in which a density matrix that captures correlations between words is used as the input to a long short-term memory network; to effectively combine information from different sources, this work was further extended to two modalities, namely text and vision. Considering the interpretability and expressive power of tensor networks, a tensor-network-based architecture for natural language has also been proposed.

Benefits
In this section, we summarize the potential benefits of quantum NLP and discuss the most salient directions that remain under-explored.
Lowering computational cost. Some articles have demonstrated quantum speedups for specific NLP tasks, such as question answering (Coecke et al., 2020; Zeng and Coecke, 2016). The quantum search algorithm (Grover, 1996), the quantum nearest-neighbor algorithm (Wiebe et al., 2015), and other quantum algorithms that achieve speedups over classical algorithms could be used once classical language features are encoded into quantum states. As mentioned in Section 4, quantum superposition is suitable for modeling uncertainties in language, such as word ambiguity, and entanglement can describe the composition of lexical semantic units (Coecke et al., 2020; Meichanetzidis et al., 2020b). It is possible that, by adapting quantum algorithms and deploying them on quantum computers, a family of NLP tasks can enjoy quantum speedups.
Enhancing learning ability. Quantum mechanics is well known to generate counter-intuitive patterns (Biamonte et al., 2017). It is reasonable to hope that quantum computers can recognize some patterns that cannot be recognized by classical computers. As shown in Table 3, some quantum NLP models have achieved comparable or better performance than strong baselines. Moreover, the framework of quantum mechanics can model features that are difficult to model with classical probability; for example, quantum theory has been used to model interference phenomena in information retrieval (Jiang et al., 2020) and term dependencies (Sordoni et al., 2013), which is more consistent with human cognition. Li et al. (2021) demonstrated that neural machine translation models fail badly on compositional generalization. Based on existing work, we believe quantum NLP models have potential advantages on the compositional generalization problem.
Increasing storage capacity. Quantum computers have strong storage capabilities. As mentioned before, Coecke holds the view that NLP is quantum-native (Meichanetzidis et al., 2020b; Coecke et al., 2020), such that the exponentially large vector space required to represent sentences can only be naturally and feasibly realized on quantum computers. From this point of view, developments in quantum language models will also be beneficial in terms of storage efficiency.

Future Directions
Despite the emerging promise, quantum NLP has yet to demonstrate full-fledged advantages over the dominant neural methods. Significant advances in one or more of the following directions could give strong boosts to the research field.
Quantum machine learning. Existing work shows that there is a fundamental connection between machine learning and quantum physics (Levine et al., 2018; Hughes et al., 2019). For example, tensor networks are a method that bridges machine learning and quantum theory and can also enhance the theoretical understanding of existing neural networks (Levine et al., 2019). For NLP, designing an effective tensor network approach can lead to better interpretability. On the other hand, most quantum NLP models still use real vectors (Jiang et al., 2020), partly because there are no obvious features corresponding to the imaginary part. However, quantum phenomena cannot be fully expressed without complex numbers. In quantum neural networks, complex numbers and quantum phenomena can be naturally modeled, and it has been shown that both complex-valued representations of natural language and complex-valued neural networks (Trabelsi et al., 2018) can lead to benefits.
Wider applications. We have shown that quantum NLP algorithms can be used for information retrieval (Jiang et al., 2020), question answering (Meichanetzidis et al., 2020b), and other tasks. These are relatively simple tasks, and quantum NLP models have not yet been extended to more challenging ones such as text generation and automatic summarization. Finding wider NLP tasks that can benefit from quantum physics remains an open direction.
Quantum advantages. In quantum computing, quantum supremacy or quantum advantage is the goal of demonstrating that a quantum computer can solve a problem that no classical computer can solve in any reasonable amount of time. Whether there are concrete examples in NLP that can show quantum advantages is a fundamental and important question. According to existing work, there may be quantum advantages in NLP tasks that require similarity calculations, such as the similarity between a query and documents, between sentences, or between a question and candidate answers.

Conclusion
Thus far, articles have demonstrated early success in representing and processing text with quantum computers. Their designs are scalable: as hardware becomes more powerful, the size of the meaning spaces and the complexity of the tasks can scale up. The key to whether quantum computers will be used for NLP in the future lies in whether quantum algorithms can show quantum advantage. Meanwhile, quantum-inspired models have shown strong performance on classical computers for certain tasks, with better interpretability. The main difficulty in this direction is that neural networks have already achieved high accuracy on many NLP tasks. Nevertheless, it is still worthwhile to explore the mathematical framework of quantum mechanics, where strong expressive ability and corresponding physical explanations are expected. Finally, if neither of the above two directions sees major breakthroughs, the quantum NLP field may temporarily lose research attention for a period of time.