On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation

We present a methodology that explores how sentence structure is reflected in the neural representations of machine translation systems. We demonstrate our model-agnostic approach with the Transformer English-German translation model. We analyze neuron-level correlation of activations between paraphrases, discussing the methodological challenges and the need for confound analysis to isolate the effects of shallow cues. We find that similarity between activation patterns can be mostly accounted for by similarity in word choice and sentence length. Following that, we manipulate neuron activations to control the syntactic form of the output. We show this intervention to be somewhat successful, indicating that deep models capture sentence-structure distinctions, despite finding no such indication at the neuron level. To conduct our experiments, we develop a semi-automatic method to generate meaning-preserving minimal-pair paraphrases (active-passive voice and adverbial clause-noun phrase) and compile a corpus of such pairs.


Introduction
Understanding the roles neurons play is important for the interpretability of neural machine translation (NMT) models. Finding neurons that are either invariant or sensitive to sentence structure explains how NMT models encode such structures and which similarities they have learned to abstract away from. Furthermore, it enables control of the output by direct manipulation of neurons.
Little previous work has analyzed the interaction between input properties and individual neurons (see §7). Inspired by Computer Vision work that analyzes model behaviour under non-semantic changes to the input (Lenc and Vedaldi, 2015; Goodfellow et al., 2009), we study how meaning-preserving paraphrases are represented in NMT. We also examine subsequent effects on the generated translation.
We propose a methodology to analyze correlation patterns of neuron activations between a source sentence and its structural paraphrase. We prepare two test cases: active voice to passive voice and an adverbial clause to a noun phrase (see Table 1). The methodology is motivated by Bau et al. (2019), who detected neurons that are highly correlated across LSTM models that differ in initializations.
We apply correlation analysis where the input sentence is the independent variable, not the model. Bau et al. (2019) further manipulated individual neurons to control semantic features at the word level (e.g., gender, tense). We extend their method to study the representation of syntactic structures, which influence the global organization of the sentence rather than individual words. To carry out the analysis, we compile a dataset (§2) consisting of English paraphrase pairs and matching German references. The sentence pairs have similar semantic meaning but a minimal and controlled change, which allows a controlled analysis of the activation patterns while attempting to minimize the effect of potential confounds.
We examine the correlation between neuron activations (see §4): (1) across models given identical input, and (2) between a single model's neurons and themselves, given the source or the paraphrase. We detect strong correlation patterns, some of which appear in both measurements. This leads us to dissect the correlation into potential confounds, and indeed we discover that similar positional encodings and high overlap in lexical identity are the main contributors to the correlation between paraphrases. This suggests that the strongest correlations are incurred by similar input encoding, and not by high-level abstractions learned by the model. Moreover, local correlation patterns do not distinguish between sentence paraphrases and lower-level similarities, because overlap of token embeddings and positional encoding is not exclusive to meaning-preserving paraphrases.

Table 1: Examples produced by our paraphrasing engine, e.g., "The party died down before she arrived" → "The party died down before her arrival".

We then experiment with controlling the translation output. This is done by a simple addition to neuron values: the difference in mean activations over two sentence forms (§5). We show that this manipulation generates outputs that are more similar to the desired form. We find that how we change the activation values is important, but the effect is not localized: many neurons have to be modified to yield a noticeable effect. Lastly, we compare different methods for selecting subsets of neurons to manipulate (§6). Counter-intuitively, we find that the neurons most correlated across paraphrases are better at controlling sentence structure than those with the least correlation (i.e., those whose activation changed most between structures). We attribute this to generally important neurons and to polysemy of neuron roles.
Overall, we find that strong correlations of neuron activation over paraphrases are explained by shallow features: the positional and token embeddings. Therefore, some neurons represent input features, but high-level information is not localized. Moreover, we show how the syntactic form generated during inference can be naïvely controlled, though a large number of neurons must be modified. This suggests that the distinction between different sentence structures is encoded in the model, probably in a distributed manner. Lastly, the neurons most effective for such manipulations are the ones most important for performance, not necessarily those that vary the most across paraphrases.

Dataset: Minimal Paraphrase Pairs
We aim to isolate representations of specific distinctions in sentence phrasing. To achieve that, we curate a dataset of sentence pairs with controlled syntactic variations. Specifically, we require sentence pairs with the following attributes:
• Similar Meaning, to have invariant semantics.
• Minimal Change, to facilitate the experimental setup and the interpretation of the results.
• Controlled Change, where paraphrasing is consistent and well-defined. As opposed to lexical paraphrases, which tend to be idiosyncratic, we require the same distinction to be applied to all instances.
• Reference Translation, since we examine translation models.
Existing paraphrasing tools and datasets fail to satisfy these criteria (see §7). Therefore, we develop our own paraphrasing method, which we use to compile two parallel sets: active voice to passive voice and an adverbial clause to a noun phrase. Sentence examples can be found in Table 1.
The proposed process is automatic, following predefined syntactic rules while utilizing several NLP models. First, we identify sentences that match some source pattern (active voice, adverbial clause) according to dependency parsing, POS tags (Honnibal et al., 2020) and Semantic Role Labeling (Gardner et al., 2018). Then, we rephrase the sentence to the desired structure. We complement missing prepositions by choosing the one with the highest probability as predicted by BERT (Devlin et al., 2019). For example, the sentence "She felt accomplished when she met the investor" requires the preposition "with" in the noun phrase form "She felt accomplished during her meeting with the investor", and the temporal preposition "when" is replaced with "during". In ambiguous instances, we choose whether or not to insert a preposition by opting for the sentence with the higher probability according to the GPT2 language model (Radford et al., 2019). When replacing a verb with a noun (e.g., "arrive" is replaced with "arrival"), we look for the most suitable conversion in existing lexicons, including Nomlex (Macleod et al., 1998), AMR's and Verb Forms. The fine-grained details and step-by-step procedure can be found in Appendix A.

We apply our paraphrasing engine to the development set of WMT19 English-German (Barrault et al., 2019); the engine code and the dataset derived from WMT19 are available at https://github.com/GalPatel/minimal-paraphrases. Some paraphrases result in disfluent sentences. For example, the sentence "He took his time" is converted to "His time was taken by him", which is syntactically well-formed but anomalous. Therefore, we manually filtered the data (two in-house annotators made binary fluency judgments, with 75% observed agreement and Cohen's kappa of 0.6; we also tried Direct Assessment (Graham et al., 2017) with crowdsourced fluency scores, as well as thresholding the probability given by GPT2 or SLOR (Kann et al., 2018), none of which worked satisfactorily). The number of examples is given in Table 2.
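To make the GPT2-based disambiguation step concrete, here is a minimal sketch assuming the HuggingFace transformers API; the candidate sentences are illustrative, and this is a sketch of the idea rather than the engine's actual code:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_log_prob(sentence: str) -> float:
    """Approximate total log-probability GPT2 assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids yields the mean negative log-likelihood per predicted token
        nll = model(ids, labels=ids).loss.item()
    return -nll * (ids.size(1) - 1)

# Decide whether inserting the preposition improves the sentence.
candidates = [
    "She felt accomplished during her meeting the investor.",
    "She felt accomplished during her meeting with the investor.",
]
best = max(candidates, key=sentence_log_prob)
```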

Technical Setup
Model. We demonstrate our model-agnostic methodology with the Transformer model for machine translation (Vaswani et al., 2017). We use the fairseq implementation (Ott et al., 2019), trained on the WMT19 English-German training set (Barrault et al., 2019). The embedding dimension is 1024, with learned token embeddings and sinusoidal positional encoding.
Notations and Definitions. For our purposes, neurons are any of the 1024 values in the output embedding produced by each of the 6 layer blocks. We refer to trained models with different random initializations as $m_1, m_2$. We denote the set of source sentences $S = \{s_1, s_2, \ldots, s_n\}$, and its corresponding paraphrased set $P = \{p_1, p_2, \ldots, p_n\}$ (e.g., $s_i$ is an active voice sentence and $p_i$ is its passive counterpart). The activation of a neuron in model $m$ at location (layer and index) $l$ on sentence $s_i$ is $x^{m,l}_S[i]$, so $x^{m,l}_S$ is a vector of size $n$. Following previous work analyzing Transformer models (Liu et al., 2019; Wu et al., 2020), we consider only the activation at the last sub-word token of each word. Since the number of words may differ between paraphrases, we average activation values over the words in a sentence, to allow for a uniform sample size.

Dataset. For all experiments we use our minimal paraphrases dataset (see §2). Due to space considerations, we present results on the active/passive set in the main paper, while clause/noun phrase results can be found in Appendices B.2 and C.
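As an illustration of how such activations can be collected, a hedged PyTorch sketch using forward hooks; the hook targets, tensor shapes, and the word-final index mask are assumptions about the setup rather than fairseq-specific code:

```python
import torch

def attach_recorders(encoder_layers):
    """Store the output embedding of every encoder layer block."""
    acts = {}
    def make_hook(name):
        def hook(module, inputs, output):
            acts[name] = output.detach()  # assumed (seq_len, hidden) per sentence
        return hook
    handles = [layer.register_forward_hook(make_hook(f"layer{i}"))
               for i, layer in enumerate(encoder_layers)]
    return acts, handles

def sentence_activation(layer_output, word_final_positions):
    """Keep each word's last sub-word token, then average over words so
    sentences of different lengths yield one vector each."""
    return layer_output[word_final_positions].mean(dim=0)  # (hidden,)
```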

Detecting Correlation Patterns
To detect activation patterns, we measure Pearson correlation between neuron activations. The correlation allows us to examine how neurons activate under different conditions. First, we follow Bau et al. (2019) and define ModelCorr as the correlation between any pair of neurons across models, given the same input of source sentences:

$\mathrm{ModelCorr}(l_1, l_2) = \rho\big(x^{m_1,l_1}_S,\, x^{m_2,l_2}_S\big)$  (1)

We use the correlation methodology to define a new analysis where the independent variable is the input instead of the model. We capture correlation across paraphrases, denoted ParaCorr. Given the exact same model instance, we look at activations over a set of sentences and their correlation with the activations over the paraphrased set:

$\mathrm{ParaCorr}(l_1, l_2) = \rho\big(x^{m,l_1}_S,\, x^{m,l_2}_P\big)$  (2)

Figures 1a and 1b show the ModelCorr and ParaCorr correlation maps. Some of ParaCorr's observed effect also appears in ModelCorr, suggesting it might be unrelated to the examined variable, i.e., paraphrases. Moreover, ModelCorr indicates a strong correlation between neurons at the same location in different models, even though the Transformer architecture in itself does not account for positions.
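A minimal NumPy sketch of both maps, assuming activations were already collected into arrays with one row per neuron location $l$ and one column per sentence; the array names are placeholders:

```python
import numpy as np

def corr_map(acts_a, acts_b):
    """Pearson correlation between every pair of neurons: rows of acts_a
    vs. rows of acts_b, each of shape (num_neurons, num_sentences)."""
    a = (acts_a - acts_a.mean(1, keepdims=True)) / acts_a.std(1, keepdims=True)
    b = (acts_b - acts_b.mean(1, keepdims=True)) / acts_b.std(1, keepdims=True)
    return (a @ b.T) / acts_a.shape[1]

# ModelCorr: the same source set S through two differently-seeded models
model_corr = corr_map(acts_m1_S, acts_m2_S)  # placeholder arrays
# ParaCorr: one model on S vs. the same model on the paraphrase set P
para_corr = corr_map(acts_m_S, acts_m_P)
```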

Controlling for Confounds
In this section, we show that strong activation correlations between paraphrases are a product of low-level cues. Namely, we inspect how the propagation of token identity and positional information greatly influences the correlation. This is a relevant confound for previous work adapting correlation analysis to neurons (Bau et al., 2019; Wu et al., 2020; Meftah et al., 2021).

The positional encoding in our setting is sinusoidal, so the same positions are encoded exactly the same across models. Paraphrases present a minor change in sentence length: a 2.0±0.4 or 0.8±0.8 token difference when paraphrasing active to passive or clause to noun phrase, respectively. The positional encodings are therefore similar. As for tokens, paraphrases have a large unigram overlap.

We define PosCorr as the activation correlation between sentences with identical positional encoding but different token embeddings. Formally, where $\hat{S}$ is a set of random token sequences matching the lengths in $S$:

$\mathrm{PosCorr}(l_1, l_2) = \rho\big(x^{m,l_1}_S,\, x^{m,l_2}_{\hat{S}}\big)$  (3)

Indeed, PosCorr isolates the strong correlation effect observed in both ModelCorr and ParaCorr (Fig. 1c). Its repetition through the layers is probably due to the residual connections, which propagate the positional encoding. Indeed, when we looked at correlations of neurons inside the layer block, before the first residual connection, the effect seen in PosCorr was missing (see Appendix B.1). The implication is that input representation, and not higher-level learned representation, is likely the cause of strong correlations.

As input representation is composed of tokens and their positions, the counterpart correlation to PosCorr is TokenCorr, which accounts for token embeddings. We strip an input set $S$ of its positional encoding, denoted $\bar{S}$, and compare its activations to those of the intact $S$:

$\mathrm{TokenCorr}(l_1, l_2) = \rho\big(x^{m,l_1}_S,\, x^{m,l_2}_{\bar{S}}\big)$  (4)

TokenCorr (Fig. 1d) captures the diagonal phenomenon of ParaCorr, explained by paraphrases having a large bag-of-words overlap (the effect is not present in ModelCorr since token embeddings differ across models). This implies that individual token identities, and not necessarily sentence-level semantics, contribute to strong correlations. The distinction is made apparent when we consider how word order may affect meaning. For example, "Rose likes Josh" has a widely different meaning than "Josh likes Rose", although the two sentences have exactly the same bag of words.
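The two control conditions can be sketched in the same spirit, reusing corr_map from the sketch above; the random sampling and the zeroed positional encoding are schematic assumptions about how $\hat{S}$ and $\bar{S}$ are built:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_token_sequences(lengths, vocab_size):
    """S-hat: random token ids matching the lengths in S, so positional
    encodings are identical but the tokens differ."""
    return [rng.integers(0, vocab_size, size=n).tolist() for n in lengths]

# PosCorr: activations on S vs. activations on S-hat
pos_corr = corr_map(acts_m_S, acts_m_S_hat)    # placeholder arrays
# TokenCorr: activations on S vs. on S-bar, i.e. S encoded with the
# positional encoding removed (same tokens, no position signal)
token_corr = corr_map(acts_m_S, acts_m_S_bar)
```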
We further dissect the observed correlation for possible causes. First, we compare activations on different sentences that only share the relevant syntactic structure (e.g., two random active voice sentences). No strong correlation is observed (between −0.17 and 0.20). This suggests that the effect observed in the TokenCorr experiment, where the same tokens are fed to the model (Fig. 1d, Eq. 4), is not explained by a shared sentence structure (i.e., active voice). In another experiment, we combine both PosCorr and TokenCorr: we strip the original sentence of its positional embedding and replace the tokens with random ones, i.e., no input is shared between the compared conditions. As little correlation is detected (between −0.27 and 0.31), we rule out the possibility that the correlation is caused by neurons of constant value.
Overall, our confound analysis implies the following: (1) strong activation correlation is greatly due to low-level components and not high-level learned knowledge, (2) strong correlation detected across paraphrases may not be exclusive to sentences with similar meaning and different structure, and (3) sentence structure is not localized to a specific set of neurons in our analysis.
Manipulating Neurons to Control Translation

Being able to manipulate neurons allows us to control the translation output (without additional training), which in turn adds a causal dimension to our understanding of neurons. We look into changing activation values to force the output translation to have a desired syntactic feature (e.g., active or passive voice). Although we did not detect individual neurons with a strong positive or negative correlation specific to paraphrases, these distinctions could still be encoded in the model in a decentralized manner, and therefore be susceptible to manipulation. We address three main questions:
1. Can we effectively control structural properties of the output by changing neuron values?
2. Does the exact value matter, or only the identity of the modified neurons?
3. How should we choose which subset of neurons to manipulate?

Setup
Our technique is a simple translation of the activation values towards the average activation of a desired syntactic structure. In doing so, we extend the approach of Bau et al. (2019), who modified individual neurons using their average activations. We denote by $y^l_c$ the average activation of the neuron at location $l$ under a condition $c$. The formulation is general, but we focus on paraphrase form, e.g., $c$ = active voice. The vector of average activations of all $m$ neurons recorded under condition $c$ is denoted $y_c \in \mathbb{R}^m$. Manipulation from $c_1$ to $c_2$ is defined as the subtraction of

$\frac{1}{\|y_{c_1} - y_{c_2}\|}\big(y_{c_1} - y_{c_2}\big) \in \mathbb{R}^m$

This vector defines a subtraction for every neuron, but may be applied to any subset of neurons. The normalization term is introduced to make manipulations comparable in size when all neurons are modified. We investigate two parameters: the direction of manipulation (from $c_1$ to $c_2$) and the set of neurons we apply it to. For an additional experiment on scaling the magnitude of the manipulation, see Appendix C.5.
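A minimal sketch of the manipulation itself; where exactly the shift is injected into the forward pass is an assumption of this illustration:

```python
import numpy as np

def manipulation_vector(y_c1, y_c2):
    """Unit-norm shift from condition c1 towards condition c2."""
    diff = y_c1 - y_c2
    return diff / np.linalg.norm(diff)

def manipulate(activations, shift, neuron_idx):
    """Subtract the shift, restricted to a chosen subset of neurons."""
    out = activations.copy()
    out[neuron_idx] -= shift[neuron_idx]
    return out
```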
We evaluate whether the manipulation increases the similarity of the output to a reference with the target form ($c_2$), relative to its similarity to the source form ($c_1$). We measure the BLEU score between our model's translation and Google Translate outputs, which (in the absence of manual references) we treat as references for both source and target forms. This is a reasonable assumption given the performance gap between the models we use and Google Translate. Later, we discuss additional evaluation methods that complement BLEU (see §5.3).
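The form-preference evaluation can be sketched with sacrebleu; the two reference lists are assumed to be Google Translate outputs for the source-form and target-form inputs:

```python
import sacrebleu

def form_preference(outputs, refs_source_form, refs_target_form):
    """Positive value: the manipulated outputs lean towards the target form."""
    bleu_src = sacrebleu.corpus_bleu(outputs, [refs_source_form]).score
    bleu_tgt = sacrebleu.corpus_bleu(outputs, [refs_target_form]).score
    return bleu_tgt - bleu_src
```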

Experiments
We present experiments manipulating passive voice inputs towards active voice translations. The opposite manipulation (active input to passive translation) and the results on the clause/noun-phrase set can be found in Appendix C.
Baseline Manipulation. We modify an increasing number of neurons, choosing first the neurons most correlated with themselves according to ParaCorr (i.e., we rank by $\mathrm{ParaCorr}(l, l)$, highest values first). The motivation to use the correlation as a ranking is based on Bau et al. (2019), who ranked neurons by their correlation across models. We manipulate passive voice inputs towards active voice translations. Our outputs become more similar to active voice than to passive voice (Fig. 2a), suggesting that sentence structure is indeed encoded in the model. Moreover, this information is used by the model when generating translations, and it can be controlled.

Direction of Manipulation.
We explore the importance of the manipulation direction by shifting towards a random vector $y_r \in \mathbb{R}^m$ (e.g., manipulating from the average passive activation towards a random value). We repeat the process 100 times and show the average results with standard deviation in Fig. 2b. We find this to be substantially worse, implying that the success of the manipulation is tied to the direction we shift towards, and is not an artifact of value modification.
Selection of Manipulation. We test whether there is a preferable subset of neurons to manipulate by randomly selecting which neurons to modify. The results (Fig. 2c; we report the average of the measured BLEU and its standard deviation) do not indicate that a controlled selection of neurons (according to the ParaCorr ranking) is better than random. Overall, it seems that a large subset of neurons has to be modified to obtain the desired outcome, which agrees with our correlation results, where the active/passive feature was not localized. The correlation between paraphrases can shed light on which subsets of neurons could still be better for manipulation, which we discuss later in §6.

Beyond BLEU
The BLEU score captures translation quality at the surface level, and not necessarily how good (or bad) a translation is at preserving meaning or capturing form (active vs. passive). Therefore, we employ additional evaluation measures.
Passive Score. Specifically for the active/passive dataset, we use a dependency parser and POS tagger to detect passive form: using spaCy (Honnibal et al., 2020), we consider a German sentence to be in passive voice if the root lemmatization is "werden" and it has a child with dependency "oc" (i.e., clausal object) whose tag indicates a participle form. The scorer shows a decrease in detected passive voice when we manipulate the passive input towards an active translation (see Fig. 3b). The magnitude of the decrease may seem small, but the scorer has limited recall: the baseline translation of passive sentences (without manipulation) gets a score of 37.38%.
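A sketch of this detector, assuming a spaCy German pipeline such as de_core_news_sm; taking VVPP as the participle tag is our assumption for "a tag indicating a participle form":

```python
import spacy

nlp = spacy.load("de_core_news_sm")  # German parser + tagger

def is_passive(sentence: str) -> bool:
    """Passive if the root lemma is 'werden' and it governs a clausal
    object ('oc') tagged as a past participle."""
    doc = nlp(sentence)
    root = next(tok for tok in doc if tok.dep_ == "ROOT")
    if root.lemma_ != "werden":
        return False
    return any(child.dep_ == "oc" and child.tag_ == "VVPP"
               for child in root.children)
```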
Qualitative Analysis. A native German speaker examined a sample of output translations and found successful manipulations (see Appendix D). She discussed failure cases, where the translation changed (i.e., unequal strings) but did not result in the desired form. In some cases, sentences changed between stative passive and dynamic passive, rather than between active and passive (the distinction between these passive types is more evident in German). In other cases, the manipulation was not applicable. For instance, some verbs could not be translated into an adverbial verb form and had to either appear as a noun phrase or be replaced with a synonymous verb (an example is in Appendix D). These findings suggest that the manipulation was sometimes successful even when not automatically detected as such, and that it is limited by the target language and the model's ability to generalize to synonyms while controlling the sentence structure.
Held-out test set. We repeated the manipulation experiment on a held-out test set: 552 sentences detected as active voice in the WMT19 test set. This allows us to examine whether the successful manipulation effect extends to a setting where the manipulated sentences do not contribute to the measurement of the average activation of the source form. As can be seen in Appendix C.2, the manipulation still results in the desired change in passive form detection.

Significance of Neuron Selection
In our baseline manipulation in §5.2 we chose which neurons to modify according to the rank given by ParaCorr (i.e., sorting all neurons by $\mathrm{ParaCorr}(l, l)$, high to low). Under an intuitive interpretation, neurons that correlate positively when systematic changes are made to the input are invariant to that change, while neurons with a negative correlation are specific to it. Following this, we would expect that applying our manipulation to the lowest-ranked neurons would yield better results than applying it to the top-ranked ones. Contrary to this intuition, we observe the opposite phenomenon, as seen in Fig. 3. We perform the following tests in an attempt to explain it.
Model Performance. Going back to the foundations of our methodology, Bau et al. (2019) identified important neurons (in an LSTM) by ranking the most correlated neurons across models. To verify this notion of "importance", they deleted (i.e., set activations to zero) neurons from the top versus the bottom of the rank and examined the impact on model performance. We apply this experiment in our setting: we set to zero an increasing number of neurons, according to ParaCorr. We measure BLEU on a held-out set of 552 active voice sentences and their references, extracted from the WMT19 test set. The results (Fig. 4) show that top-ranked neurons have a stronger impact on translation quality than lower-ranked ones, suggesting that ParaCorr partially ranks neurons by their general importance. This may explain the counter-intuitive result above.
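The ablation can be sketched as zeroing ranked neurons; para_corr refers to the correlation map from the earlier sketch, and the site where activations are intercepted is again an assumption:

```python
import numpy as np

# Rank neurons by self-correlation across paraphrases, high to low.
rank = np.argsort(-np.diag(para_corr))  # para_corr: placeholder map

def ablate(activations, rank, k):
    """Zero the k top-ranked neurons (pass rank[::-1] for bottom-ranked)."""
    out = activations.copy()
    out[rank[:k]] = 0.0
    return out
```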
Role Overlap. The top ParaCorr neurons are the same neurons that account for lexical identity and positional information, which explains why they have the most impact when manipulating sentence structure. Sentence structure is tied to word order, especially in active-passive, where the subject and direct object swap positions. Notably, when we tested active-passive sentences that differ in content and length, the phenomenon did not repeat itself (see Appendix C.6). Word tokens are the building blocks of the semantic meaning of the sentence (which should remain the same across paraphrases), even though a bag of words is not exclusive to a specific meaning. The first evidence supporting this claim is in §4, where most of the strong correlations in ParaCorr are explained by similarity in the tokens and the positional embeddings between the inputs (i.e., TokenCorr and PosCorr, respectively). In an additional test, we check how many of the top ParaCorr neurons are also top PosCorr and TokenCorr neurons. Figure 5 shows that for any count $x$, the set of top-$x$ ParaCorr neurons has a substantial intersection with the sets of top-$x$ PosCorr or TokenCorr neurons.

Related Work

Analysis in other domains. Some Computer Vision work resembles our approach. Lenc and Vedaldi (2015) study the interaction between an input transformation and its representation along the layers, while Goodfellow et al. (2009) examine invariant neurons, those that are selective to high-level features but are robust to semantically identical transformations. Their methodologies do not fit the NLP domain since they rely on mathematically well-defined input transformations (e.g., rotation); we propose an alternative with our paraphrases in §2, and thus analyze the relation between input and representation. In the field of neuroscience, linguistic encoding in the human brain has been studied using a methodology similar to ours: Friederici (2011) analyzed the correlation of neuroimaging data where subjects are presented with sentences with subtle syntactic variations or violations, and found that well-correlated regions are considered to process syntax. In another study (Fedorenko et al., 2016), human subjects were presented with various inputs that are analogous to our correlation experiments: word lists (TokenCorr), meaningless grammatical sentences (PosCorr), non-word lists (a combination of TokenCorr and PosCorr) and regular sentences (ParaCorr).
Individual neuron analysis using correlation.

Conclusion
With our curated dataset, we introduced a model-agnostic methodology to detect activation patterns across paraphrases. Through a meticulous confound analysis, we found that activation similarity is likely due to shallow features of sequence length and word identity, which are not exclusive to meaning-preserving variations. We emphasize that these confounds must be taken into account when attempting to detect local correlation under any experimental setup. We controlled the sentence structure of generated output, which provides evidence of the ability of models to capture it. While we found the modification technique to be important for manipulation success, selecting a subset of neurons was more challenging. Future work should test additional architectures and language pairs, or examine the representational significance of our paraphrase pairs in other NLP tasks.

A.1 Tools and Techniques
We explain, in greater detail, the main tools we use when paraphrasing, as briefly discussed in §2.
Pattern Detection. To make sure we change form but not semantics, we rely on syntactic patterns rather than word-based ones. We use dependency parsing (including part-of-speech tagging) and Semantic Role Labeling combined (by Honnibal et al. (2020) and Gardner et al. (2018), respectively) to detect active form and adverbial clauses by type (see Table 3).
Sentence Probability. Used for choosing between two sentence options (e.g., with or without a certain preposition). We use GPT2 to score each candidate sentence and keep the more probable one.

Word Insertion. Given a sentence $X = (x_1, \ldots, x_n)$, a position $i$ and a candidate word set $W$, we mask position $i$ and obtain from BERT $y \in \mathbb{R}^d$, a probability vector over the vocabulary ($d$ is the size of the BERT vocabulary). We pick the word $w_k \in W$ with the highest probability according to BERT. Output: the sentence $(x_1, x_2, \ldots, x_{i-1}, w_k, x_i, \ldots, x_n)$. We can make this an Optional Word Insertion by returning either the input or the output sentence, chosen using Sentence Probability.
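A hedged sketch of Word Insertion with the HuggingFace masked-LM API; the model name and the candidate preposition set are illustrative assumptions:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def insert_word(words, i, candidates):
    """Insert whichever candidate BERT scores highest at position i."""
    masked = words[:i] + [tok.mask_token] + words[i:]
    enc = tok(" ".join(masked), return_tensors="pt")
    mask_pos = (enc.input_ids[0] == tok.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, mask_pos]
    cand_ids = [tok.convert_tokens_to_ids(w) for w in candidates]
    best = candidates[int(torch.argmax(logits[cand_ids]))]
    return words[:i] + [best] + words[i:]

# e.g., choosing the preposition in "her meeting ? the investor"
print(" ".join(insert_word("her meeting the investor".split(), 2,
                           ["with", "of", "to"])))
```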

A.2 Active Voice to Passive Voice
The active-to-passive paraphrasing process is applied to sentences that include a nominal subject and a direct object. We discard questions and coordinated sentences, sentences already in a possible passive form (root verb in past participle), and sentences whose root verb has a "to" auxiliary.
1: If the subject is a proper noun, convert it to object form.
2: If the direct object is a proper noun, convert it to subject form.
3: Switch the subtree spans of the subject and object.
4: Add "by" just before the span of the new object.
5: If an auxiliary verb is one of "can", "may", "shall", convert it to "could", "might", "should", respectively.
6: If the root verb is a gerund or present participle, replace it with "being". Otherwise, remove it altogether.
7: Add a suitable auxiliary according to the new subject's singular/plural form and the tense.
8: If the sentence includes a negation word, remove it and add "not" before the auxiliary.
9: Replace the root verb with its past participle form (using the Verb Forms dictionary).
10: If the sentence includes a particle, move it after the root verb.
11: If the sentence includes a dative, try to replace it using Optional Word Insertion.
We'll go over an example of active to passive. Input: He can't take the book.
The complete process of paraphrasing a sentence with an adverbial clause into one with a noun phrase substituting it is detailed in Table 3. We'll demonstrate a few examples.

Purpose clause
Input: She sat under the sun to enjoy the warmth.
Cause/Reason clause, possessive form Input: She was at the library for a long time because she had an unresolved problem.
1: Extract "because she had an unresolved problem" 2: Found matching root "had" and a marker "because" 3: Remove "had" 4: Remove "an" 5: "she" ← "her" 6: "because" ← "because of " 7: NA Output: She was at the library for a long time because of her unresolved problem.

Cause/Reason clause, non-possessive form
Input: This robot is very advanced because it flies itself.

[Table 3 (summary): The paraphrasing process from an adverbial clause sentence to a noun phrase. The steps include converting the nominal subject to possessive form; replacing prepositions (e.g., "because" ← "because of", "to" ← "for"; if the marker is "as"/"while"/"when", replacing it via Word Insertion (§A.1) using a temporal prepositions set); and additions (adding "lack of" under negation, and applying Optional Word Insertion (§A.1) using a general prepositions set when there is a direct object). If there is a direct object of the form "<xxx>self" in the non-possessive cause/reason case, we instead add "self" before the derived noun and remove this object.]

5: "it" ← "its"
6: "because" ← "because of"
7: "flight" ← "self flight"
Output: This robot is very advanced because of its self flight.

B.1 Inside the Layer Block
In §4 we measure the correlation of activations only at the output of the encoder layer block, following previous work (Wu et al., 2020). Here we also look at intermediate activations; see Fig. 6. This strengthens our hypothesis that the strong correlation seen in PosCorr (Fig. 1c) is due to the sinusoidal positional encodings, as they are propagated through the network by residual connections. The PosCorr effect appears only after the first residual connection, weakens through the fully-connected layers, and strengthens again after the additional residual connection.

B.2 Adverbial Clause versus Noun Phrase
Here we present the same correlation methods detailed in §4, but measured on the adverbial clause versus noun phrase sets. See Fig. 7.

C.1 Active to Passive
To complete all variations of the manipulation experiment, we first showcase the shift from active voice input to passive voice translation (the opposite direction from the one shown in the paper). We see that the translation is more similar to the target form (passive voice) than to the input form (active voice). The positive change in BLEU is more subtle in this manipulation, and again obtaining the maximal change requires many neurons to be modified (at least 50%); see Fig. 8a. With the random experiments on direction (Fig. 8b) and neuron selection (Fig. 8c), we get similar results: our controlled direction is better, while choosing a subset of neurons is not easy.

C.2 Manipulation on a Test Set
We repeat the manipulation on a held-out test set: 552 sentences that we detect as active voice in the WMT19 test set. While our experiments on the dev set are valid, as we manipulate from one set (e.g., passive voice) by measuring on another (e.g., active voice), one might argue that we cannot know what effect the shared semantic meaning (at the set level) has on the success rate. To cover all bases, we manipulate the test set according to average activations measured on the dev set. Here we do not have a passive voice counterpart, so we manipulate active voice inputs towards passive voice translations. The passive voice detection score (see §5.3) shows a monotonic increase (up to 0.6% more) as we modify more neurons (see Fig. 9). The trend matches our expectations. Moreover, we see again that manipulating top-ranked neurons (rank given by ParaCorr) has a greater effect than bottom-ranked ones. This is consistent with what we saw on the development set with the BLEU score in §6.

C.3 Noun Phrase to Adverbial Clause
We repeat the same succession of experiments on the adverbial clause versus noun phrase dataset. Manipulating from a noun phrase to an adverbial clause is consistent with the results we saw for the passive-to-active manipulation; see Fig. 11.

C.4 Adverbial Clause to Noun Phrase
Manipulating neurons to convert input with an adverbial clause into an output translation with a noun phrase is not outright successful. In the controlled case (where we set the direction by our records of the average activation of each paraphrase form and select an increasing set of neurons according to top or bottom ParaCorr rank), the output is still closer to the clause form than to the noun phrase form. We propose several possible explanations:
1. The clause versus noun phrase dataset is substantially smaller than the active versus passive one (114 examples compared to 1,169 instances). A small dataset may include more noise or simply make the target syntactic form harder to capture.
2. The adverbial clause form may be more common in the training set, so the model regularizes to the statistically more acceptable option. We see hints of that when we compare the manipulation towards the active form, which is more successful than towards the passive form (§5 and Fig. 8).
3. Noun phrase form may not be distinctive enough to be encoded in the model.
4. The target form may not be natural in the target language. As we discuss in our qualitative analysis in §5.3, failure cases revealed instances where the target form was either not possible for a native German speaker or required replacing the verb with a synonym. This replacement demands another layer of manipulation from the model, one it may not even be able to generalize to.

C.5 Manipulation Magnitude
The manipulation operation defined in §5 is normalized and then applied to the chosen neurons, so that different manipulations are comparable in size. Another manipulation parameter to experiment with is a scalar $\alpha$ for re-scaling, i.e., manipulation from $c_1$ to $c_2$ is defined as the subtraction of

$\frac{\alpha}{\|y_{c_1} - y_{c_2}\|}\big(y_{c_1} - y_{c_2}\big) \in \mathbb{R}^m$

We experimented with a small grid search over $\alpha$ values, without any option being clearly better than the baseline ($\alpha = 1$); see Fig. 10 for results. (We also experimented with greater values, $\alpha \in \{5, 10, 100, 1000\}$, each with a more drastic BLEU drop; we exclude them from the figure so the y-axis range can capture the subtler trends of the presented variables.) There is no definitive conclusion as to which magnitude would be consistently better in every manipulation. Similar trends were found in the clause dataset: $\alpha = 2$ was best when manipulating from the paraphrased form (noun phrase) back to the original form (adverbial clause), and worse the other way around. This could be tied to the general effect we see in §C.4, that one direction of manipulation, from paraphrased form back to original form, is more effective; this should be further investigated in future work.

[Figure 10: Comparing various magnitudes $\alpha$ for the manipulation, (a) Passive to Active, (b) Active to Passive. BLEU is measured against the reference of the target form, when manipulating increasingly more neurons according to the top ParaCorr rank.]

C.6 Unparalleled Sentences Manipulation
As seen in §6, top ParaCorr neurons were better for manipulation than those from the bottom of the rank. One possible explanation we introduced is the fact that many of those top ParaCorr neurons are also top PosCorr and TokenCorr neurons. Therefore, their effectiveness might derive from the role polysemy of these neurons, especially when the paraphrasing calls for a change of word order (e.g., active-passive requires a subject and object swap) or token identity (e.g., clause to noun phrase requires a transformation between verb and noun). This holds for cases where the paraphrases are parallel pairs and therefore share those shallow features (tokens and word order).
Here we present a manipulation experiment where there are no parallel pairs of paraphrases. We randomly split the sentence pairs into two sets; from one we take only the active sentences, and from the other only the passive sentences, resulting in an active set and a passive set of unrelated sentences. We repeat the manipulation experiment detailed in §5.2, but use these unparalleled sets for measuring the average activation of neurons under the active voice and passive voice conditions (i.e., for measuring $y^l_c$). We do so for 100 different splits of the data. Measuring the mean and standard deviation of BLEU against the objective reference, the results are presented in Fig. 13 (notably, we report the standard deviation of the experiment, not the standard error of the mean). The results may match the intuition that the neurons least correlated between paraphrases are those most sensitive to the active-passive feature, but since nothing is shared across these sentences, the expected noise level is high, and any measure is hard to interpret.

[Figure 13: Comparing the impact of manipulating top ParaCorr neurons versus bottom ones, (a) Passive to Active, (b) Active to Passive, where the modification value of neurons is determined by the average activation under unparalleled sets of sentences, i.e., $y^l_c$ is measured on active and passive non-pairs. The lines represent the mean and standard deviation over 100 different unparalleled sets.]
When we measured the correlation of such unparalleled sentences, we got average correlations (per neuron, over 100 different splits of the dataset into unparalleled sets) ranging from −0.04 to 0.04, with standard deviations between 0.03 and 0.06.

D Qualitative Analysis of Manipulation
Sentence examples of successful manipulation from passive voice input to active voice translation, as examined by a native German speaker, can be found in Table 4.
As we discuss in §5.3, sometimes a manipulation is not applicable in the target language. For example, the adverbial clause sentence from our dataset, "In Lyman's case, she reported the alleged rape to military police less than an hour after it occurred.", is translated into a noun phrase sentence regardless of input form (i.e., whether we insert this sentence or its noun phrase paraphrase as input) and regardless of manipulation (i.e., with or without it). "it occurred" is immediately translated into the German parallel of "its occurrence" when translating the clause version, and it is translated into a wrong noun phrase when translating the noun phrase version (the German parallel of "appearance" rather than "occurrence" in this context, i.e., "Auftreten" and "Vorfall", respectively). A native German speaker suggested replacing "occurred" with "happened"; otherwise it could not be translated into a clause form. Even the human reference (of WMT) uses the "its occurrence" noun phrase.
From "dream" to "megalomania": the Bit Galerie is discussed by TV readers Vom "Traum" zum "Größenwahn": Die Bit-Galerie wird von TV-Lesern diskutiert Vom "Traum" zum "Größenwahn": TV-Leser diskutieren über die Bit-Galerie Table 5: Example of adverbial clause and noun phrase translations, showcasing the limitations of BLEU comparison to Google Translate references and the challenge of translating an output in adverbial clause form. Either manipulation here did not have any effect (e.g. manipulation from clausal input resulted in translation identical to the one without manipulation) English

Adverbial Clause
In Lyman's case, she reported the alleged rape to military police less than an hour after it occurred.

Noun Phrase
In Lyman's case, she reported the alleged rape to military police less than an hour after its occurrence.