
“Neural networks are grown, not programmed.” This paper changes that. Mechanistic interpretability (mechinterp) investigates how models generalize beyond their training data by studying the internal structure that training produces. The authors introduce patterning as its dual: given a desired internal structure, determine what training data produces it.
This is done using the language of susceptibilities. In physics, a susceptibility measures how a system responds to a small perturbation (for example, how much a magnet's magnetisation changes as an applied field is turned up). Here, the neural network plays the role of the system, and shifts in the training distribution play the role of the perturbations.
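The physics analogy can be made concrete with a toy calculation. The sketch below assumes a tempered Gibbs (Bayesian) posterior over parameters, p_h(w) ∝ exp(-nβ(L(w) + h·ΔL(w))), where h is the strength of the shift in the data distribution and ΔL is the change in loss it induces; this is a simplifying assumption, and the paper's exact definitions and estimators may differ. Under it, the susceptibility of an observable φ is χ = d/dh E_h[φ] at h = 0, and differentiating the Gibbs weights gives the identity χ = -nβ Cov_0(φ, ΔL), so the response to a perturbation can be read off from fluctuations of the unperturbed system. The one-dimensional parameter, the double-well loss, the tilt ΔL(w) = w, the observable φ(w) = w and the value of nβ are all hypothetical choices for illustration.

```python
import math
from collections.abc import Callable

# Hypothetical toy setting: a 1-D "parameter" w on a grid with a uniform prior,
# a double-well base loss with two equally good solutions at w = -1 and w = +1,
# and a perturbation dL that tilts the loss, standing in for a data shift.
GRID = [-3.0 + 6.0 * i / 4000 for i in range(4001)]
N_BETA = 10.0  # product of sample size n and inverse temperature beta (assumed value)


def base_loss(w: float) -> float:
    return (w * w - 1.0) ** 2


def delta_loss(w: float) -> float:
    # Change in loss caused by the perturbation; positive h penalises w > 0.
    return w


def observable(w: float) -> float:
    # The quantity whose response we measure (here: which well w sits in).
    return w


def gibbs_weights(h: float) -> list[float]:
    """Unnormalised weights exp(-n*beta*(L + h*dL)) on the grid."""
    return [math.exp(-N_BETA * (base_loss(w) + h * delta_loss(w))) for w in GRID]


def expectation(f: Callable[[float], float], h: float) -> float:
    """E_h[f] under the (discretised) Gibbs distribution at perturbation strength h."""
    weights = gibbs_weights(h)
    z = sum(weights)
    return sum(wt * f(w) for wt, w in zip(weights, GRID)) / z


def susceptibility_finite_difference(eps: float = 1e-5) -> float:
    """chi = d/dh E_h[phi] at h = 0, estimated with a central difference."""
    return (expectation(observable, eps) - expectation(observable, -eps)) / (2 * eps)


def susceptibility_covariance() -> float:
    """The same quantity via the identity chi = -n*beta*Cov_0(phi, dL)."""
    mean_phi = expectation(observable, 0.0)
    mean_dl = expectation(delta_loss, 0.0)
    cov = expectation(lambda w: observable(w) * delta_loss(w), 0.0) - mean_phi * mean_dl
    return -N_BETA * cov


if __name__ == "__main__":
    print("finite difference:", susceptibility_finite_difference())
    print("covariance form:  ", susceptibility_covariance())
```

The two estimates should agree closely. The appeal of the covariance form is that it only needs samples from the unperturbed posterior, so the response to many different perturbations ΔL can be estimated without rerunning anything for each one.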

This is a small (3M-parameter) language model over the course of training, visualised with a new interpretability technique: susceptibilities. We call this handsome critter the rainbow serpent.
In a synthetic parentheses-balancing task, we show that, given two solutions that both achieve perfect training accuracy and loss, we can effectively steer which solution the model ends up implementing. We do this using only in-distribution data.
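As a rough illustration of what an in-distribution intervention can look like, the sketch below builds a pool of random bracket strings labelled balanced or unbalanced, then changes only how often existing examples are sampled. It is not the paper's steering procedure: the string length, the nesting-depth feature `max_depth`, the median split and the `boost` factor are all hypothetical choices made for this example.

```python
import random

# Hypothetical sketch: a pool of random bracket strings labelled balanced / unbalanced,
# plus an intervention that only reweights examples already present in the pool.


def is_balanced(s: str) -> bool:
    """True if every ')' closes an earlier '(' and nothing is left open."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def max_depth(s: str) -> int:
    """Deepest nesting level reached while scanning left to right (hypothetical feature)."""
    depth = best = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        best = max(best, depth)
    return best


def make_pool(n: int, length: int, seed: int = 0) -> list[tuple[str, bool]]:
    """Sample n uniformly random bracket strings and label each one."""
    rng = random.Random(seed)
    strings = ("".join(rng.choice("()") for _ in range(length)) for _ in range(n))
    return [(s, is_balanced(s)) for s in strings]


def reweight(pool, feature, boost: float) -> list[float]:
    """Normalised sampling weights over the SAME pool.

    Examples whose feature value is at or above the pool median get `boost` times
    the weight of the rest; no new kinds of input are introduced.
    """
    values = sorted(feature(s) for s, _ in pool)
    median = values[len(values) // 2]
    raw = [boost if feature(s) >= median else 1.0 for s, _ in pool]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    pool = make_pool(n=1000, length=8)
    weights = reweight(pool, max_depth, boost=3.0)
    batch = random.Random(1).choices(pool, weights=weights, k=32)
    print(sum(label for _, label in batch), "balanced examples in a batch of", len(batch))
```

Because every sampled example already occurs in the pool, the training distribution changes only in its mixture proportions, which is one way to read "using only in-distribution data".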
This is closely related to, but distinct from, influence functions and training data attribution. Those methods study the effects of data at the behavioral level, such as the impact of a single data point on test loss, whereas patterning is concerned with the internal structure underlying that behavior.
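For contrast, the classic influence-function approximation (Koh and Liang, 2017) is behavioral by construction: it estimates the first-order change in the loss on a test point z_test when a training point z is upweighted, using the loss Hessian at the trained parameters θ̂. The result is a single number about test loss, not a description of the internal mechanism that produces it.

```latex
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}\, H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)
```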
Sources: tweet, paper, “Neural networks are grown” tweet