Who Said Neural Networks Aren't Linear?


TL;DR

This paper proposes a new architecture called Linearizer. By placing a linear operator between two invertible neural networks, it makes traditional nonlinear mappings behave as strict linear transformations in a specially constructed vector space, thereby enabling powerful tools from linear algebra, such as SVD and the pseudoinverse, to be applied to deep learning models.

Key Definitions

The core idea of this paper is to redefine the vector space so that nonlinear functions become linear on it. The key definitions are as follows:

  1. Linearizer: A composite-function architecture of the form $f(x) = \mathbb{L}_{g_x, g_y, A}(x) = g_y^{-1}(A g_x(x))$. Here, $g_x$ and $g_y$ are invertible neural networks, and $A$ is a linear operator (matrix). This architecture is nonlinear in the standard Euclidean space.

  2. Induced Vector Space Operations: New vector addition and scalar multiplication defined based on an invertible network $g$. For vectors $v_1, v_2$ and scalar $a$, the operations are defined as:
    • Vector addition: $v_1 \oplus_g v_2 := g^{-1}(g(v_1) + g(v_2))$
    • Scalar multiplication: $a \odot_g v_1 := g^{-1}(a \cdot g(v_1))$

    With these operations, the set $\mathbb{R}^N$ together with the scalar field $\mathbb{R}$ forms a new vector space $(V, \oplus_g, \odot_g)$.
  3. Induced Inner Product: A new inner product defined in the same way based on an invertible network $g$, making the induced vector space a Hilbert space. It is defined as:

    \[\langle v_1, v_2 \rangle_g := \langle g(v_1), g(v_2) \rangle_{\mathbb{R}^N}\]

where the right-hand side is the standard Euclidean inner product.
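These induced operations can be implemented for any invertible map. As a toy illustration (the elementwise cube standing in for an invertible neural network; the paper uses learned networks, not this map), a minimal sketch:

```python
import numpy as np

# A simple invertible map g and its inverse (elementwise cube / cube root);
# any invertible neural network could play this role.
g = lambda v: v ** 3
g_inv = np.cbrt  # cube root, well-defined for negative inputs too

def induced_add(v1, v2):
    """Induced vector addition: v1 (+)_g v2 = g^{-1}(g(v1) + g(v2))."""
    return g_inv(g(v1) + g(v2))

def induced_scale(a, v):
    """Induced scalar multiplication: a (.)_g v = g^{-1}(a * g(v))."""
    return g_inv(a * g(v))

def induced_inner(v1, v2):
    """Induced inner product: the Euclidean inner product of g(v1), g(v2)."""
    return np.dot(g(v1), g(v2))

v1 = np.array([1.0, 2.0])
v2 = np.array([3.0, -1.0])
# Vector-space axioms hold under the induced operations, e.g. commutativity:
assert np.allclose(induced_add(v1, v2), induced_add(v2, v1))
```

Because $g$ is a bijection, every vector-space axiom in $\mathbb{R}^N$ transports to the induced operations; for instance, scalar multiplication distributes over induced addition.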

Neural networks are, by design, nonlinear models. This nonlinearity gives them strong expressive power, but it also prevents them from leveraging the rich and elegant theoretical tools of classical linear algebra. In linear systems, operations such as eigendecomposition, inversion, and projection have clear structure and theoretical guarantees, and iterating a linear operator reduces to taking matrix powers. In nonlinear systems, the same tasks become extremely complex, usually requiring specially designed loss functions and optimization strategies, and the results are often only approximate.

The core question this paper aims to address is: can we reinterpret a nonlinear model as a linear operator without sacrificing its expressive power? If so, then we could directly use the full toolkit of linear algebra to analyze and manipulate these complex nonlinear models.

Method

Architecture

The core method proposed in this paper is the Linearizer architecture. Its structure places a linear operator (matrix $A$) between two invertible neural networks $g_x$ and $g_y$:

\[f(x) = g_y^{-1}(A g_x(x))\]

Here, $g_x$ maps the input data $x$ into a latent space, $A$ performs a linear transformation in that latent space, and then $g_y^{-1}$ maps the result back to the output space.
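A minimal numerical sketch of this forward pass, again using the elementwise cube as a stand-in for the invertible networks (an illustrative assumption, not the paper's architecture):

```python
import numpy as np

# Invertible maps standing in for the networks g_x and g_y.
g_x, g_x_inv = (lambda v: v ** 3), np.cbrt
g_y, g_y_inv = (lambda v: v ** 3), np.cbrt

# The linear operator applied in latent space.
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])

def linearizer(x):
    """f(x) = g_y^{-1}(A g_x(x)): nonlinear as a map on Euclidean space."""
    return g_y_inv(A @ g_x(x))

x = np.array([1.0, 2.0])
print(linearizer(x))
```

Note that $f$ does not satisfy ordinary superposition: in general `linearizer(x1 + x2)` differs from `linearizer(x1) + linearizer(x2)`. Linearity only appears under the induced operations described above.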

Figure: The Linearizer structure (top) sandwiches a linear operation between two invertible functions. (Bottom) The induced vector addition and scalar multiplication define a vector space in which $f$ is linear.

Innovations

The essential innovation of Linearizer is that it proves the function $f$ is strictly linear in the new vector space induced by $g_x$ and $g_y$. Specifically:

  1. Constructive Linearization: By defining new addition $\oplus$ and scalar multiplication $\odot$ operations, the paper constructs new vector spaces for the input and output. The input space $\mathcal{X}$ is defined by $(\oplus_x, \odot_x)$ (based on $g_x$), and the output space $\mathcal{Y}$ is defined by $(\oplus_y, \odot_y)$ (based on $g_y$). Under this framework, the function $f$ satisfies the superposition principle, i.e., it is proven to be linear:

    \[f\big((a_1 \odot_x x_1) \oplus_x (a_2 \odot_x x_2)\big) = (a_1 \odot_y f(x_1)) \oplus_y (a_2 \odot_y f(x_2))\]
  2. Geometric Intuition: The invertible mapping $g_x$ can be viewed as a diffeomorphism that “straightens” the curved manifold in data space into a flat space in latent space. Therefore, transformation paths that are complex in data space become simple straight lines in latent space.
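Continuing the elementwise-cube toy example (with the simplifying assumption that $g_x$ and $g_y$ are the same map $g$), the superposition property can be checked numerically:

```python
import numpy as np

g, g_inv = (lambda v: v ** 3), np.cbrt   # stand-in for g_x and g_y alike
A = np.array([[1.0, 2.0],
              [0.5, -1.0]])

f = lambda x: g_inv(A @ g(x))            # the Linearizer
add = lambda v1, v2: g_inv(g(v1) + g(v2))   # induced addition
scale = lambda a, v: g_inv(a * g(v))        # induced scalar multiplication

x1, x2 = np.array([1.0, 2.0]), np.array([-1.0, 3.0])
a1, a2 = 2.0, -0.5

lhs = f(add(scale(a1, x1), scale(a2, x2)))
rhs = add(scale(a1, f(x1)), scale(a2, f(x2)))
assert np.allclose(lhs, rhs)  # f satisfies superposition in the induced space
```

The check works for any choice of inputs and scalars: unwinding the definitions, both sides equal $g^{-1}(A(a_1 g(x_1) + a_2 g(x_2)))$, so the linearity is exact, not approximate.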

Advantages

This linearized construction endows the model with a series of powerful algebraic properties: operations such as composition, inversion (via the pseudoinverse of $A$), projection, and spectral analysis (e.g., the SVD of $A$) can be realized directly by operating on the core matrix $A$.
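One concrete payoff, sketched in the same toy setting (assuming the same invertible map on both sides, so that $f$ maps the space to itself): iterating the nonlinear $f$ collapses to a matrix power, $f^n(x) = g^{-1}(A^n g(x))$:

```python
import numpy as np

g, g_inv = (lambda v: v ** 3), np.cbrt   # toy invertible map
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
f = lambda x: g_inv(A @ g(x))

x = np.array([2.0, -1.0])

# Applying f three times...
y_iter = f(f(f(x)))
# ...equals a single pass with A cubed.
y_pow = g_inv(np.linalg.matrix_power(A, 3) @ g(x))
assert np.allclose(y_iter, y_pow)
```

The intermediate $g^{-1}$/$g$ pairs cancel, so $n$ applications of the nonlinear model cost one application of $A^n$, which is what makes tools like the SVD and pseudoinverse of $A$ directly useful.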

Experimental Conclusions

The paper demonstrates the practical utility of the Linearizer framework through three applications.

One-Step Flow Matching

Figure: The images on the left and right (in red) are the original (non-generated) data $x_1$ and $x_2$; the middle images are obtained by interpolation in latent space.

Quantitative Comparison Results

| Dataset | Inversion-Reconstruction Consistency (PSNR / LPIPS) | 100-step vs. 1-step Fidelity (PSNR / LPIPS) |
| --- | --- | --- |
| MNIST | 31.6 / .008 | 32.4 / .006 |
| CelebA | 33.4 / .006 | 32.9 / .007 |

Note: each cell lists PSNR / LPIPS. Higher PSNR and lower LPIPS are better.

Modular Style Transfer

Figure: Left: the original image. Middle: style transfer using the left and right style images. Right: interpolation between the two styles.

Linear Idempotent Generative Networks

Figure: Black solid arrows indicate forward propagation; red dashed arrows indicate backpropagation. The linear IGN constructs a global projector that maps any input onto the target distribution. The top row shows the inputs, and the bottom row shows the matched outputs.