Compression in neural networks

07 Mar, 2026

One line of work in deep learning tries to identify the weights that actually matter for a task, and so obtain a smaller model that performs just as well on that task.

Professor Atlas embodies the famous definition of genius: talent hits a target no one else can hit; genius hits a target no one else can see. He states the following:

For a pruning algorithm, you can probably compress by 50%. For a low-rank algorithm, if you are doing fine-tuning, you can compress by maybe hundreds of times compared to using a full matrix. But what I would argue is that the biggest compression of a neural network is to compress a neural network into a non-neural network. That means: compress what you have learned into knowledge that you can write with symbols — like what you have read in a textbook. I would argue that human knowledge that can be written and recited — like we're doing now — is the best form of compression, and that will not be a neural network.

So that's the goal I set for myself. I don't want to compress a neural network into another neural network — distillation, low-rank, sparse, that told me a lot of things — but now I want to compress them into a discrete form, a symbolic form, and eventually maybe even human-readable language again.

That is really great. That is learning.
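
To put rough numbers on the ratios quoted above, here is a minimal sketch that counts the weights of a single square layer under each scheme. The layer width (4096) and the rank (8) are my own illustrative assumptions, not figures from the talk, and the pruned count ignores the cost of storing which weights survive.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of weights in a dense d_out x d_in matrix."""
    return d_out * d_in


def pruned_params(d_out: int, d_in: int, sparsity: float) -> int:
    """Weights left after removing a `sparsity` fraction (index overhead ignored)."""
    return round(d_out * d_in * (1.0 - sparsity))


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights in a factorization B @ A, with B: d_out x rank and A: rank x d_in."""
    return rank * (d_out + d_in)


d = 4096  # hypothetical layer width (assumption)
r = 8     # hypothetical rank of the fine-tuning update (assumption)

full = full_params(d, d)
pruned = pruned_params(d, d, 0.5)
low_rank = low_rank_params(d, d, r)

print(f"full matrix: {full} weights")
print(f"50% pruned:  {pruned} weights ({full / pruned:.0f}x smaller)")
print(f"rank-{r}:      {low_rank} weights ({full / low_rank:.0f}x smaller)")
```

With these assumed sizes, pruning half the weights gives a 2x reduction, while the low-rank factors hold 256 times fewer weights than the full matrix. That matches the "hundreds of times" in the quote. Both results are still neural networks, which is exactly the limit the quote wants to move past.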