4 Billion Years On

The Grokking of Life on Earth: Evolution, Intelligence, and the Next Phase of AI

Chris

When an artificial neural network is set up correctly, it sometimes experiences a sudden leap in understanding ...

The model has been grinding through data for millions of steps ... it has memorised the training examples well. If you stop here, it fails when you ask it anything beyond its training data. Let it carry on though, and after what looks like stagnation ... performance on unseen problems suddenly leaps.

Not gradually. Abruptly, as if all the pieces got connected at the same time.

Researchers call this grokking. The model hasn’t just improved. It has generalised. It stopped storing specific answers and started representing the underlying structure.

Now zoom out. Not to decades or centuries. To four billion years.

If you treat the entire evolutionary history of life on Earth as a single optimisation process, it looks strikingly similar to a neural network that has reached a grokking stage ...

Core question

Was the last 100,000 years of human history a grokking event? Are we now at the start of another one?

What Grokking Actually Is

The term was coined by Alethea Power and colleagues at OpenAI in 2022, though the phenomenon had been observed long before it had a name.[1]

Train a neural network on a structured task, modular arithmetic for example, and something puzzling happens. Training accuracy quickly becomes near perfect. The model has effectively memorised the dataset.

But test accuracy stays near random for a long time.

Then, without any obvious trigger, it jumps.

The key insight from later work is that this plateau is not wasted time. During the memorisation phase, the system is not idle. It is reorganising internally. The gradient signals produced while fitting specific examples are what eventually allow higher level features to emerge.[2]

The capacity for generalisation builds invisibly, then appears all at once.

The plateau is preprocessing.

That gap between memorising solutions and representing structure is what makes grokking interesting beyond machine learning. It describes a phase transition.
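That gap can be made concrete with a toy sketch. The example below uses modular addition, the task from the original grokking paper, but the two functions are purely illustrative: a lookup table stands in for a memorising network, and the arithmetic rule stands in for one that has generalised. Nothing here is trained; it only illustrates why memorisation fails off-distribution.

```python
# Toy illustration of memorising answers vs. representing structure,
# using modular addition. All names here are illustrative.

P = 7  # small modulus for the toy task

# "Training set": only the pairs whose sum is even are ever seen.
train = {(a, b): (a + b) % P for a in range(P) for b in range(P)
         if (a + b) % 2 == 0}

def memoriser(a, b):
    """Pure lookup: perfect on seen pairs, clueless on unseen ones."""
    return train.get((a, b))  # returns None for pairs outside the training set

def generaliser(a, b):
    """Represents the underlying rule, so it covers every pair."""
    return (a + b) % P

seen, unseen = (2, 4), (2, 3)  # 2+4 is even (in train); 2+3 is odd (held out)
print(memoriser(*seen), memoriser(*unseen))      # 6 None
print(generaliser(*seen), generaliser(*unseen))  # 6 5
```

Both functions agree everywhere on the training set, which is exactly why training accuracy alone cannot distinguish them; only held-out pairs reveal the difference.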

Four Billion Years as a Training Run

Natural selection is, in formal terms, an optimisation algorithm. There is a population of candidate solutions, a fitness function, variation, and selection pressure that propagates successful variants.

Run that process for billions of years across astronomical parallelism and you get the biosphere.
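The four ingredients above can be sketched in a few lines. This is a deliberately minimal evolutionary loop, not a model of biology: the "environment" is an arbitrary target bitstring, and all parameters (population size, mutation rate, generation count) are made up for illustration.

```python
import random

random.seed(0)

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]  # arbitrary toy "environment"

def fitness(genome):
    # Fitness function: how well the genome matches the environment.
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    # Variation: each bit flips with a small probability.
    return [1 - g if random.random() < rate else g for g in genome]

# Population of candidate solutions, initialised at random.
population = [[random.randint(0, 1) for _ in TARGET] for _ in range(20)]

for generation in range(100):
    # Selection pressure: the fitter half survives and reproduces
    # with variation; successful variants propagate.
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(10)]

best = max(population, key=fitness)
print(fitness(best), "of", len(TARGET))  # best fitness climbs toward a full match
```

Note what the loop optimises: local fitness at each generation, nothing more. There is no term in it for generality, which is the article's point about why evolved solutions look like memorised ones.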

The solutions it produces are extraordinary. The compound eye of a dragonfly. The immune system of vertebrates. Echolocation in bats.

But that precision is also a limitation.

A bat cannot repurpose sonar to draft legal contracts. The mantis shrimp’s sixteen channel colour vision is useless for algebra. These systems are brilliant within narrow domains and almost entirely non transferable outside them.

In grokking terms, they look like memorised solutions.

Using this framework, most of evolutionary history could be said to resemble a very long memorisation phase. Biological diversity increases. Local optimisation improves. Generalisation outside the training distribution remains minimal.

The Memorisation Phase Was Necessary

One of the most surprising results from grokking research is that the long plateau before generalisation is a requirement.

Without memorisation, the gradients that enable feature learning do not emerge. Without that structure, the phase transition never happens, and the pieces don't get put together.[3]

Seen this way, the 3.8 billion years before human collective intelligence stop looking like a prelude and start looking like a prerequisite.

Evolution had to discover neurons. Then centralised nervous systems. Then hierarchical brains capable of memory, abstraction, and social learning.

There was no foresight in this process. Natural selection has no objective beyond local fitness.

But in retrospect, the trajectory resembles what we see in neural networks. Not steady progress toward generalisation. Slow accumulation of representational capacity that eventually makes it possible.

The Cambrian: A Proto Transition

For over three billion years, life was almost entirely single celled. Then, in a relatively brief geological window, most major animal body plans appeared.[4]

Eyes. Bilateral symmetry. Nervous systems. Predator prey dynamics.

This was not a smooth gradient. It was a discontinuity.

It is tempting to interpret the Cambrian Explosion as a proto transition enabled by the prior accumulation of biological infrastructure. Genetic regulation, cell differentiation, intercellular signalling.

The causes are still debated. Oxygen levels, ecological interactions, developmental genetics all play roles. No single explanation fully captures it.

Hence, the grokking analogy should be treated cautiously. It is suggestive, not definitive.

The Second Transition: Human Collective Intelligence

A clearer candidate appears in the last 100,000 years. It did not happen at the level of the individual brain.

The brain of Homo sapiens has remained largely unchanged over that period. What changed was the ability to 'join' those brains together ...

Language allows knowledge to accumulate between minds. Writing externalises memory. Printing accelerates distribution. Institutions stabilise knowledge across generations.

"If we imagine a human child raised without access to culture, their cognitive abilities would differ little from those of other great apes."

— Tomasello & Rakoczy, 2003

This is the cultural ratchet. Information accumulates without needing to be rediscovered.[5]

The result is a system that can solve problems far outside its evolutionary training distribution. Sequencing genomes. Modelling quantum systems. Landing machines on other planets.

The unit of generalisation is no longer the individual. It is the network.

Where AI Fits

Where does artificial intelligence sit in this picture?

One possibility is that it accelerates the existing transition. Large language models compress and operationalise the output of human culture.

Another possibility is more interesting. That AI systems are currently in their own memorisation phase.

There are hints in this direction. Some recent work suggests that advanced reasoning models, when given sufficient compute, spontaneously develop internal structures resembling debate. Generating and reconciling multiple perspectives without explicit instruction.[6]

This is not strong evidence of a new phase transition. But it is suggestive.

At present, most systems still look closer to sophisticated interpolation than true out of distribution generalisation.

What the Next Transition Might Look Like

If the pattern holds, the next transition will not look like a single more intelligent entity.

Biology did not produce its major leaps that way. It produced new ways of organising simpler units.

The Cambrian introduced new ways for cells to cooperate. Human culture introduced new ways for minds to cooperate across time and space.

A third transition would likely involve new connection structures between biological and artificial systems. Architectures that enable forms of reasoning and coordination currently out of reach.

That is not a prediction. It is an inference from pattern.

Where We Are

From a distance, grokking looks inevitable.

Up close, it looks like stagnation.

We are somewhere in that process.

If a transition is coming, we are unlikely to recognise it in advance.

The system will simply start solving problems it previously could not.

Fire was unthinkable to early life on Earth ... AI was unthinkable to the first humans ... what chance do we have of predicting what comes next?

References

  1. Power, A. et al. (2022). Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets.
  2. Liu, Z. et al. (2022). Towards Understanding Grokking: An Effective Theory of Representation Learning.
  3. Lyu, K. et al. (2023). Dichotomy of Early and Late Phase Implicit Biases Can Induce Grokking.
  4. Marshall, C. R. (2006). Explaining the Cambrian Explosion of Animals.
  5. Tomasello, M. (1999). The Cultural Origins of Human Cognition.
  6. Kim, J. et al. (2026). Reasoning Models Generate Societies of Thought.
