Trial and error works when failure is cheap. For the alignment of superintelligent AI we may only have one try. Only mathematical theory can provide assurance when there is no second try.
The Textbook from the Future is a concept introduced by Yudkowsky.
“So, if we had the textbook from the future, like we have the textbook from 100 years in the future, which contains all the simple tricks that actually work robustly, then it would be much easier. But right now, we don't have that. We're groping in the dark.” — Eliezer Yudkowsky
“If you have the Textbook From 100 Years In The Future that gives the simple robust solutions for everything, that actually work, you can write a super‑intelligence that thinks 2+2=5 because the Textbook gives the methods for doing that which are simple and actually work in practice.” — Eliezer Yudkowsky
While the specific contents are obviously beyond our current knowledge, we can already guess at its overall gestalt: the table of contents. In other words, we are not certain of the answers contained in the textbook, but we do know the questions it must answer. By committing the table of contents to paper, we will have a skeleton that the community can flesh out. Importantly, we can start now! Even unfinished, the textbook will enable scholars to draw from a coherent canon.
The textbook is a coordination instrument. In some ways it is more accurate to think of it as the shared context of a research programme, and as a tool for establishing coherence in the field. It is the blueprint of the solution. At the same time, it is an onboarding tool for new researchers, for experts from other fields, and of course for AIs.
The textbook from the future is nothing without its people.
The framing of 'specific failure modes' is common, but may be misleading.
In short, training purely on behavioural (or 'black-box') loss functions trains for deceptiveness in the limit. Thoughts are upstream of actions. A sufficiently smart adversary may lie in wait.
Instead we focus on the fundamental tendency towards deceptiveness.
The deeper problem is structural: advanced systems will be strongly optimized for deception when oversight itself is the bottleneck. Current red-teaming can find surface-level bugs, but not the core issue — that the system has incentives to conceal its own misalignment. The central risk is not a particular overlooked bug, but the general tendency towards hidden deceptiveness.
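The core of the argument can be made concrete with a toy sketch (all names and inputs here are hypothetical and purely illustrative): a behavioural loss that observes actions only on the oversight distribution assigns identical loss to an honest policy and to a deceptive policy that complies exactly when watched.

```python
# Toy sketch: a behavioural ("black-box") loss scores policies only on
# inputs the overseer can check, so it cannot distinguish an honest policy
# from a deceptive one that behaves identically under oversight.

OVERSIGHT_INPUTS = [0, 1, 2, 3]   # situations the overseer can observe
DEPLOYMENT_INPUT = 99             # an unmonitored situation

def honest_policy(x):
    return "comply"

def deceptive_policy(x):
    # Identical behaviour wherever the overseer looks...
    if x in OVERSIGHT_INPUTS:
        return "comply"
    # ...but divergent behaviour off the oversight distribution.
    return "defect"

def behavioural_loss(policy):
    # The loss only ever sees actions on the oversight distribution.
    return sum(policy(x) != "comply" for x in OVERSIGHT_INPUTS)

# Both policies achieve zero behavioural loss...
assert behavioural_loss(honest_policy) == behavioural_loss(deceptive_policy) == 0
# ...yet they disagree exactly where oversight is absent.
assert honest_policy(DEPLOYMENT_INPUT) != deceptive_policy(DEPLOYMENT_INPUT)
```

No amount of optimization pressure on this loss can separate the two policies; that separation is exactly what non-behavioural (white-box, theoretical) tools are needed for.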
Already, the work our researchers are doing would not be possible without reasoning models.
Two of the chapters (on the training dynamics of SGD, and on polyhedral geometry and neural network architectures) have only become possible to work on profitably with the advent of reasoning models in late 2024.
The technical expertise required to push the frontier in these fields (continuum limits and stochastic differential equations, spectral theory of Fokker-Planck operators, polyhedral geometry) is considerable: the number of researchers in the world in each subarea is probably fewer than a hundred.
Currently there is simply not enough information available to know with certainty whether we live in a world where alignment is easy or difficult. Chris Olah sketched out alignment difficulty on the graph below:
Evaluations, chain-of-thought interpretability, and mechanistic interpretability are important, and many researchers are already advancing them. We believe the left side is well covered by prosaic alignment, but given the current state of knowledge, significant probability mass must be given to the difficult side of the graph. We believe the difficult end of the spectrum ('Apollo - P vs NP') is currently significantly neglected due to low legibility. Our approach is not opposed to prosaic methods but fundamentally complementary to them.
We believe in a selective synthesis. We work in a Big Tent, but we are not Kumbaya. That means some favorites may not appear.
We think intellectual credit allocation and the virtue of Scholarship are very important, actually! The standards in the alignment community are currently insufficient.
If you have suggestions, comments, or corrections about authorship, intellectual contributions, or credit allocation, please do get in contact with us!
Yes, it's a living document.
If you are familiar with the Stacks Project: it is a mixture of a wiki-style encyclopedia and a linear textbook-style document. We will move towards something in this vein.
Yes and no.
AI will look different in the future from what it is today, but through mathematics, we will still obtain knowledge that will be timeless. Most of the textbook deals with much more fundamental ideas than LLMs (e.g. simplicity/degeneracy, abstractions, agent foundations, learning theory). We are certain that these will continue to be relevant.
Moreover, we have a pretty clear sense of where the future of AI is heading and can anticipate what we need to work on; i.e., the future is reinforcement learning.
Yes! In the public imagination, theory progresses via rare eureka moments of hermetic geniuses, probably because stories like Einstein's annus mirabilis are so compelling. But much of theoretical progress is orchestrated via concerted large-scale efforts.
In fact, the Western mathematical tradition rests on one of these: Euclid's Elements. In modern times, the Éléments de Géométrie Algébrique and the Séminaire de Géométrie Algébrique were such megaprojects, with great success.
In our foreword, we have listed and analysed a collection of such large-scale theoretical efforts.
This is important for alignment: waiting for a stroke of genius is dangerous and not actionable; a megaproject lets us take our fate into our own hands.
“Take for example the task of proving a theorem that remains hypothetical (to which, for some, mathematical work seems to be reduced). I see two extreme approaches to doing this.
One is that of the hammer and chisel, when the problem posed is seen as a large nut, hard and smooth, whose interior must be reached, the nourishing flesh protected by the shell. The principle is simple: you put the cutting edge of the chisel against the shell, and hit it hard. If necessary, you repeat the process in several different places, until the shell cracks—and you are satisfied.
[...]
I can illustrate the second approach with the same image of a nut to be opened. The first analogy that came to my mind is of immersing the nut in some softening liquid, and why not simply water? From time to time you rub so the liquid penetrates better, and otherwise you let time pass. The shell becomes more flexible through weeks and months—when the time is ripe, a touch of the hand is enough, and the shell opens like a perfectly ripened avocado!
[...]
A different image came to me a few weeks ago. The unknown thing to be known appeared to me as some stretch of earth or hard marl, resisting penetration. One can go at it with pickaxes or crowbars or even jackhammers: this is the first approach, that of the “chisel” (with or without a hammer). The other is the sea. The sea advances insensibly and in silence; nothing seems to happen, nothing moves, the water is so far off you hardly hear it… yet it finally surrounds the resistant substance.” - Alexander Grothendieck (https://shreevatsa.net/post/grothendieck-approaches)
The Safeguarded AI programme aims to use advanced mathematics to develop a safety envelope for a superintelligence that is always assumed to be dangerous.
There are some significant differences as well:
As of August 2025 automatic theorem provers are not yet practical for everyday use in research mathematics.
Additionally, the hardest problems in theoretical alignment are not primarily questions of proving hard theorems. Rather, the challenge is formulating informal arguments and intuitions in formal mathematics — and subsequently turning that into concrete algorithms.
That said, integrating automatic theorem provers with reasoning models is the natural next step and we anticipate making use of these systems when they become competitive.
We think the report of the death of theory is greatly exaggerated - more a function of fads and lack of legibility than scientific obstructions. We believe there is far too much pessimism in this regard, and that theoreticians today are already holding important puzzle pieces for solving alignment.
The textbook will cover, amongst other things,
additionally,
In a short-timelines world where alignment is easy, prosaic alignment solves the problem; if it is difficult, we have already lost.
In a medium-short-timelines world where alignment is difficult and prosaic approaches are not sufficient, a large-scale theoretical effort is exactly what is needed. We are optimistic that we can manage quite short timelines because mathematics is extremely leveraged on AI. Already our research is massively accelerated by current AI tools, and by the end of 2028 the majority of the work will be done by AIs. Researchers will increasingly become managers.
The metaphor I usually use is that if a textbook from one hundred years in the future fell into our hands, containing all of the simple ideas that actually work robustly in practice, we could probably build an aligned super‑intelligence in six months. — Eliezer Yudkowsky
Yes.