AI Safety Fundamentals
BlueDot Impact
Free, structured courses on alignment and governance. The single best on-ramp; several of us went through it ourselves.
The papers, courses and books we keep coming back to in sessions. You don't need to read everything — pick the section that matches where you are.
If you're new to all of this, start here. No technical background needed.
A careful, readable case for why this might be one of the most pressing problems — with honest treatment of the counterarguments.
A frontier lab's own account of when and why AI could become dangerous, and what they think can be done about it.
One sentence, signed by many of the field's leading researchers and lab heads. Useful as a marker of how mainstream the concern has become.
A living map of the AI safety ecosystem — organisations, courses, funders, communities. Good for finding your next step.
The core research problem: how do you make capable AI systems reliably do what their designers and users intend?
The paper that translated vague worries into concrete research problems — reward hacking, safe exploration, robustness to distribution shift. Still a useful vocabulary.
Why standard training methods could produce systems that pursue goals their developers didn't intend. Probably the best single technical overview.
Mechanistic interpretability — actually opening up neural networks to see what's computed inside. Start with "Toy Models of Superposition."
How current frontier models are aligned in practice: training them against a written set of principles rather than relying only on human labels.
Where the field argues with itself in public. Uneven, but the best place to watch ideas get stress-tested.
The bigger-picture arguments — for and against treating advanced AI as a potential catastrophe.
The most careful version of the x-risk argument, built as an explicit chain of premises with probabilities attached — so you can see exactly where you'd disagree.
A taxonomy of how things could go badly — malicious use, racing dynamics, organisational failure, rogue systems — beyond just the classic misalignment story.
A thoughtful case for lower risk estimates, from inside the conversation. We try to keep the strongest versions of both sides on the table.
A concrete, contested scenario forecast of the next few years. Read it less as prediction and more as a way to make your own expectations explicit.
Policy levers, international coordination, and the view from here.
The central academic hub for AI governance — compute governance, frontier-lab policy, international agreements.
A short consensus piece by senior researchers on what governments should do now. A good baseline for policy discussions.
The national AI programme — useful for understanding where Indian state capacity and attention currently sit.
Some of the most serious writing on AI policy from an Indian vantage point.
For longer-form treatment. Links go to overviews; copies circulate within the group.
The best narrative introduction — how alignment research actually emerged, told through the people doing it.
The co-author of the field's best-known textbook argues that its standard objective is itself the problem, and proposes a redesign.
The book that put x-risk on the map. Dated in places, but many of the core concepts still frame the debate.
If the reading has you wanting to contribute, these are the doors people in our group have actually used or recommend.
A mentored research programme that has become one of the main pipelines into full-time alignment work.
An intensive technical curriculum (interpretability, RL, evals) — also excellent for self-study; the materials are open.
Regular global research sprints and hackathons — a low-cost way to try doing safety research before committing to it.
The most complete listing of safety-relevant roles across research, engineering, policy and operations.