
Introduction to ML Safety

ML systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In this course we’ll discuss how researchers can shape the process that will lead to strong AI systems and steer that process in a safer direction. We’ll cover technical topics aimed at reducing existential risks (X-Risks) from strong AI, namely withstanding hazards (“Robustness”), identifying hazards (“Monitoring”), reducing inherent ML system hazards (“Alignment”), and reducing systemic hazards (“Systemic Safety”). At the end, we will zoom out to discuss additional abstract existential hazards and how to increase safety without unintended side effects. For the course content and assignments, refer to the schedule.


This is a topics course in machine learning, so a solid background in machine learning and deep learning is necessary. If you don’t have this background, we recommend Weeks 1-6 of MIT 6.036 followed by Lectures 1-13 of the University of Michigan’s EECS498 or Weeks 1-6 and 11-12 of NYU’s Deep Learning.


  1. Hazard Analysis: Risk Decomposition, A Systems View of Safety, Black Swans
  2. Robustness: Adversaries, Long Tails
  3. Monitoring: Anomalies, Interpretable Uncertainty, Transparency, Trojans, Emergent Behavior
  4. Alignment: Honesty, Value Learning, Machine Ethics, Intrasystem Goals
  5. Systemic Safety: ML for Improved Epistemics, ML for Improved Cyberdefense, Cooperative AI
  6. Additional X-Risk Discussion: Future Scenarios, Selection Pressures, Avoiding Capabilities Externalities