Introduction to ML Safety

Name: Introduction to ML Safety
Author: Dan Hendrycks

ML systems are rapidly growing in size, acquiring new capabilities, and increasingly being deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In this course we’ll discuss how researchers can shape the process that will lead to strong AI systems and steer that process in a safer direction. We’ll cover technical topics for reducing existential risks (X-Risks) from strong AI, namely withstanding hazards (“Robustness”), identifying hazards (“Monitoring”), reducing inherent ML system hazards (“Control”), and reducing systemic hazards (“Systemic Safety”). At the end, we will zoom out to discuss additional, more abstract existential hazards and how to increase safety without unintended side effects. For the course content and assignments, refer to the schedule.

Prerequisites

This is a topics course in machine learning, so a solid background in machine learning and deep learning is necessary. If you don’t have this background, we recommend Weeks 1-6 of MIT 6.036, followed by either Lectures 1-13 of the University of Michigan’s EECS498 or Weeks 1-6 and 11-12 of NYU’s Deep Learning.

Syllabus

Safety Engineering: Risk Decomposition, A Systems View of Safety, Black Swans
Robustness: Adversaries, Long Tails
Monitoring: Anomalies, Interpretable Uncertainty, Transparency, Trojans, Emergent Behavior
Control: Honesty, Value Learning, Machine Ethics, Intrasystem Goals
Systemic Safety: ML for Improved Epistemics, ML for Improved Cyberdefense, Cooperative AI
Additional X-Risk Discussion: Future Scenarios, Selection Pressures, Avoiding Capabilities Externalities