Waymo AI
This is my personal write-up of Waymo's 2025 blog post, "Demonstrably Safe AI For Autonomous Driving".
Summary
Waymo frames autonomous driving as a giant reinforcement learning problem and uses learned models for each part of the problem:
- Control Policy / Agent = "Driver"
- Environment = "Simulator"
- Reward = "Critic"
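To make the mapping concrete, here is a minimal toy sketch of how the three roles fit together in a reinforcement learning loop. Everything in it is my own assumption for illustration: the 1-D `State`, the function names `driver`, `simulator`, `critic`, and `rollout`, and the hand-written rules inside them. In Waymo's system each of these roles is a learned model, and the post does not describe their interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class State:
    # Hypothetical toy state: a car on a 1-D road.
    position: float  # metres
    speed: float     # metres per second


TARGET_SPEED = 10.0  # m/s, arbitrary value chosen for this toy example


def driver(state: State) -> float:
    """Control policy / agent: maps a state to an action (acceleration, m/s^2)."""
    return 0.5 * (TARGET_SPEED - state.speed)


def simulator(state: State, accel: float, dt: float = 0.1) -> State:
    """Environment: advances the world by one time step given the action."""
    new_speed = max(0.0, state.speed + accel * dt)
    return State(position=state.position + new_speed * dt, speed=new_speed)


def critic(state: State, accel: float) -> float:
    """Reward: scores behaviour (penalises speed error and harsh acceleration)."""
    return -abs(state.speed - TARGET_SPEED) - 0.1 * abs(accel)


def rollout(steps: int = 100) -> tuple[State, float]:
    """Run the agent-environment-reward loop and return (final state, total reward)."""
    state = State(position=0.0, speed=0.0)
    total_reward = 0.0
    for _ in range(steps):
        accel = driver(state)             # Driver picks an action
        state = simulator(state, accel)   # Simulator produces the next state
        total_reward += critic(state, accel)  # Critic scores the outcome
    return state, total_reward
```

Calling `rollout()` runs the Driver against the Simulator and accumulates the Critic's scores, which is the signal an RL algorithm would use to improve the Driver.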
Importantly, all of these component models share a common large foundation-model backbone. The models that actually run onboard (Driver) and at scale (Simulator and Critic) are distilled from the original large "Teacher" models into small, performant "Student" models.
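The post does not say which distillation method is used. As a rough illustration of the Teacher-to-Student idea, here is a sketch of the standard soft-label distillation loss, where the Student is trained to match the Teacher's temperature-softened output distribution. The function names, the plain-list logits, and the temperature value are my assumptions, not details from Waymo.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between the two softened distributions.

    The loss is zero when the student reproduces the teacher's distribution
    and grows as the two diverge.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Minimising this loss over many inputs pushes the small Student toward the large Teacher's behaviour without needing the Teacher at inference time.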
They mention that the trajectories produced by the onboard Driver model pass through a separate safety validator, which is presumably not learned. In the post's words:
"Importantly, the Waymo Driver employs a separate and rigorous onboard validation layer, which then verifies the trajectories produced by the Driver’s generative ML model."
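The post gives no detail on what the validation layer actually checks. Purely as a guess at the general shape, here is a sketch of a non-learned validator that filters candidate trajectories against hand-written limits. The trajectory format (a list of speeds), the limits, and the names `is_valid` and `select_trajectory` are all hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical format: speeds in m/s, sampled every `dt` seconds.
Trajectory = Sequence[float]


def is_valid(
    traj: Trajectory,
    dt: float = 0.1,
    max_speed: float = 15.0,  # assumed limit, m/s
    max_accel: float = 3.0,   # assumed limit, m/s^2
) -> bool:
    """Rule-based check: speeds stay in range and speed changes are not too harsh."""
    if any(v < 0.0 or v > max_speed for v in traj):
        return False
    return all(abs(b - a) / dt <= max_accel for a, b in zip(traj, traj[1:]))


def select_trajectory(candidates: List[Trajectory]) -> Optional[Trajectory]:
    """Return the first candidate that passes validation, or None if all fail."""
    for traj in candidates:
        if is_valid(traj):
            return traj
    return None  # a real system would need a safe fallback behaviour here
```

The point of the design is that the generative model only proposes, while a separate, simpler layer decides whether a proposal is acceptable.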