AI driver learns in simulated environments

A new Waymo blog post on their modular AI driver and lidar simulation:

From a figure in the post showing highly realistic simulations (camera simulation in the middle, lidar simulation on the right): "Notably, in this example, the sensor data is purely synthetic and is produced by our generative sensor-simulation models."

The Waymo Foundation Model employs a Think Fast and Think Slow (also known as System 1 and System 2) architecture with two distinct model components:

  • Sensor Fusion Encoder for rapid reactions. This perceptual component of the foundation model fuses camera, lidar, and radar inputs over time, producing objects, semantics, and rich embeddings for downstream tasks. These inputs help our system make fast and safe driving decisions.
  • Driving VLM for complex semantic reasoning. This component of our foundation model uses rich camera data and is fine-tuned on Waymo’s driving data and tasks. Trained using Gemini, it leverages Gemini’s extensive world knowledge to better understand rare, novel, and complex semantic scenarios on the road. …

Both encoders feed into Waymo’s World Decoder, which uses these inputs to predict other road users’ behaviors, produce high-definition maps, generate trajectories for the vehicle, and produce signals for trajectory validation.
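To make the dataflow concrete, here is a minimal toy sketch of the two-path architecture as the post describes it: a fast sensor-fusion path (System 1), a slower camera-only reasoning path (System 2), and a decoder that consumes both. All names and the internal computations are illustrative stand-ins, not Waymo's actual API or models.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sketch of the architecture described in the post.
# Every class/function name here is illustrative, not Waymo's code.

@dataclass
class SensorFrame:
    camera: List[float]   # stand-in for camera features
    lidar: List[float]    # stand-in for lidar returns
    radar: List[float]    # stand-in for radar returns

def sensor_fusion_encoder(frames: List[SensorFrame]) -> Dict:
    """System 1 (fast path): fuses camera, lidar, and radar over time
    into objects, semantics, and embeddings for downstream tasks."""
    embeddings = [sum(f.camera) + sum(f.lidar) + sum(f.radar) for f in frames]
    return {"embeddings": embeddings, "objects": len(frames)}

def driving_vlm(frames: List[SensorFrame]) -> Dict:
    """System 2 (slow path): consumes rich camera data only and returns
    a semantic read of the scene (here, a toy label)."""
    brightness = sum(sum(f.camera) for f in frames) / max(len(frames), 1)
    return {"scene_label": "novel" if brightness > 1.0 else "routine"}

def world_decoder(fast: Dict, slow: Dict) -> Dict:
    """Combines both encoder outputs into behavior predictions, a map,
    a planned trajectory, and a trajectory-validation signal."""
    trajectory = [e * 0.1 for e in fast["embeddings"]]
    return {
        "predicted_behaviors": fast["objects"],
        "hd_map": "map-tile-stub",
        "trajectory": trajectory,
        "trajectory_valid": slow["scene_label"] == "routine",
    }

frames = [SensorFrame([0.2, 0.3], [0.1], [0.05]) for _ in range(3)]
out = world_decoder(sensor_fusion_encoder(frames), driving_vlm(frames))
print(out["trajectory_valid"])  # → True
```

The point of the split is latency: the fast path can react every frame, while the slower VLM path only needs to weigh in on rare or semantically complex scenes, and the decoder reconciles the two.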

Waymo AI blog
