The lead architect wasn’t very interested in simulation for accelerating learning: all their training data came from the cars.
(take all mention of timelines and claims of capability with a grain of salt, maybe add some months/years and divide capability by 2 or 3)
“Tesla has been able to do real-world video generation with accurate physics for about a year.
It wasn’t super interesting, because all the training data came from the cars, so it just looks like video from a Tesla, albeit with a dynamically generated (not remembered) world.”
Another 2024 moment of clarity:
“Two sources of data scale infinitely: synthetic data, which has an ‘is it true?’ problem, and real-world video, which does not.”
Now in 2025 a different view:
This is a positive for Tesla’s AI driving; it shows signs of learning a better way.
Tesla has been learning since its founding. The difference from most legacy automakers is that Tesla recognizes where change is necessary, and then changes.
They have probably trained on many thousands of hours of road experience and recordings of various situations. Simulation is probably most useful for fine-tuning software responses to those situations.
Most driving is uneventful, which means most of the data is repetitious. By contrast, data about accidents is sparse, and that is where simulation is most useful: the so-called “edge cases.”
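The data-imbalance point above can be made concrete with a toy sketch. All the numbers and labels here are invented for illustration; the idea is simply that uniform sampling of real-world logs almost never surfaces rare events, while simulation lets you rebalance the training mix on demand:

```python
import random

# Toy illustration (all numbers invented): real-world driving logs are
# dominated by routine frames, while safety-critical "edge cases" are rare.
logs = ["routine"] * 9990 + ["edge_case"] * 10  # ~0.1% rare events

random.seed(0)

# Uniform sampling of a 100-frame training batch mostly sees routine data.
uniform_batch = random.sample(logs, 100)

# Simulation can synthesize edge cases on demand, so the training mix can
# be rebalanced toward the rare scenarios that matter most.
simulated = ["simulated_edge_case"] * 50
rebalanced_batch = random.sample(logs, 50) + simulated

print("edge cases in uniform batch:", uniform_batch.count("edge_case"))
print("edge cases in rebalanced batch:",
      rebalanced_batch.count("simulated_edge_case"))
```

The expected number of edge cases in the uniform batch is about 0.1, which is the whole problem: at real-world event rates, the model rarely trains on the situations that matter most.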
New Waymo blog with modular AI and simulation of lidar:
highly realistic simulations (camera simulation in the middle, lidar simulation on the right). Notably, in this example, the sensor data is purely synthetic and is produced by our generative sensor-simulation models
The Waymo Foundation Model employs a Think Fast and Think Slow (also known as System 1 and System 2) architecture with two distinct model components:
Sensor Fusion Encoder for rapid reactions. This perceptual component of the foundation model fuses camera, lidar, and radar inputs over time, producing objects, semantics, and rich embeddings for downstream tasks. These inputs help our system make fast and safe driving decisions.
Driving VLM for complex semantic reasoning. This component of our foundation model uses rich camera data and is fine-tuned on Waymo’s driving data and tasks. Trained using Gemini, it leverages Gemini’s extensive world knowledge to better understand rare, novel, and complex semantic scenarios on the road. …
Both encoders feed into Waymo’s World Decoder, which uses these inputs to predict other road users’ behaviors, produce high-definition maps, generate trajectories for the vehicle, and produce signals for trajectory validation.
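The dataflow the excerpt describes — a fast sensor-fusion path and a slow reasoning path both feeding a shared decoder — can be sketched schematically. All class and field names below are invented for illustration; this is not Waymo’s actual API, just the shape of the two-component design:

```python
from dataclasses import dataclass, field

# Schematic sketch of the "think fast / think slow" architecture described
# above. Names are invented; the point is the dataflow, not the internals.

@dataclass
class SensorFrame:
    camera: list = field(default_factory=list)
    lidar: list = field(default_factory=list)
    radar: list = field(default_factory=list)

class SensorFusionEncoder:
    """'Think fast' path: fuses camera/lidar/radar into objects + embeddings."""
    def encode(self, frame: SensorFrame) -> dict:
        # Placeholder outputs standing in for perception results.
        return {"objects": [], "embedding": []}

class DrivingVLM:
    """'Think slow' path: semantic reasoning over rich camera data."""
    def reason(self, frame: SensorFrame) -> dict:
        # Placeholder standing in for a fine-tuned VLM's scene understanding.
        return {"scene_description": "clear intersection, no rare events"}

class WorldDecoder:
    """Consumes both paths' outputs to plan and validate trajectories."""
    def decode(self, fast: dict, slow: dict) -> dict:
        return {"trajectory": [], "validation_signal": 1.0}

frame = SensorFrame()
plan = WorldDecoder().decode(SensorFusionEncoder().encode(frame),
                             DrivingVLM().reason(frame))
print(sorted(plan))
```

The design choice worth noting is that the slow path does not sit in the fast path’s critical loop; both produce inputs that the decoder combines, so routine reactions are not gated on heavyweight reasoning.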
This calls into question the value of Tesla’s consumer FSD data and suggests the importance of simulation.
What happened in late 2022 is that Tesla FINALLY adopted simulation at some scale, after 7-8 years of pretending they knew better. How important is this? Waymo converged to safe and insurable in Phoenix within 10M miles. Tesla is almost there in Austin at 7B miles. That’s a 700X chasm in approach on the path to progress. From the start, Waymo freely admitted they did 1000x as many simulation miles as real ones. Welcome to the game. The same is true of route planning and maps. Tesla steals as much of Google Maps as it can get away with, and has always used the convoluted OpenStreetMap while encouraging the horde to drop pins on Google Maps whenever the routing is crap. Fixing routing problems by dropping a pin onto an app you are unwilling to admit is superior, and just license, seems like instability.
A recent statement from Waymo on real-world miles and use of simulation:
The Waymo Driver has traveled nearly 200 million fully autonomous miles, becoming a vital part of the urban fabric in major U.S. cities and improving road safety. What riders and local communities don’t see is our Driver navigating billions of miles in virtual worlds, mastering complex scenarios long before it encounters them on public roads. Today, we are excited to introduce the Waymo World Model, a frontier generative model that sets a new bar for large-scale, hyper-realistic autonomous driving simulation.
I read an article the other day that said Waymo has trained on 15 billion simulated miles. Presumably, they can change up the simulations to match what they need to train on.
Meanwhile, TSLA is stuck with 7 billion real-world miles, which are likely a lot of the same stuff (e.g., driving straight in a single lane).
This likely goes a long way to explain why TSLA is so far behind Waymo and can’t seem to get out of their own way.
The shot on the left is from a Tesla vehicle. The still frame on the right (part of a sequence in the YT video I linked) is Tesla using AI to modify the original video to simulate a car cutting across lanes.
Back up and watch the video in its entirety from the beginning and you’ll gain a new respect for what Tesla is actually doing and the progress it’s making. Best half hour you’ll spend on FSD.
Oh, and Ashok keeps coming back to how training Optimus leverages the technology that’s been developed for training FSD.
Here’s a summary of history (there are others upthread).
Way back in the beginning, Waymo abandoned the myth that lots of real miles are what matters. They started building TPUs for inference way back in 2015. The newbies convinced themselves of this real-mileage myth. What is real? Waymo CONVERGED to safe and insurable (and TRULY AUTONOMOUS) in Phoenix at under 10M miles. They’ll be live in 20 cities before 250M miles. Now tell me again how much 15B means? Here’s my hint for where to look: 15B is 60x of 250M. The reality is obvious. Tesla’s main jester is only talking about making his own chips and inventing inference. This is beyond ignorant, as Alphabet TPUs are on version 7. Hooray for Tesla, they finally got the message that synthetic data is the path to the long tail. Of course the jester can’t help himself and remains enamored with the new word “inference,” which he seems to have stuck in his head next to “hyper-exponential” and “order of magnitude.”