Waymo self-driving cars -- progress

Well, I can say with 100% certainty that he is wrong. The vast majority of Teslas on the road today will never be Level 5 without an expensive retrofit. Musk himself is on record saying as much.

3 Likes

The most likely requirements for a robotaxi would be:

  • Front bumper camera (Cybercab has one, as do new models)
  • Additional internal camera (much of the rear cabin area and the front footwells are not visible, which will cause operational problems)
  • Upgraded HW (certainly for HW3 vehicles, possibly for HW4 vehicles)

They might also need an external microphone added to hear sirens and spoken commands more accurately. Self-closing doors might not be essential for early adoption (Waymos don’t have them), but perhaps at scale.

We can also add

  • working true FSD software
6 Likes

Sandy Munro once said FLIR (thermal imaging) would be more useful than LIDAR.

But the suggestion did NOT gain any kind of traction.

I don’t agree with Elon’s statement that using more than vision just confuses the software to the point it can’t “choose an action”.

When driving, we humans use vision, hearing (horns, sirens), smell (smoke, chemicals), and touch (potholes, bumps, etc.).
Yet we are able to drive.

I understand that adding sensors drives up the cost.

:worm: :leaf_fluttering_in_wind:
ralph

4 Likes

True! I was just talking about the hardware requirements, though. Tesla’s public descriptions of its plans involve “flipping a switch” (metaphorically, through a software update) to allow most existing Teslas to serve as robotaxis. I was elaborating on the upthread poster’s suggestion that hardware upgrades would also be necessary.

Here is what ChatGPT says:

  • Driver’s Net Earnings: ~50–60%. After Uber’s service fee and expenses, drivers typically keep about half of the fare.
  • Uber’s Service Fee: ~25–30%. Varies by city and trip; covers app development, support, and profit.
  • Insurance (Commercial Auto): ~5–10%. Uber provides commercial insurance, funded from the fare.
  • Taxes, Tolls, Airport Fees: ~5–10%. These are passed through to local governments or facilities.
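
To put those percentages into dollar terms, here is a trivial back-of-the-envelope sketch; the $20 fare amount is purely an assumption for illustration, and the ranges are just the rough figures from the list above:

```python
# Illustrative only: splitting an assumed $20 fare using the rough
# percentage ranges quoted above.
fare = 20.00  # assumed fare in dollars

shares = {
    "Driver's net earnings":       (0.50, 0.60),
    "Uber's service fee":          (0.25, 0.30),
    "Insurance (commercial auto)": (0.05, 0.10),
    "Taxes, tolls, airport fees":  (0.05, 0.10),
}

for item, (low, high) in shares.items():
    print(f"{item}: ${fare * low:.2f} to ${fare * high:.2f}")
```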

But remember, the 50-60% that goes to drivers gets replaced by the Waymo employees who maintain the network, upgrade the software, inspect and clean the cars, provide remote operators when assistance is needed, and do maintenance on the vehicles.
Oh wait, and this includes fuel! Let’s just skip the vehicle depreciation.
But who pays for a parking location when cars are not needed or are being charged?

Mike

7 Likes

What he says makes sense up to a point. But where you have two competing inputs, one will sometimes be more reliable than the other. The software needs to learn how to decide which one to trust.

Elon says that humans have eyes and a brain and that’s all we need to drive. Yes, but that doesn’t mean we couldn’t also use more information. For example, we don’t just use our eyes to estimate speed; we also have speedometers. Most cars now have a backup alarm that gives us audible information. And of course, many crashes happen in low-visibility situations, so something other than eyes might be useful some of the time.

8 Likes

I think that will depend a lot upon how the business is structured. And what stage of the business we’re talking about.

As with Waymo’s initial forays, in the early rollouts the vehicles will all be owned by Tesla. They’ll be geofenced in a specific part of the metro area, and they’ll need to be frequently attended to by live employees as various things happen. They’ll get stuck, someone will leave a door open, etc. That points to having a property located somewhere near the service area where Tesla is cleaning and charging the vehicles, handling lost possessions and complaints, and dispatching service teams from (teleoperators might be located somewhere else).

In later stages, though, it will depend a lot on the technology and the business model. Tesla originally conceived of an Uber-like model - people would be putting their own individually-owned cars into the Network part-time, earning money when their car wasn’t used. They’d have to pay Tesla to do all the stuff that would go into taking care of a taxi car (intra-day cleaning, dealing with accidents or other issues, returning lost possessions). But in that model, cars are owned by ordinary people and they “sleep” and charge in their driveways.

They’ve also described more of a commercial venture approach to Tesla’s taxi network. Here, some local entrepreneur might own a handful (or more) of vehicles and enter them all into the Tesla network. They might themselves have a small parking lot, maybe even employees, where the vehicles would return for charging, cleaning, and handling lost objects. That would be the ideal for Tesla (or Waymo, if it came to that), since it’s super-scalable and asset light. There, the cars sleep and charge at the owner’s business. Rental car companies might be a good fit for this, given some of their synergies - but any place with a big parking lot that’s unused outside of business hours could also work.

Or Tesla just owns/leases land and runs the fleet themselves, and has the sleep-charge-clean spots in various places around town.

1 Like

One of the issues here is whether the additional inputs are related to vision or not. E.g., a microphone provides an entirely different kind of information than a camera, and thus there is no possibility of confusion. The camera may not yet be able to see the fire truck, but that doesn’t equate with the camera system believing that the fire truck doesn’t exist. Whereas both cameras and LIDAR are creating an “image” of what is out there, one lacking in color, but with stronger perception of distance in some conditions.

There is a tendency to point out a “weakness” of cameras in some situations like heavy fog or snow and suggest that some other tool might compensate for this … but no alternative perceives color … so perhaps the right conclusion is that these conditions are not safe for driving.

4 Likes

Actually, the software could already know how to decide. In computer vision object detection, the ~decade-old algorithms can all draw a box around each object and give it a score from 0-100% (shown as 0 to 1.0) indicating how confident the result is.
I doubt that the scores are comparable from one tech to the other (i.e. optical vision vs radar or lidar), but certainly some approximate conversion could be made such that the highest score wins.

For example, when a detector is 91% sure an object is a bus, it distributes the remaining 9% among a few other choices if you look at the entire output. Maybe 3% that it is a mini-van, 2% a train, etc. All those things need to be avoided in any case.
But what happens when vision says 75% a plastic bag blowing in the wind and the other 25% a rock or some other item to be avoided…and the lidar says 60% it is a rock and 40% it is nothing, just a lump in the pavement?
What happens when the object is actually a rock in a plastic bag?
(Imagine a much smaller bag hiding a rock big enough to damage a car.)
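
As a toy illustration of that conflict (this is not anyone’s production code; the class names, scores, and equal 50/50 weighting are all assumptions), a naive score-level fusion of those two detectors might look like this:

```python
# Toy sketch: per-class confidence scores from a vision detector and a lidar
# detector for the same object, fused by a simple weighted average.
vision_scores = {"plastic_bag": 0.75, "rock": 0.25}
lidar_scores  = {"rock": 0.60, "road_bump": 0.40}

def fuse(scores_a, scores_b, weight_a=0.5):
    """Weighted average of two per-class score dictionaries."""
    classes = set(scores_a) | set(scores_b)
    return {
        c: weight_a * scores_a.get(c, 0.0) + (1 - weight_a) * scores_b.get(c, 0.0)
        for c in classes
    }

fused = fuse(vision_scores, lidar_scores)
best = max(fused, key=fused.get)
print(fused)  # e.g. {'plastic_bag': 0.375, 'rock': 0.425, 'road_bump': 0.2}
print(best)   # 'rock' -> the planner would treat it as something to avoid
```

With this naive weighting, “rock” wins and the car would avoid the object; the real difficulty is exactly the point above: there is no guarantee that a 0.75 from a camera network and a 0.60 from a lidar network are calibrated to mean the same thing.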

Mike

3 Likes

We use about 4 senses to drive. People who have race track experience (such as myself) are even more aware of that than most. Vision, yes. But you are also listening for other cars, sirens, brake and tire noise, engine noise, etc. You are feeling the car through the steering wheel, through the seat of your pants, through the brake pedal. You are smelling for things like smoke, etc. The only sense I’ve never used is taste. :smiley:

I strongly suspect vision-only is going to be a fail for Tesla. Even human drivers don’t use only vision. Frankly, he is wrong that all we use is our eyes.

10 Likes

It is legal for deaf people to drive in all 50 US states.

In most jurisdictions, legally blind individuals are prohibited from holding a driver’s license and operating a motor vehicle on public roads.

1 Like

The one time I was in a moving accident that was my fault, I rear-ended someone at a red light…braking too late, I hit them at about 5 mph. I bit my lip and tasted blood.
So I count 5 senses for me!

Mike

3 Likes

Sure. And if a deaf person fails to respond to an auditory stimulus, like a siren from an emergency vehicle they can’t see yet, then people will not get upset about it after the fact. Because the person was deaf, and so it’s not their fault.

But if a commercial for-profit enterprise running a taxi service has a vehicle that fails to respond to an auditory stimulus, like a siren from an emergency vehicle it can’t see yet, then people very well may get upset about it. Even perhaps hold them liable. Because unlike the deaf person, the for-profit enterprise has made a choice about whether to equip their vehicle with an external microphone. So they run the risk of taking a legal or reputational hit if their vehicles can’t respond to sounds.

7 Likes

The expertise here in multiple areas (cars, AI, sensors, legal and regulatory) is quite impressive. You guys have so many reasons why it won’t work.

1 Like

Just to clarify this some.

A method that has:

  • Model V that takes image data and makes prediction V (75% plastic bag, 25% rock)

and also has

  • Model L that takes lidar data and makes prediction L (60% rock, etc)

and then a third model,

  • Model V+L, that takes the above predictions and makes a new prediction,

is a method that in general ignores a lot of the joint data from the two different sensors.

I would be very surprised (but I don’t know) if any advanced methods used the above approach. Almost certainly not.

Instead, in principle, all sensor data can be combined and taken as input into Model S, which then makes prediction S based on all of the detailed joint sensor data (and Model S is trained accordingly).

If the different sensors are really adding incremental, new information (including nuanced nonlinear interactions), then the full sensor data into Model S should outperform any of the individual sensor data and associated individual models. Model S should also, in general, outperform Model V+L (which is tossing out a lot of joint data, as mentioned).
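
For concreteness, here is a minimal sketch of the two approaches in Python/PyTorch. This is not Waymo’s or Tesla’s actual architecture; the feature dimensions, layer sizes, and three-class setup are arbitrary assumptions made purely to show the contrast:

```python
# Minimal sketch contrasting late fusion (Model V + Model L -> Model V+L)
# with joint fusion over all sensor data (Model S).
import torch
import torch.nn as nn

NUM_CLASSES = 3  # e.g. plastic bag / rock / nothing

class LateFusion(nn.Module):
    """Each sensor gets its own classifier; a third model sees only the
    two probability vectors, not the underlying joint sensor data."""
    def __init__(self, img_dim=512, lidar_dim=256):
        super().__init__()
        self.model_v = nn.Sequential(nn.Linear(img_dim, 64), nn.ReLU(),
                                     nn.Linear(64, NUM_CLASSES))
        self.model_l = nn.Sequential(nn.Linear(lidar_dim, 64), nn.ReLU(),
                                     nn.Linear(64, NUM_CLASSES))
        self.combiner = nn.Linear(2 * NUM_CLASSES, NUM_CLASSES)

    def forward(self, img_feat, lidar_feat):
        pred_v = self.model_v(img_feat).softmax(-1)    # prediction V
        pred_l = self.model_l(lidar_feat).softmax(-1)  # prediction L
        # The combiner only ever sees 6 numbers per object.
        return self.combiner(torch.cat([pred_v, pred_l], dim=-1))

class JointFusion(nn.Module):
    """One model ("Model S") trained on the concatenated sensor features,
    so it can learn interactions between the modalities."""
    def __init__(self, img_dim=512, lidar_dim=256):
        super().__init__()
        self.model_s = nn.Sequential(nn.Linear(img_dim + lidar_dim, 128),
                                     nn.ReLU(),
                                     nn.Linear(128, NUM_CLASSES))

    def forward(self, img_feat, lidar_feat):
        return self.model_s(torch.cat([img_feat, lidar_feat], dim=-1))

img, lidar = torch.randn(1, 512), torch.randn(1, 256)
print(LateFusion()(img, lidar).shape, JointFusion()(img, lidar).shape)
```

The point of the contrast is simply that the LateFusion combiner only ever sees six probabilities, while JointFusion gets to learn from the full joint feature data.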

If someone said that using more than vision just confuses the software to the point it can’t “choose an action” (as quoted upthread), that indicates to me that this person really doesn’t understand how these methods work, including some foundational mathematical and statistical principles.

Now, it could be that in practice this is what they have found.

But if an additional sensor (of any type) adds predictive value to an existing sensor array (of any quantity, type, configuration), then there should be a model of that fuller sensor data that can be trained with measurable outcomes.

Since we all understand conceptually the value of multiple sensors (2 eyes vs 1 eye, sound and vision vs sound alone, etc), we would also expect multiple sensors to add value for robotics.

That was the easy part. The difficult part is to actually implement all of that complexity.

2 Likes

One would think so. But Elon said in an interview that they had trouble reconciling conflicting data from the cameras and radar, so they stopped using radar.

2 Likes

Another iteration of the “ignorance is bliss” narrative?

Steve

1 Like

In Indiana, individuals with sight limitations can get a driver’s license. I would imagine there are minimums and testing is required.

Interesting, because Waymo is multi-sensor, so they figured something out (and have not abandoned it). And I know this is a big point of difference between Waymo and Tesla: Tesla is camera-only, as I understand it.

And a quick web search shows the multi-sensor approach has plenty of attention.

As an aside, for people who are not technically trained and experienced in the details (Musk, Altman), I am realizing that the more I hear from them and learn about their backgrounds, the less seriously I take them on technical matters (they are both basically fundraisers). Even if they listen to their engineers, it’s still not clear to me how much they really understand to be making any kind of technical predictions.

Just time- and space-syncing the data, I am sure, is a non-trivial data problem - and that is before any machine learning. (Although one also has to sync multiple sources of camera data.) To hear Musk acknowledge that decisioning from the multi-sensor data is “too hard” (my words) is interesting, given that Waymo persists with multi-sensor.
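
As a tiny illustration of just the time-alignment piece (the timestamps and sensor rates below are made up; real systems also deal with spatial calibration, sensor latency, and clock drift), pairing each lidar sweep with the nearest camera frame might look like this:

```python
# Toy sketch: match each lidar sweep to the camera frame closest in time.
from bisect import bisect_left

camera_ts = [0.000, 0.033, 0.066, 0.100, 0.133]  # ~30 fps (assumed)
lidar_ts  = [0.000, 0.100]                       # ~10 Hz (assumed)

def nearest(sorted_ts, t):
    """Return the timestamp in sorted_ts closest to t."""
    i = bisect_left(sorted_ts, t)
    candidates = sorted_ts[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda c: abs(c - t))

for t in lidar_ts:
    print(f"lidar sweep at {t:.3f}s -> camera frame at {nearest(camera_ts, t):.3f}s")
```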

Upside: keeping lots of people employed trying to figure this out.

Here’s another take, which I’m sure is recognized by robotics teams.

What sensory approach did evolution (mother nature) take?

I see a lot of multi-sensor species out there.

Disclaimer: not expert in robots

5 Likes