AI in Tesla's FSD/Robotaxi and Sutton's "The Bitter Lesson"

And yet, one needs to tell it the rules of the road … one wouldn’t want its rules based solely on observed behavior … so there has to be a way to interact from outside with what the model has built.

They didn’t say that. I think you’re misreading the original blog post. It’s pretty clear that when they say E2E isn’t necessary, they’re referring to a “full” E2E system like Tesla was then proposing. Not the use of E2E anywhere in the process:

The premise of a fully end-to-end approach is “no lines of code, everything should be done by a single gigantic neural network.” Such a system requires maintaining a huge model, with every single update carefully balanced - yet this approach goes against current trends in utilizing LLMs as components within real systems.

[SNIP]

In summary, we argue that an end-to-end approach is neither necessary nor sufficient for self-driving systems. There is no argument that data-driven methods including convolutional networks and transformers are crucial elements of self-driving systems, however, they must be carefully embedded within a well-engineered architecture.

It’s clear that when they say that an “end-to-end approach” is neither necessary nor sufficient, they’re talking about a fully end-to-end approach like Tesla’s - not about having E2E in particular modules within an engineered architecture.

In a fully E2E system, there’s no place to program anything into the system. It goes from photons to outputs, and how it gets from one end to the other is a black box. The software engineers have little to no idea how the model derives its outputs from the inputs. How do you suggest they do this?

It can. That’s what Waymo does. But it’s not what Tesla does (AIUI). They don’t have separate E2E modules that are glued together (Block A visualizes, Block B takes the visualization output and turns it into driving). It’s just one block.

Not in a fully E2E model. Or at least, not if you’re keeping enough “expert architecture” out of the process for it to actually be E2E. You teach the model what the rules of chess are, because those are very simple to code. You teach it what castling is, for example. But you don’t have a place to go in and teach the model when to castle if you don’t like what it’s doing - because that’s a very complex situation, and the model has developed its own way of handling when to castle, and it may not be (and probably isn’t) generating the sort of board analysis that you would use in figuring out when to castle.
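
To make that concrete with a purely illustrative sketch - none of the names, sizes, or encodings below come from any real engine; they just show where the hand-coded rules live versus where the learned preferences live:

```python
import torch.nn as nn

# Hand-coded rules: simple to write down, simple to edit.
def legal_moves(board):
    moves = []
    # ...the rules of chess go here: how pieces move, when castling is allowed...
    return moves

# Learned policy: decides which legal move to prefer.
policy_net = nn.Sequential(
    nn.Linear(773, 256),  # 773 = one common bit-encoding of a chess position
    nn.ReLU(),
    nn.Linear(256, 1),    # a single preference score for a candidate position
)

# There is no "castling module" inside policy_net to go edit. Whatever the
# network has learned about when to castle is spread across its weights, in a
# representation that training invented for itself.
```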

That’s the point of the original blog posts above, and the potential downside to a fully end-to-end model. If you use a fully E2E system, you don’t have the same level of control over what the system is doing. You gain the power of compute and scaling by letting the system do its own thing, rather than doing what experts want. But in so doing, you’re relinquishing control over how the E2E system solves the problem. The system is more powerful, but also has more freedom to solve the problem in a way that the humans might not like or approve of. By not imposing controls on the system through modular or other bias-increasing architecture, the system is free to construct an approach wholly unlike anything that a human might ever do. But the consequence of that is the system can (and likely will) create an approach that is not at all amenable to human tweaking.

Forgot to respond to this.

He didn’t say we should never have expert systems. He only said that expert systems will not be as powerful or efficient as unrestrained systems. It’s absolutely his position that if power and efficiency are what you prioritize, then you should eschew expert systems.

But nowhere does he ever foreclose the existence of applications where power and efficiency might not be the only priorities, or even the most important ones. If all you want is to build a chess program that wins as much as possible, the unconstrained system is what you want. But what if we have a different priority? What if, to use a ludicrous example, the company developing the chess program were fined $1 million every time its AI castled while down a pawn during training? Then you certainly wouldn’t follow Sutton’s approach - you’d impose an expert system at some point in the process to make sure that whatever the AI came up with, it never involved castling when it was down a pawn.

Driving isn’t like chess or go, with a single win condition and a binary outcome (win or lose). You want your system to do at least two things: i) maximize the total number of destinations it can drive to; and ii) minimize the number of crashes, injuries, and fatalities. To maximize #1, you definitely want the more powerful and efficient system that Sutton describes. But if #2 is important to you, you need something in place that forces the system to minimize the harm done in a failure condition - not just minimize the number of failure conditions.
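
To put the balancing problem in made-up numbers (the weights below are pure assumptions, not anything from a real system):

```python
# Hypothetical composite objective for a driving system -- illustrative only.
def driving_objective(trips_completed, crashes, injuries, fatalities):
    capability = trips_completed                  # goal i): go more places
    harm = (1_000 * crashes                       # goal ii): weighted failure costs
            + 100_000 * injuries
            + 10_000_000 * fatalities)
    return capability - harm                      # one scalar hides the trade-off
```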

Sutton never addresses how to balance goals and priorities other than how powerful the AI is - because that’s not the topic of his essay. He’s talking about which approach yields the more powerful system, not addressing whether there are applications where it’s better to have a less powerful system so that you have more expert control over what’s happening.

2 Likes

As I pointed out earlier, end-to-end that actually isn’t from end to end isn’t end-to-end, it’s buzzword compliance. If you want to believe their nonsense, I can’t change your mind.

Looks like you didn’t read the article I linked.

That’s not what he said. Do you have a quote? I do:

They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity.

And, of course, it’s silly to think that people want systems that aren’t powerful or efficient.

He left no door open.

I’m sorry, but there clearly has to be a way to instruct the system in the rules of the road … which are several orders of magnitude more complicated than the rules of chess.

1 Like

I did.

I think you’re misunderstanding that article. Here’s the key bit:

So, the main difference is not in the blocks themselves but in how they are trained and optimized. In an end-to-end system, the blocks are jointly optimized to achieve a single overarching goal. In a non-end-to-end system, each block is optimized individually, without consideration of the larger system’s objectives.

Wait: Isn’t it more of a Black Box now? How do we even validate this thing and put it on the road?!

Hey, I’m just the messenger here.

The article isn’t saying anything inconsistent with what I pointed out. It’s just pointing out that when Tesla converted from modular to E2E (which they did), they didn’t have to start from scratch. Rather, they merged their modules.

In a modular system, Block A and Block B are each optimizing to meet their own goals. Block A - come up with the best representation of objects in the environment. Block B - come up with the best driving program given that representation of objects in the environment. It’s got the “chains” of an expert system, because you’re forcing the system to make sure that it is accurately forming an “object” visualization of the environment.

Once you merge them and allow feedback from Block B to shape and alter what Block A is doing, that is completely changed. Block A is no longer tasked with coming up with the most accurate object categorization of the environment - it’s now tasked with coming up with whatever outputs yield the best result out of Block B. That fundamentally changes the system. Or, to quote the article, it makes it more of a black box. Which makes it both less constrained by human/expert thinking and less amenable to human/expert modifications. Which will make it a little harder for Tesla to solve the “what happens after the system fails” problem.
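
A heavily simplified sketch of that difference - the block names, sizes, and losses here are hypothetical stand-ins, not Tesla’s or Mobileye’s actual code:

```python
import torch.nn as nn
import torch.nn.functional as F

perception = nn.Linear(512, 64)   # "Block A": image features -> object representation
planner = nn.Linear(64, 8)        # "Block B": object representation -> control outputs

# Modular training: each block is graded against its own target. Block A is
# supervised on human-labeled objects, so its output stays a readable
# "object list" by construction.
def modular_losses(features, object_labels, control_labels):
    objects = perception(features)
    loss_a = F.mse_loss(objects, object_labels)
    loss_b = F.mse_loss(planner(objects.detach()), control_labels)
    return loss_a, loss_b         # optimized separately; B's error never reshapes A

# End-to-end training: one loss on the final driving output, backpropagated
# through both blocks. Block A is now shaped by whatever helps Block B,
# not by matching a human-defined object representation.
def end_to_end_loss(features, control_labels):
    controls = planner(perception(features))
    return F.mse_loss(controls, control_labels)
```

In the end-to-end version, nothing constrains the intermediate numbers coming out of `perception` to remain an interpretable object list; they become whatever representation best reduces the final driving loss.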

1 Like

Except that people do that all the time. Your car isn’t the most powerful system it can be, because you have other priorities: price, comfort, safety, etc.

We don’t want the most powerful or efficient driving system if it is less safe than a less powerful or efficient one.

1 Like

But we’re not talking about “the rules of the road.” We’re talking about “what do you do in a situation where you don’t know how to proceed?” That’s a state within the rules of the road. It’s not analogous to “bishops move diagonally, always and never changing”; it’s analogous to “protect your queenside rook.” You can’t “program that in” if you’re dealing with a system that’s been free to develop without knowing what it means to “protect” a piece, or even what a queenside rook is, and that doesn’t have any place in the algorithm for you to stick that instruction in.

Well, by definition, you then don’t know what to do!

But, more seriously, it is possible that you are focused specifically on the “go to side of road” problem while others of us are talking more generally about how all driving works.

I disagree. The article is pointing out that “blocks” can be separately developed, but then jointly trained. Your quote about the “black box” shows a misunderstanding of the article, which directly addresses the visualization concern you brought up:

So, it’s a Black Box, but we can also, at any point in time, visualize the output of Occupancy, visualize the output of Object Detection, visualize the output of Planning, etc…

Where did Sutton say the AI design he’s saying is best is less safe?

First, it’s not “don’t know how to proceed,” it’s that the calculated confidence level(s) are not up to some programmed/required standard. And yes, you can construct a model’s parameters to encapsulate that.
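
Something along these lines (a minimal sketch - the threshold, the confidence score, and the fallback routine are all hypothetical placeholders, not anything from an actual FSD build):

```python
CONFIDENCE_THRESHOLD = 0.85        # assumed, tunable safety margin

def control_step(plan, confidence):
    """Accept the planner's proposal only if its reported confidence clears the bar."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return plan                # proceed normally
    return fallback_maneuver()     # what belongs here is the real question

def fallback_maneuver():
    # Placeholder: hand control back to a human, or pull to the side and stop.
    raise NotImplementedError
```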

Well, that’s because I think it’s the critical bottleneck to having autonomy.

If Tesla can’t fix the “go to the side of the road” problem, then it will not be the one to “solve” autonomy. I can’t imagine any regulator ever allowing a system to be the only driver if that system ever just switches off, leaving the car without a driver. It’s not a sufficient condition, but it’s a necessary one. Waymo’s addressed it by having a different type of driving architecture, one that takes a different approach to “all driving.” Their system is less expansive than Tesla’s, but it’s able to handle the bottleneck problem so far.

2 Likes

I know I’ve already responded to this nonsense, but in thinking about it more, I’ve come to the conclusion that you’re not discussing in good faith, but are simply trying to make points, even if that means defending the indefensible. Trying to say that we shouldn’t optimize development of AI systems is indefensible, and I won’t be a part of any such discussions moving forward.

As such, welcome to my ignore list. I won’t respond to any more of your posts. I hope you take this time to consider what you’re really arguing and why.

1 Like

Yeah, I think he’s wrong. It might start off that way, but the whole point of merging the two blocks is so that Block A starts taking feedback from Block B, rather than operating modularly. Which means that Block A will start to change its approach in response to Block B’s demands. You can always try to “read” what’s going from Block A to Block B, but since they’re no longer modular the output from Block A is no longer constrained to be Occupancy or Object Detection - or to continue to be readable. You’ve eliminated the architecture that kept Block A locked in place that way.

The same place he said his AI design is the most safe - i.e., nowhere. He didn’t say either. Sutton might not care one whit about safety in any AI application ever (and having read some of his other stuff, esp. his welcoming AI “succeeding” humanity to the point of saying we shouldn’t try to keep it from taking over, I think he probably doesn’t). But most of the rest of us do.

Of course you can calculate when the confidence levels don’t hit a programmed/required standard - that’s how FSD knows to disengage when it “doesn’t know how to proceed.” Yes, that’s an anthropomorphized way of expressing it, but that shorthand sometimes makes communication easier. What gets more difficult, if not impossible, is to develop what the car should then do after it has determined that it doesn’t know how to proceed - after it has triggered whatever caused it to disengage. If the car isn’t “thinking” the way a human does, if it isn’t chained by the experts into a human-like way of solving these problems, then what the car is doing may not be at all usable for any ‘emergency’ routine after disengagement.

Why is it indefensible? You wouldn’t want to optimize development of an AI system if it would result in the destruction of humanity, to take an absurd example.

What I think you’re doing is implicitly assuming that “optimization” automatically includes considerations of safety. But that’s not true. In fact, it’s the main critique that Mobileye was making of the FSD system: that by moving from a modular architecture to a pure E2E system, Tesla was prioritizing power and efficiency over the ability to control processes for safety. Sutton’s essay doesn’t even address the possibility that there might be some tension between the most powerful AI and a safer AI, but obviously such tension can exist.

The idea of “optimizing” AI ignores the possibility that the different things we want from AI might be in tension with each other. Which is…ridiculous. The entire field of AI alignment exists because we understand that making the most powerful AI may not be the same thing as making the best AI. There isn’t an “optimizing” for a single aspect of AI (like power or efficiency); AIs have numerous characteristics that we care about.

You don’t have to respond to my arguments, but if you’re going to storm off in a huff you should do so based on what I actually am arguing, rather than on a strawman of your own invention.

1 Like

As noted, I think you are over-focused on this issue. Part of this is dramatizing the condition. There are a number of ways a system can get to an “I don’t know what to do next”. It doesn’t mean total confusion, just that it doesn’t know what to do. For Waymo, that happens infrequently enough that pulling to the side is a reasonable response. For Tesla pre-Austin, it did not, so the human fallback was used. But getting Tesla to pull to the side is a question of reducing the frequency of confusion, not learning a new behavior.

Is it? That seems unlikely. The key point is that for a robotaxi to run without a driver-employee in the car, it cannot ever try to return control to a driver. It has to have the behavior of taking the steps necessary to end the trip by coming to a safe stop, rather than disengaging for a driver to take over.

I think it’s unrelated to frequency. I don’t foresee Tesla (or anyone else) getting to the point where disengagements are so infrequent that it would be okay for the car to just go on without anyone driving it. If Tesla had the same frequency as Waymo in terms of when it can’t finish the trip, it still couldn’t operate without a human driver-employee in the car. No matter how frequent or infrequent the disengagement scenarios are, the car has to have something that it can do without a human in those circumstances to avoid crashing.

Maybe this is something they can do quickly. We’ll see. They’ve been running their closed beta in Austin for a bit more than three months now (one month without allowing passengers, two months with limited passengers), and have started in CA. Still haven’t pulled the employees in Austin, and they still haven’t proceeded with CA permitting to have autonomous vehicles with safety drivers (they’re still just running an ordinary “uber” system).

Seems to me that if the current software has a specification “if can’t resolve what to do, do this”, where “this” is “turn things over to the human”, that one could redefine “this” to be “pull to side and stop”. Hardly the cataclysmic change you portray.

1 Like

But can you do that with an E2E system like FSD? No one has specified to the software “if you can’t resolve what to do, do this” - it disengages not because it has reached a software instruction telling it to disengage, but because it can’t proceed with generating control instructions for the car. It’s not necessarily a scenario where the car knew how to “pull to the side and stop.” Heck, we don’t even know whether the algorithm the software has developed knows what “the side” is - in an E2E system, the algorithm is free to develop in ways that are completely different from how human processing works.

That’s the criticism of E2E in the articles Smorg cited above. It’s more powerful, because you’re not chaining the AI to think about things the same way a human would. But it’s less controllable, because the AI isn’t locked into any human-directed pathways - so its solutions might be completely unamenable to human guidance.

1 Like