Tesla's AI Day 2022

This is a solved problem, long ago. Have you not seen the Segway or the wheelchair that climbs stairs?
Robots that “balance,” even toy ones, are at least 20 years old, helped by the IMU (inertial measurement unit) chips that are now mass-produced ($2 - $5) for phones. Once you have this chip, making the robot balance is just a matter of some control software. The same chip allows you to push on a robot (with a reasonable force) and for it to not fall over.
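The control-software part can be sketched in a few lines: a PID loop that reads a tilt angle (from the IMU) and commands a corrective torque. This is a toy sketch, not any real robot's firmware; the `PIDBalancer` class and its gain values are made up for illustration.

```python
# Minimal sketch of an IMU-based balance loop: read a tilt angle,
# command a corrective torque via PID feedback. A push shows up as a
# sudden tilt error that the controller fights back against.

class PIDBalancer:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, tilt_angle):
        """Return a corrective torque for the measured tilt (radians).

        The target is 0.0 (upright); the proportional term reacts to the
        lean, the derivative term damps the motion, and the integral term
        cancels any steady bias.
        """
        error = 0.0 - tilt_angle
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy usage: a 0.1 rad lean produces a torque pushing back the other way.
ctrl = PIDBalancer(kp=40.0, ki=1.0, kd=5.0, dt=0.01)
torque = ctrl.update(0.1)
```

In a real robot this loop runs hundreds of times per second, which is exactly why a cheap, fast IMU chip makes the whole problem tractable.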

Mike

When you talk about “skills” I think you are referring mostly to things that are task-specific. The smarts to navigate are pretty much the same regardless of the robot’s form: recognize occupied space, move towards the destination with appropriate speed and without causing problems.

But sure, there are some different skills involved in not crashing when you have four (or 12) wheels and are traveling on a road at 70mph, vs. maneuvering around kid toys on the stairs. There are indeed a pile of interesting problems to solve on the way to getting a humanoid robot to use its body in the ways that a human can. But we already have existence proofs for most of them being solvable. Not going to be quick or easy, but it’s all doable.

1 Like

I’ve never seen a Segway go up stairs. :grinning:

I know that this has been “solved” for some form factors - but not necessarily for a bipedal humanoid (let alone at low cost), and even then only for very controlled environments. Again, I’m a lay person - but it seems to me that it’s a completely different engineering problem to make a Segway balance than to design a physical humanoid body that can go up a wide range of stairways. There’s going to be some overlap, to be sure - but the motion of going up a staircase (such as balancing on a single foot) is just fundamentally different from balancing on a flat surface (with two wheels on the surface at all times).

So I guess the question is what Tesla’s “car mind” brings to the table. Yes, there’s lots of existence proofs of some of these things being solvable. You can find video of robots, for example, whose AI and spatial awareness is sufficient to let them assemble a piece of Ikea-style furniture they haven’t “seen” before. But that’s not something that Tesla’s “car mind” has ever really had to do before - right?

Tesla’s done a lot of work solving the “move around in physical space” problem - how a robot would navigate from one place in a factory to another place in the factory relying solely on visual cues is absolutely something that Tesla’s got a ton of miles under the belt, as it were. But is that really the hard issue for building a useful humanoid robot? It’s certainly a necessary issue. But if I have a humanoid robot and I ask, “Please go upstairs to the kitchen and bring me back a glass of water with ice,” almost none of the hard parts of that request seem to me (as a lay person) to involve picking a route to the kitchen and avoiding obstacles. My Roomba can do that. Rather, the hard parts seem to be the humanoid form going upstairs, unpacking the verbal commands into steps (1: go to kitchen; 2: get glass from cabinet; 3: put ice in glass; 4: put water in glass; 5: return), and actually performing the physical tasks in my kitchen without breaking the glass or spilling the water or ice.
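That “unpack the verbal commands into steps” part can be sketched as plain data: a hypothetical planner maps one request to the ordered task list that lower-level skills would then execute. Everything here (the `plan` function, the step labels) is illustrative, not any real robot API.

```python
# Toy sketch of command decomposition: a request maps to an ordered
# list of sub-tasks. A real system would parse language; this version
# just looks up a canned plan for the one request discussed above.

def plan(request):
    plans = {
        "bring me a glass of ice water": [
            "navigate: go upstairs to kitchen",
            "manipulate: take glass from cabinet",
            "manipulate: add ice to glass",
            "manipulate: fill glass with water",
            "navigate: return to requester",
        ],
    }
    return plans.get(request.lower(), [])
```

Note that only the two `navigate` steps resemble what a self-driving stack already does; the `manipulate` steps are the hard, largely unsolved part the post is pointing at.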

Again, it doesn’t seem like there’s much overlap between the skills that Tesla has “taught” its car and the skills that a useful robot would need. Absolutely the “navigate a safe path from point A to point B” is one area of overlap - but is solving that problem really what’s been holding other companies back from making a general purpose robot?

I think this demo is from about 6 years ago. But I’ve been mentoring HS robotics for 18 years, and I know I saw single-foot balancing 10-15 years ago as well. The demo let you try to push the robot over, which no one could do. Maybe hitting it with a baseball bat would have worked.

100% agreed. But then when electronic computers were first built all they did was compute ballistic tables for firing weapons and somehow doing that math was easily extrapolated to all sorts of other things that the general public didn’t understand as well.

I think all those things have been demonstrated individually for many years. The trick is doing them all within one device. Parsing the English language used to be really hard…now it isn’t. Not perfect by any means, but still usable in Siri, Alexa, even in cars like Teslas.

Building delicate robotic arms has been done. Doing it so that it is inexpensive and durable is a challenge.

Mike

1 Like

Off topic, but… when I get an automated robo phone call, where the machine asks me how I’m doing, I respond “Oh, I’m just peachy keen, thank you!”. That generally results in the robot hanging up. As I hoped it would. Thankfully those (truthfully correct but socially questionable descriptive words deleted) are still stupid.

Most of the voice->NLP processing systems are intentionally trained on only a few hundred context-specific responses. But even the more general-purpose ones aren’t too difficult to break or confuse.

IMO, the advance that is needed for a personal robot is for the software to remember context so you don’t have to specify every little detail. If you say “get me a glass of ice water,” it would know to go to the kitchen, get a glass from the right cabinet, get the ice from the fridge, then the water, then find where you are to deliver it. And it would be able to figure out how to get the ice without having seen that fridge before. Or have you show it once and it remembers.

This is all just more “software” and doesn’t depend on the physical robot capability, but I still think it will take longer than just building a humanoid robot that can grab some nuts and put them on a car in a factory. (And I still think that building a purpose built robot would do this task better anyway).

Mike

Well, that’s why Tesla’s Optimus project is different from earlier attempts at robots. Tesla has been putting its efforts mostly into creating infrastructure for (vehicle) robots to learn rather than teaching them specific skills. Using Dojo, simulation, and software updates, progress is relatively rapid.

This is how Tesla, unlike any other company ever, can do public presentations, by engineers, of its current development work without (much) worry. When your pace of innovation far exceeds that of others, anything that might get stolen is pretty much irrelevant. By the time such things can be exploited, they’re way out of date.

So, while what’s being shown now blows the minds of engineers who understand what they’re seeing, by next year it should do the same for educated laypersons. The sort of progress we’ve seen in ten years of development elsewhere, we’ll see in one year from Tesla. The year after that, who knows?

2 Likes

That’s one of the things that I don’t understand enough about AI: is that type of transferability still a possibility? It’s easy to see how a machine that could calculate ballistics could be used to calculate actuarial formulae (or other things). A calculator is a powerful, generalizable tool.

But do AIs work that way? DALL-E is an incredibly powerful AI - but could it switch to playing chess? Or writing poetry? I can see how the lessons learned in building an AI like DALL-E or Tesla’s “car mind” (does it have a name? Other than FSD?) could be transferred to a new, GAI project - but does the product itself give you a boost?

Yes - but isn’t that the thing? Getting the last X% of the problem solved - not necessarily to get to perfection, but to the point where you’re really pretty confident that it will work right nearly all of the time. There are hundreds (thousands?) of companies, from Google/Apple/Amazon to iRobot to tiny niche firms, that are all trying to perfect and solve all of the manifold engineering problems that are part of this space, all of which have made progress on a lot of fronts but all of which are still encountering issues that they’re still trying to work out. It seems (again, to a lay person) that there’s still A LOT that needs to be solved for something like a general-purpose humanoid robot to be ready for mass-market manufacturing - and that an auto/energy company like Tesla wouldn’t necessarily have much of a head start on that just because they were working on FSD.

Have they? In all the popular media discussion of Tesla’s FSD efforts, they’ve pointed instead to the “billions of miles of data” that they’ve collected from drivers as really the competitive edge. I’ve never seen any discussion of Tesla doing anything radically different with their machine learning.

Do you have something you can point me towards that describes what they’re doing differently from all the other companies working in the AI space?

I get the impression that you haven’t actually watched the AI Day presentation. They lay it all out.

Oh, I did (well, I watched the supercut version). But as you noted in your very first post, it’s not really that digestible for a lay person, being pitched more to engineers. They don’t really go into detail how their approach to robotics (or AI generally) is different from all the many other entities that are researching or developing in that space. Probably they assume their real audience (engineers, not lay people) are well aware of what the state of play in the field is. But I’m a lay person, so I’m not.

So is there something out there that kind of describes what Tesla’s doing differently than all the other AI/robotics shops are doing?

But, I think, you only think that way because for your whole life this has been happening. If you go and study how ENIAC worked all the formulas were basically hard wired. There was no terminal where you typed in a program in ~english-like text and a compiler to convert it to the machine’s language.

No, it can’t. But the tools and process used to create and train DALL-E are pretty much the same as the tools used to do most other AI tasks. There are multiple different tool sets that are similar, but better at different things, such as the “frameworks” like TensorFlow, PyTorch, MXNet - just like there are different programming languages that one might use to solve other problems, like Fortran, C/C++, Basic, Pascal, Python (and a hundred others).

I think that was and is the point of Tesla’s AI Day. To recruit the best of the best to solve all these last few percent problems and build it. Not easy for sure, but probably not rocket science either. I doubt they will meet the stated time line. And I think the reality will be that the first products are much more restricted in what they are able to do. Maybe work on one stage of an assembly line in a car factory.

I recall working on big SW projects with a few dozen people that took overnight to compile and build, never mind the slow regression testing and manual testing after that. Progress was very slow due to the long turnaround to test the simplest things. Lots of time was spent just trying to optimize the infrastructure. Many current ML models take many hours, days or even weeks to train. Iterating to fix things takes a LONG time. Improving this by 10x, for example, can get you more than a 10x productivity improvement just because the programmers can try more things.
Did you see the Tesla Dojo project?
What genius college grad wouldn’t want this?

Mike

2 Likes

Yes, clearly. It reminded me very much of half the law firm pitch meetings I went to, when the managing partner would spend a lot of time talking about how awesome their pro bono environmental program was in an effort to lure recruits in - even though 98% of their firm was in securities and corporate restructuring.

I’m not in the field, so I have no way of assessing whether that “we’re not really a car company” pitch is something they can succeed at. Tesla is clearly presenting itself as an AI or Robotics company…but at the end of the day, I imagine whether it is or not depends on the consistent long-term strategy of the company’s leadership. If you’re a hotshot recruit, do you want to end up at Tesla - where the robotics team is a small part of a large company that has a hundred thousand employees who are almost entirely making and selling cars - or at a shop like OpenAI or Boston Dynamics, which are smaller enterprises but are entirely devoted to your field?

It’s the conglomerate problem. Not everything can be the main priority of the company. Tesla’s making batteries, solar cells/roofs, cars (obviously), AI, and now robots. I would think that a genius recruit in solar technology research probably wouldn’t put Tesla at the top of their list (though maybe that’s wrong?), given that it’s not really a priority of the company these days. Tesla’s core business today is making cars - it’s 90% of what they do. I would think their core future business priorities would be better batteries and teaching the cars to drive themselves, at least for the next decade or so. Is making real advances in something specific to robotics like, IDK, robot tactile feedback systems going to be a priority for a company like that, compared to other things?

I suspect that’s why Google Lab spins off their companies, so that someone who works for EveryDay Robots (like people who work for Boston Dynamics) knows they’re working for a robotics shop, rather than on the robotics side project of a search engine company.

I did - but I confess, I don’t have the technological background to understand it. From what I can gather, they’re in the process of trying to build an extremely powerful bespoke supercomputer. One that is capable of analyzing, if I’m using the technical term correctly, many many buttloads of video data for use in training an AI. So while some other computer might only be able to analyze several peta-buttloads of visual data, the Dojo will (when done) be able to analyze several exa-buttloads of visual data. As the man with the rather impenetrable accent pointed out, 3.7x more buttloads.

Not being a computer boffin, I don’t know if that has appeal to a genius college grad. From the AI Day presentation, I couldn’t tell whether Dojo is somehow better than other supercomputers around the world, or whether it’s able to do this because it’s designed specifically to solve this one type of problem and thus can do it faster than other supercomputers (which get used for disparate problems like climate models and economic forecasts and figuring out how the heck turbulence works or whatever). Certainly it’s going to almost entirely be used to solve this one specific type of problem. I don’t know whether that appeals to genius college grads generally, but it should appeal greatly to the ones who want to work on developing software for autonomous cars.

They aren’t just building a supercomputer. They are building the chips that they use to make it. This will be either genius or a total fail as they may spin it off like Amazon did for AWS. Remember how dumb of an idea that was. Not.

The cars are so profitable that there seems to be leeway to go into new/related businesses. How interesting would Amazon be just being the best at selling books? Imagine Google just doing search.

Mike

3 Likes

Well then, let’s try this. Imagine that you have to write a complex brief for a client. It requires at least half a dozen meetings for you to agree on exactly what’s needed, and then another half a dozen back and forth reviews of the document to get it into final form. You need to work through imperfect communication, imperfect understanding, and imperfect environmental issues. Not to mention applying your actual expertise of figuring out the relevant law and how it works in this case.

This is quite similar to what an engineer must go through to produce a software solution to a problem. Figuring out specifications, what the customer really wants, a good way to accomplish that, and then getting agreement that it’s what’s desired. Lots of back and forth. Lots of work in between.

What Dojo does is the engineering equivalent of making most of those client meetings go ten times as fast. So you finish your draft of the brief and instead of having to wait a day to get it back all marked up, you get it back in five minutes. Everything is still fresh in your mind and you can proceed with the next round of issues immediately.

I started writing software in the days when you punched up your program on cards, submitted your card deck to an operator, and got back a pile of paper to stare at. Eventually. At best you could do that a couple of times a day. And each time you did that it was expensive! Things have gotten better slowly over the years.

What Tesla showed machine learning engineers is that they have taken another leap in making turn-around time quicker. This means spending more of your time being productive, which means doing a better job faster.

Is that more digestible?

4 Likes

For a moment, forget everything you know about computing. Machine learning is nothing like what has come before. There are no human-created algorithms that solve problems. You need to envision a completely different paradigm - pattern matching.

Let’s start with babies, how do they learn? Some of their ‘knowledge’ is built into their DNA, which deals mostly with the physical, the mechanics of how their body works. The rest of their ‘knowledge’ is acquired through their senses. How did Pavlov train the dogs? Repetition. Do this, get reward. Do something else, get no reward. Rinse, repeat. Rinse, repeat. Rinse, repeat. Data! The dog’s neural network stored these ‘patterns’ (don’t ask me how, I don’t have a clue). The next time Pavlov rang the bell the dog matched the sound or the event against its stored patterns and somehow (don’t ask me how, I don’t have a clue), the weightings of the stored patterns directed the dog to salivate. That is about all AI machine learning is.

Why rote learning? Training your neural nets with lots of repetitive data.

Why do pilots have to accumulate flying hours? Training their neural nets with lots of repetitive data.

Why Malcolm Gladwell’s Big Idea: 10,000-Hour Rule? Training their neural nets with lots of repetitive data.

That is about all AI machine learning is. What does Tesla’s FSD have to do with the Optimus robot? At Tesla they keep talking about ‘full stack FSD.’ What is a stack? Learning about a set of tasks like taking left turns, recognizing pedestrians, etc. On a greater scale city driving, highway driving, parking lots, etc. Full stack is just joining all these stacks into one big stack (don’t ask me how, I don’t have a clue).

The neural network algorithms don’t care what the subject matter is about, they just store weighted patterns and match incoming data against the stored patterns. Create a bunch of ‘stacks’ separately (in school, chemistry, geometry, algebra) and join all the stacks → General AI! That is about all AI machine learning is.
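The “repetition trains weighted patterns” idea can be shown concretely with the smallest possible neural net: a single perceptron that learns logical AND purely from repeated examples and weight nudges. This is a classroom sketch, not Tesla’s code; the point is that no rule for AND is written anywhere - it emerges in the stored weights.

```python
# A single perceptron trained by repetition: show it examples over and
# over, nudge the weights toward the reward signal, and the "pattern"
# (here, logical AND) ends up stored in the weights.

def train(examples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):                      # rinse, repeat
        for (x1, x2), target in examples:
            pred = 1 if w[0]*x1 + w[1]*x2 + b > 0 else 0
            err = target - pred                  # reward / no-reward signal
            w[0] += lr * err * x1                # nudge the stored weights
            w[1] += lr * err * x2
            b += lr * err
    return w, b

examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(examples)

def predict(x1, x2):
    """Match incoming data against the stored weighted pattern."""
    return 1 if w[0]*x1 + w[1]*x2 + b > 0 else 0
```

Modern networks have billions of weights instead of three, and the “stacks” are just many such pattern matchers wired together - but the training loop is recognizably this.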

Why Dojo? For the same reason that humans have huge brains. You need vast amounts of memory, not on tape or disk but in chips, and vast amounts of parallel computing power to match incoming data against the vast amount of stored patterns. To reach their goals Tesla could not rely on off-the-shelf hardware, so they built their own.

The Captain

Strange as it may seem, the above ties in with my very first professional program. The algorithms I created were too big for the limited computers of the time (IBM 650 - 1960). I told my boss the program was too big and didn’t fit in the computer. His reply: “It fits.” Try as I might, my rational, boolean brain could not find a solution. Back to my boss, same reply: “It fits.”

One morning at around 4 AM I woke up with the solution, “It fits!” The punched cards I had to process had an unorthodox coding system which I tried to convert to standard code and that required a lengthy algorithm that exceeded the computer’s memory. The solution was to create a table with the valid unorthodox codes and the equivalent standard code. Just a small table and the software’s ‘table lookup’ did the rest. Pattern matching!
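That table-lookup fix can be sketched in a few lines. The codes below are invented for illustration - the actual 1960 card encoding isn’t known here - but the shape of the solution is the same: a small table of valid patterns replaces a lengthy decoding algorithm.

```python
# A small table of valid unorthodox codes and their standard
# equivalents replaces a lengthy decoding algorithm. Matching input
# against the table is the whole program: if this, do that.

CODE_TABLE = {
    "X9": "A",   # invented example codes, not the real 1960 encoding
    "Y4": "B",
    "Z1": "C",
}

def translate(code):
    """Go/no-go pattern match: valid codes translate, invalid ones fail."""
    try:
        return CODE_TABLE[code]
    except KeyError:
        raise ValueError(f"invalid card code: {code!r}")
```

The table is tiny compared to the algorithm it replaced, which is exactly why it fit in the IBM 650’s memory.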

The AI related part of the story is that the subconscious brain gave me the solution, not the rational, boolean brain – pattern matching? And the solution was certainly pattern matching, match the input data against the valid data in the table, a go-no go solution. No need to know what the data actually was, just if this, do that!

4 Likes

Yes, thank you. I think I have a better understanding of why Dojo might be exciting to someone trying to solve the problem of autonomous driving. Still not sure I understand why it would really be relevant to a young engineer interested in robotics, though, or why Tesla has decided to build a robot. It doesn’t seem like it would be relevant to anyone working on the physical body of a robot. And while it might be useful for developing a ‘robot brain,’ it doesn’t seem like it’s actually going to be used for that - or anything other than working out FSD - any time soon. I would think the FSD team will have top priority (and near-exclusive access) to the Dojo resource for quite a while going forward.

And is it going to be particularly useful or necessary for building a robot brain? I understand why having that type of a supercomputer would be useful and necessary for Tesla’s specific approach to AV. They’ve got available to them a gazillion video files of driving, and they’re trying to solve Level 5 with almost entirely visual data (i.e. no Lidar and few other non-camera inputs) - so I can see why having a machine that can process a gazillion exa-widgets of video data is useful to them. Is that the bottleneck for other AI research? To use the legal brief analogy, a very fast reviewer is enormously helpful for meaningfully speeding up the review time of 400-page briefs - but far less useful for a 12-page motion. Is having access to massive amounts of video file review a real bottleneck for GAI research?

Yes, I think I understand - thanks for your summary and IGU’s.

As noted in my prior response, I can see why Dojo is necessary for how Tesla is trying to solve the specific problem of Level 5 autonomy - they have literally billions of vehicle-hours of video to serve as ‘repetitive data’ that they need to expose their neural networks to. Not sure I understand why something like that would be especially useful for robots or GAI mind development, where you don’t have (or necessarily need) billions of hours of video that needs to be processed. Sure, more computing power is probably better in almost any case - but if you don’t have billions of hours of video of people working in a factory or picking tomatoes or what have you, then is having that big of a supercomputer much of an advantage?

Totally OT, but it’s good to see you back on the boards, Captain.

2 Likes

Totally OT, but it’s good to see you back on the boards, Captain.

SpeyCaster, thank you!

The Fool community has been a welcome companion for over 20 years, entertaining and educational, and often just plain silly. It would be sad to lose it. Fortunately I stumbled on a way to avoid the “New, Improved, Crappy, Useless, Website Navigation Calamity.” I get emails of the posts that interest me. I read them in my mail client and occasionally I ‘Visit Topic’ to reply.

Win-win: Keep the community, avoid the website clutter.

BTW, I edit my posts and replies in my own Editor and save them on my computer.

The Captain

3 Likes