I gave you my vision, what’s yours?
The Captain
Me? I don’t think it’s at all feasible.
Tesla’s theory of the case for autonomous driving is that they have a unique advantage in having compiled a massive store of training data. I mean, truly massive - orders of magnitude more training data than anyone else. They have millions and millions of miles of driving video. They have also built a massive computer that’s capable of processing a database that size.
But they don’t have anything like that for training humanoid robots. There’s no large database of training data that they can feed into Dojo to train these things. Unlike with the cars, Tesla doesn’t have any plans for obtaining a large database of training data. Just hanging out in Tesla factories won’t give them enough training data to learn how to do things - they’ll need something of similar magnitude to their driving database. You need Big Data - and you can’t get Big Data from a dozen or so car factories.
What you probably would need is something like a few tens of thousands (or hundreds of thousands) of camera/mike inputs spread throughout most of human society. That’s the only way you’d get the millions of human-hours of observable behavior you’d need to train your robots just on passive video data. Tesla doesn’t have that, doesn’t have plans to get that, and probably has no way to overcome the many barriers to collecting that kind of data that way.
Get back to me in 5 or 10 years. 20 years ago Tesla didn’t even have a car, just a dream. Now…
How did that happen?
The Captain
They built many hundreds of thousands of cars without autonomy, and used their products to collect massive amounts of digital data. They had a plan to collect the large dataset necessary to support their AI driver initiative. Which they started implementing many, many years ago (since at least 2016).
What’s the analogous plan for collecting the data necessary to support a Big Data learning process for Optimus?
You are going to have to ask Elon Musk that. I don’t have any inside information but their track record looks OK to me.
The Captain
I mean, you can’t just assume that if Musk says it will happen, it will happen. Some of his ideas work (SpaceX, Tesla); some don’t (Boring Co., Hyperloop, Solar Roof).
But if you’re talking about robots being an integral part of Tesla’s future, I would think you’d want to at least be able to articulate, at the most general level, how they might possibly come to be. The way someone in 2016 could at least lay out a roadmap for how full self-driving might one day be developed.
Do you have any idea how that might happen for Optimus? Has Tesla or Musk given any hint how they might develop the dataset necessary for Optimus training?
I already did.
By working at the giga factories.
The Captain
Incrementally. Baby steps.
In the early days of Tesla assembling cars the intention was that it would be ALL done robotically. That is, with the sort of specialized industrial robots we’ve all seen in videos. It turned out that just won’t work with some stuff, most clearly wiring harnesses. They flop around. Standard industrial robots can’t deal with things that flop around. So those sorts of tasks were re-engineered to be done by humans.
That also led to the question: what sort of robot would it take for such work? One with human dexterity and adaptability should be able to. Generalize that, and you get Optimus. I imagine some of the first baby steps for Optimus will be simple things that are part of car assembly, but that’s just the first drop in the bucket. It wouldn’t surprise me if Tesla does in fact already have lots of data from watching people do jobs like that, and they can equip their assembly lines with all the cameras they want when they need it.
Could Optimus be a dead end? Yes, of course. Would an Optimus “failure” provide Tesla with experience nobody else has? I think we can count on that. Would Tesla make use of whatever they learn in ways that surprise us? That seems pretty likely to me.
But working at a giga factory can’t generate Big Data. You can’t put tens of thousands of robots into a handful of gigafactories - and even if you could, most of their observations would be duplicative.
That’s completely different from the dataset that Tesla was able to assemble for self-driving. They had literally hundreds of thousands of vehicles driving all over the country and internationally - encountering a diverse array of different environments. That gave their software literally billions and billions of driving decisions and environmental situations to analyze.
You can’t do that putting a few hundred robots in your factories. It’s just not possible to build up a big enough dataset. You can’t get the human-behavior equivalent of billions of roadway miles.
To train Optimus they need observed data of humans, not robots. There are plenty of those around, so while there might not be that mountain of data today, there are plenty of opportunities to gather it.
Really. What are they?
Tesla had plenty of opportunities to gather data about driving behavior - because they were selling a hundred thousand cars a year at that point. Because they were making all those cars, they had the ability to put data collection capacity into them. Because nearly all driving behavior takes place in places that cars can go, putting data collection onto enough cars is all you need to do in order to obtain a pretty massive and fairly comprehensive database on driving behavior.
But Tesla can’t do that with humans. The only humans it has the ability to monitor today are their employees in their factories. They don’t have permission to start monitoring millions of people the way they are able to do with millions of miles on the roads - nor do they have the devices installed that can do that. Tesla doesn’t really make consumer products that are amenable to collecting massive amounts of surveillance data (they don’t make things like smarthome devices).
If all you’re doing is trying to make a working FluffBot - or a Wiring HarnessBot - you might be able to capture enough data to train an AI to perform that single, defined task. But you certainly wouldn’t need or want to build a robot in a humanoid form if all it was going to do was that one job 24/7 - you’d build it to stay in place and just do that job.
So what’s the path to Big Data for a general purpose humanoid robot?
Albaby, what if you are wrong? I think you were arguing that Tesla was not the leader in FSD, and now it seems you have conceded that point.
Andy
It’s entirely possible I’m wrong - but how, exactly? One could easily foresee that if Tesla had a few hundred thousand cars collecting data on highways, they’d have a few billion miles’ worth of data within a few years. But if you stick monitors in all the Tesla factories, you’re only going to be able to monitor a few tens of thousands of humans - and they’ll only be doing the sorts of things one finds in an auto factory. That just can’t build an analogous Big Data set the way that they did for driving.
No, not really. I question whether collecting a massive amount of video driving data is a sufficient condition for developing a Level 5 autonomous driving AI, at least with current technology. I don’t think I’ve ever disputed that Tesla has collected a massive amount of video driving data.
That is recording how humans drive. In the factory it is recording how humans do their jobs. Want to train robots to work in hospitals? Record what nurses do. It’s all the same. The reason to start at the Tesla factories is because they are Tesla controlled environments.
Guess who is testing out Tesla semis! Pepsi, an interested customer. Once the Optimus project is advanced enough there will be interested parties, the so-called early adopters, willing to work with Tesla to get a head start on their competition. You might want to study the Technology Adoption Lifecycle (TALC).
Technology Adoption Lifecycle (TALC) - Google Search
In technology there is a group of intermediaries called Value Added Resellers (VARs) that have specialties. I can see some VARs recording their activities, sending the data to Tesla to process with Dojo and then reselling specialized robots to their customers. But first Tesla has to develop the model and there is no better place than at their own factories.
What is a Value-Added Reseller? Definition from SearchITChannel.
The Captain
Well it certainly isn’t standing around in a factory “observing” humans. With cars they weren’t “observing” humans, they were recording human actions at the electro-mechanical level - that is, taking the exact data from a combination of GPS, radar, lidar, video inputs, brake pedal, accelerator, windshield wipers, and every other button or device the human touches. How are you going to do that in a factory? Nursing home? Fast food restaurant? Wire the people up with Tom Cruise/Iron Man suits to record how much pressure it takes to break an egg, how far to swing to wash a plate, how the fingers hold a wire harness? There is so much you can’t do “by watching”, and if it isn’t “watching” then someone has to actively train, so unlike self-driving there’s no free ride (excuse the pun).
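To make the contrast concrete, here is a rough sketch - purely illustrative, with made-up field names rather than Tesla’s actual telemetry format - of the kind of multi-channel record a car can log alongside each video frame:

```python
# Hypothetical sketch of one telemetry record a car might log alongside video.
# Every field name here is invented for illustration - this is not Tesla's schema.
from dataclasses import dataclass

@dataclass
class DrivingFrame:
    timestamp_ms: int          # when the frame was captured
    gps_lat: float             # position from GPS
    gps_lon: float
    speed_mps: float           # vehicle speed
    steering_angle_deg: float  # exact control input, not inferred from video
    accel_pedal_pct: float     # throttle position
    brake_pedal_pct: float     # brake position
    wiper_state: int           # 0 = off, 1 = intermittent, 2 = on
    camera_frame_id: str       # pointer to the synced video frame

# The point: every control the human touches arrives as a labeled number.
# A factory video feed gives you only pixels; grip force, applied torque,
# and intent all have to be inferred or separately instrumented.
frame = DrivingFrame(1700000000000, 37.39, -122.06, 18.2, -3.5, 12.0, 0.0, 0, "cam0/000123")
print(frame)
```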
Wait another 30 years until computer power catches up with human brain power. Yes, there’s that long to go. Then put them in a play pen with a lot of YouTube videos and let them have at it. Eventually the monkeys will write a Shakespearean play or something like it.
SpaceX has demonstrated an ability to launch rockets, for sure. The economic case for the satellite internet service is not going well. Starlink, for example, ended the year with $1.4B in revenue. A few years ago Musk predicted they would have $15B in revenue and $7B in profit by now.
Starlink hasn’t signed up customers as quickly as SpaceX had hoped. Toward the end of last year, Starlink had more than one million active subscribers, SpaceX has said. The company thought its satellite-internet business would have 20 million subscribers as 2022 closed out, according to SpaceX’s 2015 presentation.
Without governments there is no SpaceX, and certainly no Starlink.
Back to robots: they’re pretty good at certain tasks - given enough training and a small, defined task to accomplish. Scaling up to semi-human capabilities is a jump of many orders of magnitude. I expect it will someday be accomplished, but I also suspect it will be many years from now. Lots of many’s, there.
Except it’s not quite the same. It’s relatively simple to add data and video collection to a car so that you can record what the drivers do. After all, you made the car, and the marginal expense to add the recording capability is modest. As Goofy pointed out, you’re also getting information far beyond just the video and audio - you get all the information the car is generating, so that your AI doesn’t have to figure out how to determine the right amount of acceleration or braking just by watching a human’s foot on the pedals. Since you’re selling 100K (and increasing) cars per year as of late 2016, you get to “Big Data” scale within a few years.
But how does that work for humanoid robots? You’re not going to build 100K humanoid robots to walk around your factories monitoring employees - the expense is too great, and there just isn’t room. Your robots aren’t doing the jobs, so while they can watch what the humans do, they get no data on how hard the humans are gripping or what forces are being exerted on the objects that are being manipulated - they’re just not getting the same range of data that comes from a car being manipulated by a human.
You’re just not going to get the scale of data that you need to have for a massive supercomputer like Dojo to be useful. Not from just taking video of your employees.
Obviously you are not understanding the process. Just because Tesla uses cars to record how humans drive, it does not mean that Optimus robots must record how humans assemble cars.
It can be done with just one robot - no need for 100K humanoid robots to walk around your factories monitoring employees.
The Captain
Let me give you an example that is in place today.
At a prior employer, we had a process to apply a polymer to a very complex surface. A surface so complex that humans could not be reliable in doing it unless they were well rested, extremely experienced, took extreme pride in their work AND set up the application process 100% correctly.
Typical results for this process were a 17% rejection rate, a 61% rework-to-acceptable rate and incessant (for good reason!) complaining that “we can’t do this”.
We worked with a company to develop a robot that would get the setup correct: one that would then be capable of executing that process 100% of the time with 0 rejects and <5% rework content.
The secret to that success was that the robot “followed” the best of the best operators. Literally, it was indexed to the human hands of our very best employees on their very best days. After the robot learned the tool path, we deployed manufacturing engineers to identify small tweaks to the process to improve the application pattern, overlap, and 3x coverage that our process required.
In this example, the automation system “learned” with machine recording and human-machine interfaces with sensor-identified position, signaling and control; then a bunch of human neural nets with heuristic rules for spray pattern depth, atomization and impingement angles optimized it.
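Roughly, the “follow the best operators” step can be pictured like this - a minimal sketch, assuming the demonstrations are captured as timed XYZ waypoints (the data below is synthetic; the real system was far more involved):

```python
# Minimal sketch of "learn the tool path from the best operators":
# record several human demonstrations as XYZ waypoints, average them into
# one reference trajectory, then let engineers tweak it before replay.
# Synthetic data only - an illustration of the idea, not the vendor's system.
import numpy as np

def load_demo(n_points: int = 200, seed: int = 0) -> np.ndarray:
    """Stand-in for one recorded demonstration: (n_points, 3) XYZ positions."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_points)
    nominal = np.stack([t, np.sin(2 * np.pi * t), 0.1 * t], axis=1)
    return nominal + rng.normal(scale=0.005, size=nominal.shape)  # operator jitter

# Several "best operator on their best day" demonstrations.
demos = [load_demo(seed=s) for s in range(5)]

# Average them point-by-point into the reference tool path.
reference_path = np.mean(np.stack(demos), axis=0)

# Manufacturing engineers can then apply small tweaks (overlap, coverage,
# standoff distance) before handing the path to the motion controller.
reference_path[:, 2] += 0.002  # e.g. raise standoff by 2 mm
print(reference_path.shape)    # (200, 3)
```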
I have zero doubts that an AGI robot could take over most of the optimization and learning without much human interface after a VERY brief introduction to the process if (IF!) the results were fed back into the process in a very helpful way.
Now, think about the “robot” as
a material mover - like a cart
a material positioner - like a person grabbing the switch and control assemblies from a rack of parts to place into a vehicle or appliance
a tooling changer - like a person operating a crane and carrying a set of dies or molds
a QC inspector - like a person looking for anomalies, then moving material deemed “scrap” to a quarantine area.
Now think about that robot doing all of those things together with training.
It’s not that we don’t have many aspects of this in place through automation projects today, we do.
It’s that with simple instruction, we can SHORTEN the learning curve and the time to “automation” run rates dramatically if we can “show” the process control and interaction system what good looks like.
Neural nets can be taught in many ways; teaching them in 3D to identify and take action on scenarios simply requires humans to set the definition of the task and of “good”. After that, the feedback loop takes over and allows for optimization and enhanced performance.
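As a concrete (and deliberately toy) illustration of “humans define good, then the system imitates”: the simplest version is behavior cloning - fit a small network to (observation, action) pairs taken from demonstrations someone has labeled as good. The shapes and data below are invented; this assumes PyTorch and says nothing about any particular vendor’s stack.

```python
# Toy behavior-cloning sketch: imitate demonstrated actions with a small
# policy network. All data and dimensions are made up for illustration.
import torch
import torch.nn as nn

obs_dim, act_dim, n_demo = 32, 7, 4096

# Pretend these came from recorded demonstrations labeled "good".
observations = torch.randn(n_demo, obs_dim)   # e.g. camera/pose features
actions = torch.randn(n_demo, act_dim)        # e.g. joint velocity commands

policy = nn.Sequential(
    nn.Linear(obs_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):
    pred = policy(observations)               # predict the demonstrated action
    loss = loss_fn(pred, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# A production feedback loop (scoring real outcomes as good/bad and
# re-training on them) would sit on top of this step; it is omitted here.
print(f"final imitation loss: {loss.item():.4f}")
```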
Let me give you another example.
Boatwrights spend an enormous amount of time laying out components and placing items into a hull for any new production model in mass production. The most experienced technicians are often working with the engineers and designs WHILE the model is being designed and built.
At this time they are defining (and redefining) where cables and ducting should route, how water will be handled from collection through drainage and down into bilge areas for processing and pump out, how controls and access panels will be laid together for assembly, servicing and aesthetics and the order of operations to make it work on a production line. (and on and on…)
There is a learning process going on. The difference between the prototyping phase and the pilot-production phase is that an AGI machine with the capabilities in the lists above is likely going to instantly update the entire set of “workers” with every sub-optimization improvement as the design goes along.
We humans can focus on changing the operating rules and let the machines actually do the operating.
My material mover knows what to carry, where to get it, where to deliver it, when to pick it, and knows that to get to the drop location on time, it will have to take an alternate route (just like a human).
My assembler just got an OTA update and knows that it’s now slightly better to change the sequence of switches to install into the panel because the access panel revision just changed and the proto - shop robot retooled the process instructions.
My die changer just switched from prioritizing the exchange of die 817703 for 988801 to moving material to cell 2B on line 16 because that process is now 2 seconds behind takt.
My inspector also came over to help reduce delay because faulty material was just produced due to a worn tool.
There are no humans in this process.
No, that can’t be done with one robot.
Dojo, and similar systems, operate by having massive amounts of data to develop their models from. You can’t just input the film of one human doing a job. Or ten humans doing that job. Or a hundred humans doing that job. They need massive amounts of data in order to tease out patterns and rules from countless subtle variations in the visual images.
Let’s use the wire harness example. Dojo doesn’t know what a wire harness is - or that it’s distinct from a human or the car body it’s being installed on. It can’t recognize one in a photo or video. But if you show it a million videos of wire harnesses being installed in cars, it can gradually develop patterns and rules that allow it to “know” that there’s an object being installed into another object by a third object. It doesn’t know that those objects are “harness,” “car body” or “human” - it’s just seen enough different variations to know that one pattern of pixels “corresponds” to something that usually is moved by another pattern of pixels into another position in the third pattern of pixels.
You can’t train it with just a few instances of something. You need countless, enormous, gobsmacking amounts of data for it to be able to start recognizing those things. That’s why Dojo has to be so enormous, larger than most other computers on earth - to process the massive amounts of data involved.
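For a feel of why scale matters, here is a throwaway experiment - synthetic data and scikit-learn, nothing to do with Dojo itself - showing the same model trained on a handful of examples versus many, evaluated on the same held-out set:

```python
# Toy illustration of data hunger: same model, same test set, growing
# training set. Synthetic data via scikit-learn; unrelated to Dojo.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# One fixed "world": a noisy 50-feature classification problem.
X, y = make_classification(n_samples=60000, n_features=50, n_informative=25,
                           flip_y=0.05, random_state=0)
X_test, y_test = X[50000:], y[50000:]

for n_train in (50, 500, 5000, 50000):
    model = LogisticRegression(max_iter=2000)
    model.fit(X[:n_train], y[:n_train])
    acc = model.score(X_test, y_test)
    print(f"trained on {n_train:>6} examples -> test accuracy {acc:.3f}")

# Accuracy typically climbs as the training set grows; with only a few
# dozen examples the model mostly fits noise instead of the pattern.
```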
Of course! This is not an argument that robots can’t do incredibly elaborate and complicated tasks! Or that machine learning can’t be part of it!
It’s simply to point out that Tesla’s stated goal with Optimus is not to build a task-specific machine. They are trying to build a general purpose humanoid robot. We’ve been “teaching” robots to be able to do a single, specific industrial task for several decades now - designing a general purpose robot is a completely different animal.
If you took all the Tesla wire harness installers and fitted them with haptic sensors and monitors (so that Dojo was getting data not just visually but understanding exactly how much force, pressure, tension, and pull was being applied at every single moment of the installation process) and did that over and over and over again, you might be able to get a dataset large enough that, in connection with human programmers and such, you could design a robot that could do that job. Maybe not - there are very good reasons why that’s one of the jobs that hasn’t been previously automated: the “floppy” harness results in so many countless possible variations in location and orientation that it’s hard to handle. But from just having one robot do nothing but record video of that worker for a little while, and then using that to “teach” a general purpose robot?
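If anyone wanted to picture what such an instrumented capture might even look like, here is a purely hypothetical sketch of a single sample - every field name is invented, and nothing here reflects how Tesla (or anyone else) actually records manipulation data:

```python
# Hypothetical record of one instrumented "harness install" sample:
# a synced video frame plus force/position readings from a sensored glove.
# All field names are invented for illustration.
from dataclasses import dataclass
from typing import List

@dataclass
class HapticSample:
    timestamp_ms: int
    video_frame_id: str             # pointer to the synced camera frame
    fingertip_force_n: List[float]  # one reading per instrumented fingertip
    wrist_pose: List[float]         # x, y, z, roll, pitch, yaw of the glove
    harness_tension_n: float        # pull measured on the harness itself

sample = HapticSample(
    timestamp_ms=1700000001000,
    video_frame_id="cam3/004512",
    fingertip_force_n=[1.2, 0.8, 0.4, 0.1, 0.0],
    wrist_pose=[0.41, -0.12, 0.95, 0.0, 1.2, 0.3],
    harness_tension_n=6.5,
)
print(sample)
```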