Why don’t we have AI-powered robot butlers yet? An investigation.
In a very telling interview at Davos earlier this year, Bill Gates spelled out who AI is meant to help: “It is so dramatic how it improves white collar productivity. And later, with the robotics — not yet — but eventually, blue collar productivity,” Gates told Bloomberg. AI that can make you a hundred times faster at writing emails? Your wish is Big Tech’s command. AI that can build an entire car? Hold that thought. They’re working on it.
But if you’re like most people — meaning nothing like Bill Gates — the biggest productivity suck of all is your endless list of chores. And in this regard, AI is decidedly not coming to our rescue anytime soon, even if the business world is hyping up humanoid robots like crazy right now.
For instance, a startup called Figure released a demo the other day of what seems like the humanoid robot of my dreams: when asked for something to eat, it hands the user an apple, and it uses an integrated OpenAI large language model to explain why it made that decision while it puts some trash in a bin. Then it puts some dishes away.
Figure’s concept robot makes for an impressive demo, but unless this company has some truly unique engineering going on behind the scenes, it’s probably just a demo. The robot doesn’t walk, and sticks to a narrow and tightly scripted routine. This might be the mechanical chosen one, but probably not. Decades of demos like this have come and gone, and we still don’t have robots in our homes that actually pick up trash and do the dishes.
This notion has existed since the conception of “robots” as an idea — by which I mean the 1920 Czech play R.U.R. (Rossum’s Universal Robots), which coined the term “robot” in the first place. R.U.R.’s robots were humanoid figures, a.k.a. androids, meant to toil away while their human overlords chillaxed. Technology has since accelerated to the point where machines can respond to a simple written prompt with vibrant moving images of, say, fictional humanoid robots, or any other fantasy scenario we care to conjure. And yet physical robots only seem to bring joy to real-world humans if the human in question is named Jeff Bezos. For the average person, robots are mostly objects of frustration, if not outright fear.
As for literal robot servants to act as our in-house butlers, it’s begun to feel like that century-old idea needs an additional century to percolate down from concept to consumer reality.
To the world’s estimated 10,000 actual human butlers, that must feel like good news. AI automation seems to be jeopardizing a lot of gigs right now, so who wants to consign yet another category of flesh-and-blood people to the dustbin of permanent unemployment? At the risk of splitting hairs, though, come on: that’s just not what we’re talking about when we talk about the conspicuous absence of robot butlers. The world has about eight billion people, most of whom are plagued by chore loads that seem to only ever grow and never get finished (particularly if they are women).
If they were actually useful, robot butlers would be chore-killing appliances rather than snooty status symbols. I’m pretty confident the remaining butlers in the world — highly skilled managers of palatial estates who know which freshly polished rifle is for pheasants and which one is for foxes — would get to keep their weird antique jobs, even if Apple really did start manufacturing iJeeves.
Ideally, then, the robot butler revolution wouldn’t be an example of automation wrecking lives. It could instead be a true example of progress — technology for the people. And yet, there’s no sign of it anywhere.
Here are the reasons why:
Robots move… robotically
In 1988, Carnegie Mellon roboticist Hans Moravec, writing in his book Mind Children: The Future of Robot and Human Intelligence, stumbled upon a key piece of robot wisdom. A misconception among snobby software programmers at the time held that robots were clumsy because they were being built by troglodyte gearheads, and that once intellectuals took over, robots would be performing brain surgery on their own in no time. However, he wrote, “it has become clear that it is comparatively easy to make computers exhibit adult-level performance in solving problems on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.”
This excerpt inspired what’s now known as Moravec’s Paradox: the idea, paraphrased from Moravec, that what’s hard for humans is easy for robots, and vice versa.
So while you may have seen plenty of footage of Boston Dynamics robots, such as the humanoid prototype Atlas, performing tasks with eerie precision, that’s just because you’re seeing the product of hours of rehearsals in which the robot botched something basic countless times before finally getting it right a single time while the camera was rolling. Boston Dynamics doesn’t hide this fact, by the way, but its videos of clumsy robots don’t go as viral — because they don’t prompt thousands of social media posts all making the same “we’re all gonna die” joke.
In short, even as we begin to imagine — and struggle to clearly define — “artificial general intelligence,” any AI that wants to be embodied in the physical world will still need to share the actual environments we humans inhabit, which include irregular and diverse surfaces and objects, occasional wetness, things with unevenly dispersed holes and protrusions, softness, mushiness, lumpiness, breakability, and crumbliness. This is good for anyone who worries about an AI apocalypse, but it’s bad for anyone worried about remembering to put laundry in the dryer while a toddler has a tantrum. As things stand today, helpers and hurt-ers alike will instantly be defeated by banana peels.
Robot arms and hands are built for fumbling
In a 1952 episode of I Love Lucy, Lucy and Ethel were given the prototypical factory job: picking up little chocolates with their hands, wrapping them in paper, and putting them back down on a conveyor belt. Only a real goofball could screw up something so basic.
But today’s robot hands remain hilariously clumsy, even in basic situations like this one. In a TEDx talk from earlier this year, UC Berkeley roboticist Ken Goldberg explains that robotic hands and arms have to deal with a multifaceted problem he reduces to the word “uncertainty.” Robots, Goldberg says, are “uncertain” about their own controls, uncertain about what they can “perceive” with their onboard cameras, and uncertain about physics, meaning they’re forced to deal with “microscopic surface topography” that makes objects in the real world behave in totally novel ways even when seemingly all variables are removed. (Try sliding your smartphone from one side of your desk to the other with one finger, and then imagine a robot trying to do what you just did.)
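To make that physics point concrete, here’s a toy simulation. This is entirely my own back-of-the-envelope, nothing from Goldberg’s talk, and every number in it (the phone’s mass, the push, the friction values) is a made-up placeholder: push a phone with the exact same force a thousand times, let only the friction coefficient wobble the way real surfaces do, and watch how much the final resting spot varies.

```python
# Toy illustration (mine, not Goldberg's) of "uncertainty about physics":
# the same nominal push, with only small friction variation, leaves the phone
# in a noticeably different place every trial.
import numpy as np

rng = np.random.default_rng(0)

mass = 0.2        # kg, roughly a smartphone
force = 1.0       # N, the identical push applied every trial
push_time = 0.2   # s, how long the finger stays on the phone
g = 9.81          # m/s^2

# "Microscopic surface topography": friction coefficient jitters around 0.30.
mu = rng.normal(loc=0.30, scale=0.03, size=1_000)

# Acceleration while pushing (friction fights the finger), clipped so the
# phone simply doesn't move if friction wins outright.
accel = np.clip(force / mass - mu * g, 0.0, None)
v = accel * push_time                                # speed when the finger lets go
slide = v**2 / (2 * np.clip(mu * g, 1e-6, None))     # coasting distance after release

print(f"slide distance: mean {slide.mean():.3f} m, "
      f"range {slide.min():.3f} to {slide.max():.3f} m")
```

A human hand absorbs that wobble without thinking; a robot has to sense it, model it, and plan around it.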
Goldberg is partly using his TEDx talk to pitch his own robot company, whose machines are designed to perform tasks almost exactly like Lucy’s: picking up diverse objects from bins in warehouses, scanning them, and putting them into smaller bins. It’s downright astonishing that robots narrowly targeted at such basic tasks remain so cutting-edge.
A brand-new paper by Stanford roboticist Cheng Chi and seven coauthors explains why, 72 years after Lucy’s candy factory job, robot hands are still even clumsier than a screwball comedian’s. The paper, a sort of open-source manifesto for robot builders, is called “Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots,” and it vividly describes today’s state-of-the-art tools for these sorts of tasks: simplified, viselike “grippers” trained by humans holding them like little puppets and performing tasks — things like picking up a chocolate, putting it in a wrapper, and setting it back down. Unfortunately, the paper notes, “While users can theoretically collect any actions with these hand-held devices, much of that data can not be transferred to an effective robot policy.”
The paper then goes on to provide a sort of open-source recipe for a better gripper-training system, including a universal physical gripper anyone can make with consumer-grade tools. Chi’s hope is that robot labs around the world can work together to build vast public datasets of demonstrations from which robot “policies” can be learned, and with a little luck, the Universal Manipulation Interface (UMI) will take over the space, perhaps enabling robot hands in the near future to do, say, a tenth of what one Lucy can do with her human hands. Even that would be a colossal achievement.
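For a sense of what “turning human demonstrations into a robot policy” even means, here’s a bare-bones behavior-cloning sketch. To be clear, this is my own toy version and not UMI’s actual pipeline, which is far more sophisticated; the 512-number observation and 7-number action below are made-up placeholders standing in for camera images and gripper poses.

```python
# Minimal behavior-cloning sketch (my placeholder, not the UMI code):
# learn to copy a human's recorded gripper motions from (observation, action) pairs.
import torch
import torch.nn as nn

class BehaviorCloningPolicy(nn.Module):
    """Maps a camera observation (flattened here for simplicity) to an action:
    a 6-DoF end-effector delta plus a gripper-width command (7 numbers total)."""
    def __init__(self, obs_dim: int = 512, act_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def train(policy, demos, epochs: int = 10, lr: float = 1e-3):
    """`demos` is a list of (observation, action) pairs recorded while a human
    carried a hand-held gripper through a task, e.g. wrapping a chocolate."""
    obs = torch.stack([o for o, _ in demos])
    act = torch.stack([a for _, a in demos])
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        pred = policy(obs)
        loss = nn.functional.mse_loss(pred, act)   # imitate the human's motion
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

# Fake demonstration data stands in for real recordings.
demos = [(torch.randn(512), torch.randn(7)) for _ in range(256)]
policy = train(BehaviorCloningPolicy(), demos)
```

As I read it, the whole bet is that data recorded through the hand-held gripper looks enough like what the robot’s own camera and wrist will experience that a policy trained on it actually transfers, which is exactly the gap the quote above complains about.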
Robot “thinking” is too rigid for the real world
“Err-or. Err-or.”
The idea that a robot will break (or explode) if slightly confused is a well-worn trope known among trope aficionados as the “Logic Bomb.” It appeared five times in Futurama alone. The thing about logic bombs, though, is that they’re pretty close to robot reality.
In 2019, the East Coast supermarket chain Stop & Shop rolled out a line of robots that endlessly navigated its stores, purportedly scanning the floor for messes, and then… cleaning them up? Nope. The robot just sounded an alarm for a human employee to come fix the problem, which often meant it would stall in an aisle for long stretches of time, emitting an audible “danger” warning over a non-hazard like a single tissue or a lid on the floor. Employees reported finding it pretty useless.
In our age of generative AI, it feels particularly absurd that the latest models can differentiate photos of mutts from those of purebred dogs, or hold court about the intricacies of translating Proust, while physical robots are still limited to one or a handful of very basic functions, and still constantly glitch out when trying to carry out the same basic tasks as a 20-year-old Roomba.
A concept called “open-vocabulary” robot manipulation, however, is supposed to act as something like a bridge between Roombas and ChatGPT. The idea is that large models can translate natural-language prompts into computer-friendly ones and zeroes, and those ones and zeroes can then be turned into robot commands. In theory.
But another brand-new paper, “MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting,” by a four-person team led by Berkeley’s Fangchen Liu, describes the problem. The authors note that “large models pre-trained on Internet-scale data still lack the capabilities to understand 3D space, contact physics, and robotic control, not to mention the knowledge about the embodiment and environment dynamics in each specific scenario, creating a large gap between the promising trend[s] in computer vision and natural language processing and applying them to robotics.”
In the new, endlessly flexible system the authors propose, images are tied to the actual movements a given robot requires, or is capable of, in its environment. Images are described with words, allowing the model to use vision not just to predict limitations and parameters for action but also to identify possibilities. That is to say, if the Stop & Shop robot were equipped with this system, it could identify a “hazard” like eleven spilled jellybeans, but then also something in the environment like a “broom,” capable of “sweeping” the jellybeans up. If it were equipped with some arms, and a nice pair of grippers, the possibilities would be endless. Again, in theory.
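If it helps, here’s what the open-vocabulary idea looks like in deliberately toy code. This is not MOKA’s actual method, just my own stripped-down caricature of the pipeline: describe the scene in words, ask a vision-language model which object affords which action, and turn the answer into crude robot commands. The “model” below is a hard-coded stand-in, and every function and object name is hypothetical.

```python
# Cartoon version of open-vocabulary manipulation (not MOKA): words in, actions out.
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    position: tuple  # (x, y) in the robot's workspace, in meters

def query_vision_language_model(scene: list[SceneObject], prompt: str) -> str:
    """Stand-in for a real VLM; it ignores the prompt and pattern-matches object names."""
    names = {o.name for o in scene}
    if "spilled jellybeans" in names and "broom" in names:
        return "use the broom to sweep the spilled jellybeans"
    return "no action found"

def plan(scene: list[SceneObject], task: str) -> list[str]:
    """Turn the model's free-form answer into a crude sequence of robot commands."""
    answer = query_vision_language_model(scene, task)
    if "sweep" in answer:
        broom = next(o for o in scene if o.name == "broom")
        mess = next(o for o in scene if o.name == "spilled jellybeans")
        return [
            f"move_to{broom.position}", "grasp",
            f"move_to{mess.position}", "sweep",
        ]
    return ["call_human_employee"]   # the Stop & Shop fallback

scene = [SceneObject("spilled jellybeans", (1.2, 0.4)), SceneObject("broom", (0.3, 0.9))]
print(plan(scene, "clean up the hazard in aisle five"))
```

Even in this cartoon, notice that all the hard parts, perception, grasping, actually sweeping, are hidden behind tidy little strings.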
But that may not matter much, because…
The economics of robot butlers just don’t add up
To paraphrase a saying often attributed to sci-fi author William Gibson, the robot butlers of the future may arrive soon, but that doesn’t mean they will be evenly distributed.
The current crop of cutting-edge consumer robots doesn’t leave me with much hope that I’ll ever be able to afford a functional robot butler. For example, products in the “Sanbot” line of robots from Qihan Technology can do some cool stuff, but they’re explicitly designed to replace retail and concierge workers and point-of-sale systems, and they’re priced at around $10,000 (though I wasn’t able to find a retailer with a website I’d rate as trustworthy). Practically speaking, Sanbot devices aren’t even functional workers; they seem to be more like marketing gimmicks: a high-tech inflatable tube man, essentially.
Meanwhile, a Segway Loomo, which is basically a smartphone attached to a miniature Segway scooter that can follow people around a stair-free environment, would set me back $2,055.30 if I bought one right now on Amazon, which I currently feel no inclination to do. A Unitree Go2, which is a pretty amazing knockoff of Boston Dynamics’ robotic dog Spot (minus the all-important arm), would cost me $2,399. These are the closest things to butlers I can buy right now, but they can’t be of any real help around the house.
Goldman Sachs, for its part, predicted “a market of up to US$154bn by 2035 in a blue-sky scenario” for humanoid robots, according to a 2022 report from the banking firm’s research department. Goldman also pointed out that “robot makers will need to bring down production costs by roughly 15-20% a year in order for the humanoid robot to be able to pay for itself in two years.” That’s for business robots, not butlers.
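For a sense of what that cost-decline requirement implies, here’s a quick back-of-the-envelope calculation. The $50,000 starting cost is purely my own placeholder, not a Goldman figure; only the 15-20 percent annual decline comes from the quote above.

```python
# Back-of-the-envelope only: how a hypothetical $50,000 production cost shrinks
# under the 15-20% annual declines Goldman says robot makers would need.
def cost_after(start: float, annual_decline: float, years: int) -> float:
    return start * (1 - annual_decline) ** years

for rate in (0.15, 0.20):
    trajectory = [round(cost_after(50_000, rate, y)) for y in range(6)]
    print(f"{int(rate * 100)}% per year: {trajectory}")
# Prints roughly:
# 15% per year: [50000, 42500, 36125, 30706, 26100, 22185]
# 20% per year: [50000, 40000, 32000, 25600, 20480, 16384]
```

Which is to say, even in the friendlier scenario, and with my made-up starting figure, you’re years away from anything a normal household would call an appliance price.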
The point is that robots are way outside my price range, and it seems like they will be for the foreseeable future. Roughly speaking, $10,000 to $20,000 seems to be the price range companies have in mind. At these prices, they’d better not chip my ceramics when they do the dishes, but if they truly crushed my household to-do list flawlessly, I might save up. I doubt I’m alone in that.
Still, the most depressing omen of all — and the one that might best sum up the whole state of robot butlers — is the fact that Elon Musk has a division of Tesla chipping away at a humanoid robot called Optimus. Musk says Optimus will cost $20,000, and at an event where he talked about his robots, he said “the robots will be able to do everything, bar nothing.” Considering the apparent truth value of the average statement from the richest man in the world, all of his promises about robots fill me with certainty that even way-too-expensive robot butlers are never going to arrive.