Google is helping robots accomplish tasks more quickly and efficiently, using technology like the systems that power AI chatbots such as Bard, ChatGPT and Claude 2, the company said in a blog post Friday.
Google’s Robotics Transformer 2, or RT-2, is a “first-of-its-kind vision-language-action (VLA) model,” Vincent Vanhoucke, the head of robotics for Google DeepMind, said in the post. Similar to the large language models behind AI chatbots, it trains based on text (and image) data found on the web, to “directly output robotic actions.”
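One way a model can "directly output robotic actions" as if they were words is to represent each motor command as a short string of discretized tokens. The sketch below is a hypothetical illustration of that idea; the token layout, field names and value ranges are assumptions for this example, not Google's actual RT-2 implementation.

```python
# Hypothetical sketch: decoding a vision-language-action model's text
# output into motor commands. The 8-token layout and [-1, 1] ranges are
# invented for illustration, not RT-2's real format.

def detokenize_action(token_str, bins=256):
    """Decode a string of discretized action tokens into motor commands.

    Assumed layout: [terminate, dx, dy, dz, droll, dpitch, dyaw, gripper],
    each an integer bin in [0, bins - 1] mapped back to [-1.0, 1.0].
    """
    tokens = [int(t) for t in token_str.split()]
    if len(tokens) != 8:
        raise ValueError("expected 8 action tokens")

    def to_continuous(bin_idx):
        # Linearly map a bin index back to the continuous range [-1, 1].
        return -1.0 + 2.0 * bin_idx / (bins - 1)

    return {
        "terminate": tokens[0] == 1,
        "translation": [to_continuous(t) for t in tokens[1:4]],
        "rotation": [to_continuous(t) for t in tokens[4:7]],
        "gripper": to_continuous(tokens[7]),
    }

# A model fine-tuned this way can answer an instruction like "pick up
# the trash" with a token string instead of a sentence:
action = detokenize_action("0 128 191 64 127 127 127 255")
```

Because actions look like just another kind of text, the same transformer that learned language and vision from the web can be fine-tuned to emit them.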
Vanhoucke said getting robots to use AI to understand the world around them is a harder problem than building a chatbot. An AI chatbot only needs to absorb a large amount of text about a subject and arrange that information in a way that's easy for humans to understand; a robot has to perceive and act in the physical world. It's one thing to recognize an apple. It's another to distinguish a Red Delicious apple from a red ball and pick up the correct object.
With the launch of OpenAI’s ChatGPT late last year, there’s been a rush of companies bringing AI tech to market. AI chatbots are already seeping into coding, the college application process and dating apps. Google itself is making artificial intelligence a central focus of its business — as evidenced by the fact that company presenters said “AI” more than 140 times during Google’s two-hour keynote event at its I/O developer conference in May.
Robotics is just another field in which AI models could change how quickly technology gets smarter. And for Google’s investors, the company’s advances in robotics could make for good business. The industrial robotics industry is currently valued at $30 billion and is expected to reach $60 billion by 2030, according to Grand View Research.
In the past, engineers looking to train a robot to, say, throw away a piece of trash first had to train the 'bot to identify the trash (which involves tuning many parameters), bend down, pick it up, lift it, bend back, identify a trash can, move its robotic arm over the can and drop the trash in. It was, as you may have guessed, a slow and tedious process. Google says that with RT-2, which draws on troves of image data found online, robots can quickly be trained to understand what trash is, and how to pick it up and throw it away.
A robot can use a small amount of training data to “transfer concepts embedded in its language and vision training data to direct robot actions — even for tasks it’s never been trained to do,” said Vanhoucke. In a demonstration given to The New York Times, a robot was able to identify and lift a toy dinosaur when asked to pick up an extinct animal from among a group of other toys. In another challenge, the ‘bot was able to pick up a toy Volkswagen car and move it toward a German flag.
Editors’ note: CNET is using an AI engine to help create some stories. For more, see this post.