Imagine having to straighten up a messy kitchen, starting with a counter littered with sauce packets. If your goal is to wipe the counter clean, you might sweep up the packets as a group. If, however, you wanted to first pick out the mustard packets before throwing the rest away, you would sort more discriminately, by sauce type. And if, among the mustards, you had a hankering for Grey Poupon, finding this specific brand would entail a more careful search. MIT engineers have developed a method that enables robots to make similarly intuitive, task-relevant decisions. The team’s new approach, named Clio, enables a robot to identify the parts of a scene that matter, given the tasks at hand. With Clio, a robot takes in a list of tasks described in natural language and, based on those tasks, it then determines the level of granularity required to interpret its surroundings and “remember” only the parts of a scene that are relevant. In real experiments ranging from a cluttered cubicle to a five-story building on MIT’s campus, the team used Clio to automatically segment a scene at different levels of granularity, based on a set of tasks specified in natural-language prompts such as “move rack of magazines” and “get first aid kit.” The team also ran Clio in real-time on a quadruped robot. As the robot explored an office building, Clio identified and mapped only those parts of the scene that related to the robot’s tasks (such as retrieving a dog toy while ignoring piles of office supplies), allowing the robot to grasp the objects of interest.
Full research : A new method called Clio enables robots to quickly map a scene and identify the items they need to complete a given set of tasks.