Foundation models have driven significant advances in robotics, enabling vision-language-action (VLA) models that generalize to objects, scenes, and tasks beyond their training data. However, adoption of these models has been limited by their closed nature and the lack of best practices for deploying and adapting them to new environments. To address these challenges, researchers from Stanford University, UC Berkeley, Toyota Research Institute, Google DeepMind, and other labs have introduced OpenVLA, an open-source VLA model trained on a diverse collection of real-world robot demonstrations. According to the researchers, OpenVLA outperforms comparable models on robotics tasks. It can also be easily fine-tuned to generalize in multi-task environments involving multiple objects, and it is designed to take advantage of optimization techniques that let it run on consumer-grade GPUs and be fine-tuned at very low cost. With foundation models becoming a cornerstone of robotics, OpenVLA can make these models more accessible and customizable for a broader range of companies and research labs.

Classic learned policies for robotic manipulation struggle to generalize beyond their training data: they are not robust to scene distractors or unseen objects, and they fail on task instructions that differ even slightly from those they were trained on. LLMs and VLMs are capable of these kinds of generalization thanks to the world knowledge they capture from their internet-scale pretraining datasets, which is why research labs have recently begun using them as building blocks for training robotic policies.
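To make the "runs on consumer-grade GPUs" claim concrete, the sketch below shows one way a quantized VLA model can be loaded and queried for a single control step through Hugging Face Transformers. It is a minimal illustration, not the project's reference code: the model ID (openvla/openvla-7b), the predict_action() helper, and the unnorm_key value are assumptions based on the project's public release and should be checked against the official OpenVLA repository.

```python
# Minimal sketch: run a quantized OpenVLA model on a single consumer GPU.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

# 4-bit quantization keeps the ~7B-parameter model within consumer GPU memory.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",          # assumed Hugging Face model ID
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# One control step: image observation + language instruction in, robot action out.
image = Image.open("camera_frame.png")  # current camera observation
prompt = "In: What action should the robot take to pick up the red cup?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)  # end-effector action (delta position, rotation, gripper command)
```

In a real deployment this call would sit inside the robot's control loop, with each predicted action sent to the manipulator before the next camera frame is processed.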
Full report: OpenVLA is an open-source generalist robotics model.
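The low-cost fine-tuning mentioned above is typically achieved with parameter-efficient methods such as LoRA, which train small adapter matrices instead of the full model. The sketch below uses the Hugging Face peft library; the rank, target modules, and learning rate are illustrative assumptions, and the OpenVLA repository ships its own fine-tuning scripts that should be preferred in practice.

```python
# Minimal sketch: parameter-efficient (LoRA) adaptation of a VLA model.
import torch
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",          # assumed Hugging Face model ID
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Train small low-rank adapters instead of all ~7B parameters.
lora_config = LoraConfig(
    r=32,                          # adapter rank (illustrative)
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules="all-linear",   # attach adapters to every linear layer
)
vla = get_peft_model(vla, lora_config)
vla.print_trainable_parameters()   # only a small fraction of weights are trainable

# From here, train as usual on (image, instruction, action) demonstrations.
optimizer = torch.optim.AdamW(vla.parameters(), lr=2e-4)
```

Because only the adapter weights are updated, this kind of fine-tuning fits on a single workstation-class GPU, which is what makes adapting the model to a new robot setup inexpensive.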