
LLMs Are Ready for Robotics and Self-Driving, Says Ambarella

Multi-modal LLMs are ready to take on advanced tasks in computer vision, autonomous driving and robotics, Ambarella CTO Les Kohn told EE Times. “I do believe that to go [to] the higher levels of autonomy, beyond L3 or to make L3 more robust, you need to include something with more general world knowledge, which can understand complex scenarios and predict what you should do, more like a human,” Kohn said, adding that LLM-style models, including large generative AI models based on transformers, would fit the bill, since they can handle complex real-world scenarios.

Today’s models like Llava, which was trained primarily on text with visual input data added in the final stages, can associate concepts from their text training with scenes they see in the real world, he said. “These multi-modal models can understand a lot more about a scene than a pure computer vision model which has no higher-level understanding of the way things work in the world,” he said. “[Llava is] much more able to deal with edge cases because it can infer things and generalize based on the other training it had.”

Kohn said he is following recent research indicating that multi-modal models, despite being developed independently and trained on different data, are converging in the representations they use. “This seems to indicate these models are learning to predict the way the world really works,” he said. “They have some knowledge of rudimentary physics and the behavior of objects, similar to how humans understand the world.”
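
To illustrate the kind of scene-level reasoning Kohn describes, the sketch below asks an off-the-shelf multi-modal model a question about a driving scene. It is a minimal example, assuming the publicly available llava-hf/llava-1.5-7b-hf checkpoint and the Hugging Face transformers LLaVA classes; the image file, prompt, and generation settings are illustrative and are not Ambarella's own stack.

```python
# Minimal sketch: querying a LLaVA-style multi-modal model about a scene.
# Assumptions: the public llava-hf/llava-1.5-7b-hf checkpoint, the Hugging Face
# transformers library (with accelerate for device_map), and a GPU with fp16 support.
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed public checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical dashcam frame; any RGB image works.
image = Image.open("intersection.jpg")

# LLaVA-1.5 chat format: the <image> token marks where visual features are injected.
prompt = (
    "USER: <image>\n"
    "A pedestrian is standing at the curb ahead. What should the vehicle do next, and why? "
    "ASSISTANT:"
)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```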

Full report: LLMs Are Ready for Robotics and Self-Driving, Says Ambarella.