February 9, 2024

After seemingly lurking on the sidelines for most of last year, Apple is starting to shake things up in the field of artificial intelligence, and open-source AI in particular. The Cupertino-based tech giant has partnered with the University of California, Santa Barbara to develop an AI model that can edit images based on natural language, the same way people interact with ChatGPT. Apple calls it Multimodal Large Language Model-Guided Image Editing (MGIE).

MGIE interprets text instructions provided by users, processing and refining them to generate precise image editing commands. A diffusion model then carries out those commands, which lets MGIE apply edits that take the characteristics of the original image into account.

Multimodal Large Language Models (MLLMs), which can process both text and images, form the foundation of the MGIE method. Unlike traditional single-modality AI systems that handle only text or only images, MLLMs can process complex instructions and work in a wider range of situations. For example, a model may understand a text instruction, analyze the elements of a specific photo, then take something out of the image and create a new picture without that element. To perform these actions, an AI system needs several capabilities in the same process: text generation, image generation, segmentation, and CLIP analysis.

The introduction of MGIE brings Apple closer to capabilities akin to OpenAI’s ChatGPT Plus, which lets users hold conversational interactions with AI models to create customized images from text input. With MGIE, users can provide detailed instructions in natural language, such as “remove the traffic cone from the foreground,” which are translated into image editing commands and executed.
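For readers who want to picture the flow, here is a toy Python sketch of the two stages described above: an interpretation step that turns a free-text request into an explicit command, and an editing step that applies it. Every name in it (`Image`, `EditCommand`, `interpret_instruction`, `apply_edit`) is hypothetical and made up for illustration. It is not Apple's code or API, and simple string matching and set operations stand in for the MLLM and the diffusion model.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Toy stand-in for a photo: just the names of the objects in it."""
    objects: set = field(default_factory=set)


@dataclass
class EditCommand:
    """An explicit editing command derived from a free-text instruction."""
    action: str  # e.g. "remove"
    target: str  # e.g. "traffic cone"


def interpret_instruction(instruction: str, image: Image) -> EditCommand:
    """Placeholder for the MLLM stage (assumption: real systems use a model here).

    Takes the first word as the action and grounds the target in the image
    by looking for an object name mentioned in the instruction.
    """
    text = instruction.lower()
    action = text.split()[0]
    for obj in sorted(image.objects):
        if obj in text:
            return EditCommand(action=action, target=obj)
    raise ValueError("instruction does not mention any object in the image")


def apply_edit(image: Image, command: EditCommand) -> Image:
    """Placeholder for the diffusion stage: returns a new, edited image."""
    if command.action == "remove":
        return Image(objects=image.objects - {command.target})
    raise NotImplementedError(f"unsupported action: {command.action}")


photo = Image(objects={"car", "traffic cone", "tree"})
command = interpret_instruction("remove the traffic cone from the foreground", photo)
edited = apply_edit(photo, command)
```

In this sketch the original `photo` is left untouched and `edited` is a new object, mirroring the idea of creating a new picture without the removed element.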

