2024-04-19

Microsoft Corp. has published a research paper introducing an artificial intelligence framework that takes a still photo and a voice sample and creates a highly realistic talking head that looks and sounds like the real person.

The framework, called VASA-1, takes a single portrait-style image and an audio file and merges them into a short video of a talking head with realistic facial expressions and head movements, and it can even sing songs in the uploaded voice. Microsoft said VASA-1 is currently only a research project, so it is not making it available for anyone else to use, but it posted a number of demonstration videos with dazzling realism.

Although Nvidia Corp. and Runway AI Inc. have both released similar technology, VASA-1 appears to create much more realistic talking heads, with fewer mouth artifacts. The company said the framework is designed specifically for animating virtual characters, so all of the individuals in its examples are synthetic, generated using OpenAI’s DALL-E image-generating model. However, it clearly has the potential to go further: if it’s possible to animate an AI-generated image, it should be just as easy to animate a photo of a real person.

In the demos, the talking heads look like real individuals who were filmed, with smooth, natural-looking movements. The lip-sync capabilities are especially impressive, and it’s very difficult to discern any unnatural-looking movements. Equally impressive is that VASA-1 doesn’t seem to require a traditional, face-forward, passport- or portrait-style image to work. The examples include shots of heads facing in slightly different directions.
The model also offers a high level of control: it accepts eye gaze direction, head distance and even emotional expression as inputs, which adds to the realism.

Full report: Microsoft researchers introduce VASA-1, an AI model that can create a realistic talking face video from a portrait photo and an audio file, in research preview.