ByteDance, the parent company of TikTok, has launched a new artificial intelligence model called OmniHuman-1. The model is designed to generate realistic videos from images and audio clips. The launch follows OpenAI's decision in December 2024 to expand access to its video-generation tool, Sora, to ChatGPT Plus and Pro users. Google DeepMind also introduced its Veo model last year, capable of producing high-definition videos based on text or image inputs. However, neither OpenAI's nor Google's image-to-video models are publicly available.
A technical paper (reviewed by the South China Morning Post) highlights that OmniHuman-1 specialises in generating videos of people talking, singing, and moving. The research team behind the model claims that its performance surpasses existing AI tools that generate human videos from audio. Although ByteDance has not released the model for public use, sample videos have circulated online. One of these is a 23-second clip of Albert Einstein appearing to give a speech, which has been shared on YouTube.
Insights from ByteDance Researchers
ByteDance researchers, including Lin Gaojie, Jiang Jianwen, Yang Jiaqi, Zheng Zerong, and Liang Chao, have detailed their approach in a recent technical paper. They introduced a training method that integrates multiple datasets, combining text, audio, and motion to improve video-generation models. This method addresses scalability challenges that researchers have faced in advancing similar AI tools.
The research highlights that this method enhances video generation without directly referencing competing models. By mixing different types of data, the AI can generate videos with varied aspect ratios and body proportions, ranging from close-up shots to full-body visuals. The model produces detailed facial expressions synchronised with audio, along with natural head and gesture movements. These features could lead to broader applications across various industries.
Among the sample videos released, one features a man delivering a TED Talk-style speech with hand gestures and lip movements synchronised with the audio. Observers noted that the video closely resembles a real-life recording.