AI Socratic

DeepMind also showcased Gemini Omni, a natively multimodal model built to parse and generate any combination of text, audio, and video. The big hook here is video-to-video editing: users can modify details in a video, adjust its cinematic style, or swap background objects using conversational prompts. The first model in the family, Gemini Omni Flash, dropped immediately for developers via API and in consumer products like the Gemini app and YouTube Shorts.

Sources: DeepMind Gemini Omni, Google I/O Keynote Video

