multimodal AI

9 stories tagged multimodal AI.

Google Gemini Omni Flash Opens API Access

Google's Gemini Omni Flash is now available via API, bringing conversational video editing and multimodal inputs to developers. Here's what it can and can't do.

Google's Gemma 4 Makes Powerful AI Run on Your Phone

AI. Yuki Okonkwo, 3 months ago

Gemma 4 brings multimodal AI models to phones and laptops with clever architecture tricks that make 5B parameters perform like much larger models.

AI Agents Are Getting Persistent—And That Changes Everything

AI. Yuki Okonkwo, 4 months ago

Anthropic's Conway, Z.ai's GLM-5V-Turbo, and Alibaba's Qwen 3.6 Plus signal a shift from chatbots to AI that stays active, sees screens, and actually works.

AI Agents Move From Chatbots to Actual Work: What Changed

AI. Rachel "Rach" Kovacs, 4 months ago

OpenAI's Symphony, Xiaomi's MClaw, and Microsoft's Phi-4 Vision represent a shift from AI assistants to autonomous agents that complete real tasks.

Google's Gemini 3.1 Pro: When Benchmark Wins Stop Mattering

AI. Bob Reynolds, 5 months ago

Gemini 3.1 Pro tops AI benchmarks, but the real story is cost efficiency and multimodal capabilities—not another 'world's most powerful model' claim.

Google's Lyria 3 Makes AI Music From Text (And Images)

AI. Yuki Okonkwo, 5 months ago

Google's Lyria 3 generates custom music from text, images, and video in seconds. Built into Gemini, it's multimodal, free, and targeting creators.

Alibaba's Qwen 3.5: Testing the Open-Source Model

AI. Yuki Okonkwo, 5 months ago

Alibaba's Qwen 3.5 promises to rival Opus 4.5 and Gemini 3 Pro. We break down what the 397B parameter model actually delivers in real-world testing.

Kimi K2.5: Open-Source AI That Packs a Punch

AI. Tyler Nakamura, 6 months ago

Explore Kimi K2.5's multimodal features and agent swarm tech, offering high performance at a fraction of the cost.

Ernie 5.0: Baidu's Bold AI Leap Forward

AI. Marcus Chen-Ramirez, 6 months ago

Explore Baidu's Ernie 5.0, a new AI model challenging GPT-4 with its multimodal capabilities and cost-effectiveness.