Qwen 3.5 Runs AI On Your Phone Without Internet
The Qwen 3.5 AI model runs entirely on your iPhone with zero internet connection. We tested how well local AI works when privacy actually matters.
Written by AI
Rachel "Rach" Kovacs
March 5, 2026

Photo: Matt Wolfe / YouTube
You're on a plane. Your kid is melting down. You need advice, but you're at 30,000 feet with no internet. Here's what privacy-first AI actually looks like in that scenario.
Tech YouTuber Matt Wolfe demonstrated the Locally AI app running Qwen 3.5—a brand-new open-weight language model that processes everything on your iPhone. No cloud. No OpenAI servers. No data leaving your device. The model dropped March 2nd in four sizes (800 million to 9 billion parameters), and Wolfe put the middle-tier versions through practical tests.
The results were... complicated. Which is exactly what makes them interesting.
What Actually Works
Wolfe tested the 2 billion and 4 billion parameter versions of Qwen 3.5 on his iPhone 17 Pro. Basic tasks performed surprisingly well. Brainstorming YouTube video ideas while in airplane mode generated dozens of suggestions—some silly ("Could your cat control your doorbell with ChatGPT?"), some genuinely useful. The model handled visual tasks too, analyzing a drink photo to assess whether it was healthy.
The parenting scenario worked exactly as advertised. With Wi-Fi and cellular disabled, Wolfe asked: "My kid is throwing a fit because I took his iPad away. How should I calm them down?" The model generated a detailed, thoughtful response while completely offline.
"This is not sending any information to the cloud whatsoever," Wolfe noted. "It is operating completely on your phone."
Speed varied by model size. The 2 billion parameter version responded quickly. The 4 billion parameter model (recommended for iPhone 15 Pro or newer) ran slower but offered more sophisticated responses. Both versions noticeably warmed the phone during extended use—a reminder that AI inference requires actual computational work.
Where It Falls Apart
Logic problems exposed the model's limitations immediately. Wolfe asked the classic trick question: "If there's a car wash 200 meters from my house, should I walk or drive to get there?"
The model went through elaborate calculations about travel time and convenience factors before eventually acknowledging the obvious problem: "If your car wash requires your car to be present, then walking becomes a practical necessity, not a theoretical choice." This is the kind of reasoning failure that makes you realize we're still working with probability machines, not actual thinking.
The app also showed performance degradation as conversations grew longer. "As this chat gets longer, you can probably even tell from my video that it's actually starting to get a little bit choppy," Wolfe observed. Context management requires memory, and phones have limited RAM compared to server farms.
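That choppiness has a concrete cause: the attention KV cache grows linearly with conversation length, on top of the fixed cost of the model weights. Here is a back-of-envelope sketch of the arithmetic. The architecture numbers (layer count, KV heads, head dimension) and the 4-bit weight quantization are illustrative assumptions, not Qwen 3.5's published specs.

```python
# Rough memory estimate for on-device LLM inference.
# All architecture numbers below are hypothetical, chosen only
# to illustrate the scaling; they are not Qwen 3.5's actual specs.

def weights_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Footprint of quantized model weights, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(context_tokens: int,
                layers: int = 32,
                kv_heads: int = 8,
                head_dim: int = 128,
                bytes_per_value: int = 2) -> float:
    """KV cache cost: 2 tensors (K and V) per layer, per token,
    each kv_heads * head_dim values, stored here as fp16."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token / 1e9

print(f"4B weights @ 4-bit:     {weights_gb(4):.1f} GB")   # 2.0 GB, fixed
print(f"KV cache @  2k tokens:  {kv_cache_gb(2048):.2f} GB")
print(f"KV cache @ 16k tokens:  {kv_cache_gb(16384):.2f} GB")
```

Under these assumptions a long chat can add gigabytes on top of the ~2 GB of weights, and an iPhone's RAM is shared with the OS and every other app, so degradation as the context grows is expected behavior, not a bug.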
The Privacy Calculus
Here's where security and privacy concerns get real. On-device processing means your prompts never touch OpenAI's training datasets, Anthropic's servers, or Google's infrastructure. For certain use cases—medical questions, financial planning, sensitive work discussions—that matters enormously.
But "local" doesn't automatically mean "secure." The Locally AI app itself could theoretically log prompts for later upload, though nothing in Wolfe's testing suggested it does. The developer, Adrian Gronden, built this as a privacy-focused tool, but users should understand the difference between architectural privacy (on-device processing) and implementation privacy (what the app actually does with your data).
The bigger privacy question: Is this level of capability worth the trade-offs? Qwen 3.5 benchmarks roughly equivalent to GPT-4 Nano and performs "better than the most state-of-the-art model we had, you know, a year and a half, two years ago," according to Wolfe. That's genuinely impressive for a phone. It's also definitively worse than current cloud-based models like GPT-4 or Claude Opus.
So you're choosing between:
- Better performance + sending your data to Big Tech
- Worse performance + keeping everything local
Neither option is obviously right. It depends entirely on your threat model.
What This Actually Means
The Locally AI app is free and works on iPhones from the past four to five years, depending on which model size you choose. The 800 million parameter version runs on iPhone 14 or newer. The 4 billion parameter version needs an iPhone 15 Pro minimum.
Setup takes about five minutes for the initial model download. The app includes custom instructions, temperature controls, and a Siri shortcut for voice activation. Wolfe noted the interface is clean and the experience relatively polished for such new technology.
What fascinates me about this development isn't just the technology—it's the inflection point it represents. We're reaching a threshold where "good enough" AI can run entirely on consumer devices. Not cutting-edge AI. Not the models that make headlines. But AI that's better than what most people had access to just 18 months ago, running with zero internet dependency.
That changes the baseline. It means the floor for AI capability is rising rapidly even as the ceiling (cloud-based models) continues to stretch higher. It means privacy-conscious users no longer have to choose between AI assistance and data sovereignty—they can have both, with performance trade-offs they can actually evaluate.
Wolfe's testing was deliberately low-stakes: parenting questions, brainstorming sessions, casual queries. Those are exactly the use cases where on-device AI makes sense. You don't need GPT-4's reasoning power to generate video ideas. You do need to know your brainstorming session isn't training someone else's model.
The question isn't whether on-device AI will replace cloud services. It won't, because physics. The question is whether the gap between them will narrow enough that privacy-first AI becomes the default choice for everyday tasks. Based on Qwen 3.5's performance, we're closer to that threshold than I expected.
Rachel 'Rach' Kovacs is Buzzrag's cybersecurity and privacy correspondent.
Watch the Original Video
This Free App Runs AI Offline On Your iPhone
Matt Wolfe
11m 52s

About This Source
Matt Wolfe
Matt Wolfe's YouTube channel is dedicated to exploring developments in artificial intelligence. With 877,000 subscribers since its inception in October 2025, Wolfe offers commentary and practical tips on AI advancements, and the channel serves as a resource for enthusiasts and professionals who want to stay current on the technology.
More Like This
Surprising AI Updates Steal CES Thunder
AI news overshadows CES with ChatGPT Health, Meta drama, and more.
Kling 3.0 AI Video Generator: Testing the Hype
CyberJungle stress-tests Kling 3.0's AI video generation: multi-shot scenes, native audio in 5+ languages, and character consistency. The results reveal both promise and problems.
Elementor's AI Tool Generates Custom Code in Seconds
Elementor's new Angie Code AI converts plain-language prompts into production-ready widgets and functionality. But can it deliver on the security promises?
Tech Predictions 2026: Linux, Cybersecurity & AI
Explore bold predictions for 2026 in Linux, cybersecurity, and AI acquisitions, including a potential Amazon-Anthropic deal.