Building a Personal Agent OS: One Dashboard for AI

Here is a problem anyone who has spent serious time with AI tools will recognize: you have five browser tabs open, three different chat windows, a terminal running a CLI agent, and somewhere in your downloads folder, the output from a thing you built two weeks ago that you can no longer find. Every new conversation starts from scratch. Every new agent has no idea what the last one just did. You are, as Julian Goldie puts it in a recent video tour of his custom setup, "the glue."

Goldie's response to this fragmentation is what he calls an Agent OS—a custom-built dashboard that centralizes multiple AI agents, their outputs, their conversation histories, and their memory into a single interface. It's not a commercial product you can purchase off a shelf (though he does sell access to his setup through a paid community). It's closer to a personal operating system philosophy: the idea that the right way to work with AI in 2025 isn't to use individual tools, but to architect a system where those tools talk to each other and keep running when you step away.

The video is essentially a screen tour of what that looks like in practice, and it's worth taking seriously on its own terms before asking the harder questions.

What the system actually does

The centerpiece is a unified dashboard Goldie has built himself, populated with several distinct agents. Hermes Oracle monitors industry news on a schedule, ranks stories by relevance and trending weight, and automatically drafts social posts. Hermes Jarvis is a computer-control agent—it can open browsers, navigate pages, and execute tasks in real time based on plain-language instructions. A feature called the "memory galaxy" visualizes an Obsidian vault where every agent conversation and news item is automatically archived, creating persistent context that any agent in the system can draw on later.

The memory piece is probably the most architecturally interesting part. Most people working with AI tools today experience what amounts to institutional amnesia—every session is session zero. The approach Goldie describes, routing agent outputs into a local knowledge base that then feeds back into future agent prompts, is a meaningful attempt to solve that. The Obsidian vault acts as shared long-term memory across the system. When Hermes Oracle ingests a news story, it lands in the vault. When Hermes Jarvis finishes a task, that conversation lands in the vault too. The agents, in theory, know what happened yesterday.

This connects to a broader pattern in how overnight agent systems are being built—scheduled, autonomous loops that do the monitoring and research work so you don't have to. The difference in Goldie's framing is the integration layer: rather than a single scheduled agent doing one job, everything feeds into the same memory store and can be orchestrated from the same screen.
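To make the shared-memory idea concrete, here is a minimal sketch of what "everything lands in the vault" could look like, assuming the vault is simply a folder of Markdown files, which is how Obsidian stores notes. The function names (`archive_note`, `recent_context`), the folder-per-agent layout, and the timestamped filenames are all illustrative assumptions; the video does not show how Goldie's system actually writes to or reads from the vault.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def slugify(title: str) -> str:
    """Turn a title into a filesystem-safe filename fragment."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "note"


def archive_note(vault: Path, agent: str, title: str, body: str,
                 when: Optional[datetime] = None) -> Path:
    """Write one agent output into the vault as a Markdown note.

    Hypothetical layout: <vault>/<agent>/<UTC timestamp>-<slug>.md
    """
    when = when or datetime.now(timezone.utc)
    folder = vault / agent
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{when.strftime('%Y%m%dT%H%M%S%f')}-{slugify(title)}.md"
    path.write_text(f"# {title}\n\n(agent: {agent})\n\n{body}\n", encoding="utf-8")
    return path


def recent_context(vault: Path, limit: int = 5) -> str:
    """Join the newest notes from every agent, newest first, for use in a prompt."""
    # Filenames start with a sortable timestamp, so sorting by name sorts by time.
    notes = sorted(vault.rglob("*.md"), key=lambda p: p.name, reverse=True)
    return "\n\n".join(p.read_text(encoding="utf-8") for p in notes[:limit])
```

Any agent would then prepend `recent_context(vault)` to its next prompt. A real setup would likely need search or summarization rather than "the last five notes," since the vault outgrows a prompt quickly.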

There's also a loop engine—a builder-judge architecture where one model does the work and a separate model evaluates it against a threshold (he uses 80 out of 100 as his example). If the output doesn't clear the bar, the loop runs again with feedback. Goldie's articulation of why this matters is worth quoting directly: "The future, from what I'm seeing right now, is not prompting. It's not about better prompts. It's about building loops that run without you so that your systems can just flow and improve."
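The builder-judge pattern is simple enough to sketch in a few lines. The version below is an illustration under stated assumptions, not Goldie's implementation: `run_loop`, `toy_build`, and `toy_judge` are hypothetical names, the two toy functions stand in for real model calls, and the cap on rounds is an added safeguard that the video does not mention. Only the 80-out-of-100 threshold comes from his example.

```python
from typing import Callable, Optional, Tuple

Builder = Callable[[str, Optional[str]], str]   # (task, feedback) -> draft
Judge = Callable[[str, str], Tuple[int, str]]   # (task, draft) -> (score, feedback)


def run_loop(task: str, build: Builder, judge: Judge,
             threshold: int = 80, max_rounds: int = 5) -> Tuple[str, int, int]:
    """Rebuild until the judge's score clears the threshold or rounds run out.

    Returns (best draft, its score, number of rounds used).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    best_draft, best_score = "", -1
    for round_no in range(1, max_rounds + 1):
        draft = build(task, feedback)          # one model does the work
        score, feedback = judge(task, draft)   # a separate model grades it
        if score > best_score:
            best_draft, best_score = draft, score
        if score >= threshold:
            break
    return best_draft, best_score, round_no


# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_build(task: str, feedback: Optional[str]) -> str:
    draft = f"Post about {task}."
    if feedback:
        draft += f" Revised to address: {feedback}"
    return draft


def toy_judge(task: str, draft: str) -> Tuple[int, str]:
    score = 90 if "Revised" in draft else 60
    return score, "add a concrete example"


draft, score, rounds = run_loop("agent dashboards", toy_build, toy_judge)
```

With these stand-ins the first draft scores 60, the feedback goes back to the builder, and the second draft clears the bar. The loop is only as good as the judge: a lenient or miscalibrated scorer passes weak work just as automatically.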

Then there's "Paperclip"—a team of agents with a designated CEO agent that assigns and coordinates work across the group. The outputs Goldie shows include live websites and multi-minute AI avatar videos generated from a single topic prompt. Whether those outputs are good is a separate question from whether the architecture is interesting. The architecture is interesting.
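For readers wondering what a "CEO agent" amounts to structurally, the sketch below shows one plausible reading: a coordinator that breaks a topic into tasks and hands each to a worker. Everything here is an assumption made for illustration. The `CeoAgent` class, the role names, and the fixed one-task-per-worker plan are hypothetical, and a real coordinator would presumably ask a model to do the planning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]  # task description -> result


@dataclass
class CeoAgent:
    """Hypothetical coordinator that assigns work to named worker agents."""
    workers: Dict[str, Worker]
    log: List[Tuple[str, str]] = field(default_factory=list)

    def plan(self, topic: str) -> List[Tuple[str, str]]:
        # Placeholder planning step: one task per worker role.
        return [(role, f"{role} for: {topic}") for role in self.workers]

    def run(self, topic: str) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for role, task in self.plan(topic):
            results[role] = self.workers[role](task)
            self.log.append((role, task))  # record who was assigned what
        return results


# Stub workers standing in for real agents.
team = CeoAgent(workers={
    "research": lambda task: f"notes[{task}]",
    "writing": lambda task: f"draft[{task}]",
})
outputs = team.run("agent dashboards")
```

The assignment log is what makes this more than a for-loop: it leaves a record that can be audited when an output goes wrong.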

The model-swapping argument

One feature Goldie emphasizes is the ability to swap underlying AI models in and out without rebuilding anything. When a frontier model gets deprecated or outperformed on benchmarks, you pull it and insert the new one. His example: a new model called Sakana dropped the day before the video was recorded, and within an hour it was integrated into the system and producing outputs.

This matters more than it might sound. One of the hidden costs of building workflows around any specific AI product—ChatGPT, Claude, whatever—is lock-in. You optimize your prompts, your integrations, your team's habits, around a particular model's behavior. When that model changes (and they all change, often without warning) your workflows break in unpredictable ways. An abstraction layer that treats models as swappable components rather than fixed dependencies is a genuine engineering insight, not just a convenience feature.

It also reflects something true about the current moment in AI development: no one model is reliably dominant for more than a few months. Building around model-agnosticism is probably correct.
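The abstraction layer described in this section can be as small as a lookup table between jobs and models. The sketch below assumes every model can be wrapped as a function from prompt to text. `ModelRegistry`, its method names, and the stub backends are hypothetical and stand in for real API clients; none of them come from the video.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]  # prompt -> completion


class ModelRegistry:
    """Maps roles (what a job needs) to backends (which model does it)."""

    def __init__(self) -> None:
        self._backends: Dict[str, ModelFn] = {}
        self._roles: Dict[str, str] = {}

    def register(self, name: str, fn: ModelFn) -> None:
        self._backends[name] = fn

    def assign(self, role: str, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._roles[role] = name

    def complete(self, role: str, prompt: str) -> str:
        # Callers name a role, never a model, so a swap touches one line.
        return self._backends[self._roles[role]](prompt)


registry = ModelRegistry()
registry.register("old-model", lambda prompt: f"[old] {prompt}")
registry.register("new-model", lambda prompt: f"[new] {prompt}")

registry.assign("drafting", "old-model")
before = registry.complete("drafting", "Summarize today's news")

registry.assign("drafting", "new-model")   # the swap: no caller changes
after = registry.complete("drafting", "Summarize today's news")
```

What this does not solve is behavioral drift. Prompts tuned for one model may perform differently on its replacement, so a swap that takes an hour to wire up can still take longer to validate.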

Where the skepticism earns its keep

None of this means the video should be consumed uncritically. A few tensions are worth naming.

First, the performance metrics. Goldie claims this system reduces weekly "AI management time" from 15 hours to roughly 3. That's a specific and significant number—a claimed 80% reduction in overhead. But it's self-reported, unverified, and measured against a baseline ("scattered tools") that he also defines. The comparison is doing a lot of work. Someone whose current workflow is already reasonably organized might see much smaller gains. Someone building this from scratch might find the setup cost—which he describes as hours of daily work over an extended period—exceeds the savings for a long time.

Second, the "non-technical" accessibility claim. Goldie repeatedly says you don't need to be a coder to build or use this. He cites a 61-year-old community member named Ethan who got eight agents running with voice control and Obsidian installed. That's a genuine data point. But the gap between "someone in a paid community with tutorial support can set this up" and "anyone can do this" is wider than the framing suggests. The system he describes involves CLIs, API integrations, local file systems, model configuration, and loop engineering. None of that is insurmountable, but "non-technical" is carrying a lot of weight here.

Third—and this is the one worth sitting with—there's a business model embedded in the video. The Agent OS setup, the tutorials, the coaching calls, and the community access all live inside a paid product called the AI Profit Boardroom. The video is, functionally, a demonstration-length sales pitch. That doesn't make the underlying ideas wrong, but it does mean the frame is optimized for persuasion, not falsifiability. You're seeing what works in a controlled demo by the person who built it for their own use case.

Goldie's pitch leans on a contrast rather than a comparison of alternatives: "You know, you could do it the old way which is like everything scattered, different tab for AI you use or different terminal, different window, re-explain your business in every single new chat." The contrast he draws between the old way and the new is vivid, and it is accurate as a description of common pain points. Whether his specific implementation is the right solution, or just one possible solution among several, is a question the video doesn't particularly want to answer.
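The first of those tensions, the 15-to-3-hour claim, comes down to arithmetic that is worth running against your own numbers. The sketch below checks the 80% figure and adds a break-even estimate. The setup cost of 120 hours is a purely hypothetical input chosen for illustration, since the video gives no total, and the function names are likewise invented here.

```python
import math


def reduction_pct(before_hours: float, after_hours: float) -> float:
    """Percentage drop in weekly hours."""
    return 100 * (before_hours - after_hours) / before_hours


def breakeven_weeks(setup_hours: float, before_hours: float,
                    after_hours: float) -> float:
    """Weeks until cumulative weekly savings repay the one-time setup cost."""
    saving = before_hours - after_hours
    if saving <= 0:
        return math.inf  # no weekly saving means the setup never pays back
    return setup_hours / saving


claimed = reduction_pct(15, 3)               # 80.0, matching the claim

# Hypothetical setup cost of 120 hours; substitute your own estimate.
goldie_case = breakeven_weeks(120, 15, 3)    # 10.0 weeks
tidy_case = breakeven_weeks(120, 5, 3)       # 60.0 weeks from a tidier baseline
```

The same hypothetical setup cost pays back in ten weeks against Goldie's stated baseline and in more than a year for someone who was only losing five hours a week to begin with, which is why the choice of baseline matters so much.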

The architect framing

What's most durable in Goldie's pitch isn't any specific feature—it's the conceptual reframe he offers near the end: "It's about becoming a great architect, right? It's not about prompting, it's not about coding, it's not about being technical. It's about how can you design beautiful systems that work together, that run without you."

This is a meaningful shift in how to think about AI capability. The dominant early frame for "using AI well" was prompt engineering: crafting better inputs to get better outputs. The next frame was fine-tuning and retrieval augmentation. What Goldie is pointing at is a third frame: systems design. The value isn't in any individual agent or any individual prompt. It's in the connections between agents, the persistence of memory across sessions, and the feedback loops that improve outputs over time.

Whether you build it yourself, buy someone else's setup, or use one of the commercial platforms slowly assembling similar capabilities, the underlying question is the same: are you still the glue holding your AI tools together, or have you built something that holds itself?

Marcus Chen-Ramirez is a senior technology correspondent for Buzzrag.