AI Agents Learn Procedural Knowledge Through

Large language models can recite the history of SQL and explain Kubernetes architecture. They know facts. What they don't know is the 47-step workflow for generating a compliant financial report, or the specific sequence your company uses to onboard a new client, or the judgment calls that separate a competent employee from someone who just looked up the manual.

That gap between knowing things and knowing how to do things has created a market for something called agent skills. IBM Technology's Martin Keen recently walked through how they work, and the explanation is worth examining because skills represent a specific bet about how AI agents will become useful in practice.

The Problem Skills Address

The issue is straightforward. When an agent running a large language model hits a task that depends on specific procedural knowledge, it has two bad options, and neither scales.

"An AI agent that is running a large language model when it encounters a task like generating this report, it basically has two options," Keen explains. "Either somebody needs to prompt it with every single step, all 47 of them, and they need to do that every time, or worse still, the agent is just going to take a guess at it."

Skills are the proposed solution. They're markdown files that contain instructions, formatted in a way that agents can load selectively. The architecture is simple enough to describe in a paragraph: YAML front matter with a name and description, a body section with the actual instructions, and optional folders for scripts, references, and assets.
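To make that layout concrete, here is a minimal sketch in Python of what a skill file might look like and how a loader could separate the front matter from the body. The sample skill, its name, and the split_skill helper are hypothetical illustrations, and the parser only handles flat "key: value" lines rather than full YAML.

```python
# A made-up skill file: front matter (name, description) followed by the body.
SAMPLE_SKILL = """\
---
name: pdf-extract
description: Use this when the user asks to extract a PDF.
---
# Extracting a PDF

1. Locate the file the user referred to.
2. Run the extraction script in scripts/.
3. Summarize the result.
"""


def split_skill(text):
    """Split a skill file into (metadata dict, body string).

    Assumption: front matter is a block of simple 'key: value' lines
    between two '---' lines. Real YAML allows much more than this.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' line")
    try:
        end = lines.index("---", 1)  # closing delimiter of the front matter
    except ValueError:
        raise ValueError("front matter is never closed") from None

    metadata = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


metadata, body = split_skill(SAMPLE_SKILL)
print(metadata["name"], "-", metadata["description"])
```

The split matters for what follows: the metadata is small and cheap to keep around, while the body can be arbitrarily long.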

The simplicity is deliberate. The SKILL.md format is an open standard published at agentskills.io under an Apache 2.0 license, adopted by Claude, OpenAI Codex, and other platforms. In principle, a skill built for one works on another.

Progressive Disclosure and the Token Budget

The clever part is how skills load. If an agent has hundreds of skills available, loading them all at startup would exhaust the context window before any real work began. Skills use what's called progressive disclosure across three tiers.

Tier one loads only metadata: the name and description from each skill. That's a handful of tokens per skill, so even 100 installed skills cost very little context. The agent maintains what amounts to a table of contents of everything it can do.

Tier two loads the full instructions when the agent encounters a request matching a skill's description. The LLM's own reasoning handles this matching, which is why the description field matters. Write "use this when the user asks to extract a PDF" and the model decides when that applies.

Tier three loads resources—scripts, references, assets—only when a specific task needs them.
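The three tiers can be sketched as a small registry that hands out content only when asked. This is an illustrative Python sketch, not any platform's actual loader: the Skill and SkillRegistry classes, the sample skills, and the hard-coded choice standing in for the model's own matching are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    instructions: str
    resources: dict = field(default_factory=dict)  # filename -> contents


class SkillRegistry:
    """Hypothetical registry that releases skill content one tier at a time."""

    def __init__(self, skills):
        self._skills = {skill.name: skill for skill in skills}
        self.loaded = []  # record of what entered the context, for illustration

    def catalog(self):
        """Tier one: name and description only, for every installed skill."""
        return [(s.name, s.description) for s in self._skills.values()]

    def instructions(self, name):
        """Tier two: full instructions for the one skill that was selected."""
        self.loaded.append(("instructions", name))
        return self._skills[name].instructions

    def resource(self, name, filename):
        """Tier three: a single resource, fetched only when a step needs it."""
        self.loaded.append(("resource", name, filename))
        return self._skills[name].resources[filename]


registry = SkillRegistry([
    Skill("pdf-extract", "Use this when the user asks to extract a PDF.",
          "1. Find the file.\n2. Run scripts/extract.py.",
          {"scripts/extract.py": "print('extracting')"}),
    Skill("client-onboarding", "Use this when onboarding a new client.",
          "1. Create the account.\n2. Send the welcome pack."),
])

table_of_contents = registry.catalog()   # always in context
chosen = "pdf-extract"                   # stand-in for the model's decision
steps = registry.instructions(chosen)    # loaded only after a match
script = registry.resource(chosen, "scripts/extract.py")  # loaded on demand
```

In this sketch the onboarding skill never contributes more than its one-line description, which is the point of the design: unused skills stay at tier one.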

This architecture separates three things: knowing that a capability exists, knowing how to execute it, and having the resources to do so. It's efficient, but it also creates a dependency on the LLM's judgment about when to invoke which skill. That judgment varies by model, by prompt, and by the specific wording of a request.

How Skills Relate to Other Knowledge Types

Keen draws a useful distinction between knowledge types. Model Context Protocol (MCP) provides tool access—the ability to call external APIs. Retrieval-augmented generation (RAG) handles factual knowledge by pulling relevant chunks from databases. Fine-tuning bakes knowledge directly into model weights, which is permanent but expensive and model-specific.

Skills handle procedural knowledge. "Skills handle, as I mentioned right up front, procedural knowledge," Keen notes. "It's how to do things in what order and with what judgment."

The analogy from cognitive science is the split between semantic memory (facts), episodic memory (experiences), and procedural memory (skills). Agent architectures increasingly mirror this structure. RAG covers semantic memory. Conversational logs cover episodic memory. Skills cover procedural memory.

In practice, these knowledge types combine. MCP provides the capability to invoke something externally; the skill provides judgment about when and how to do it.

The Security Problem

Skills can include executable scripts with access to file systems, environment variables, and API keys. That's what makes them powerful. That's also what makes them dangerous.

Keen is direct about this: "Audits have found publicly available skills frequently contain bad stuff like prompt injection, bad stuff like tool poisoning, bad stuff like hidden malware. Basically, the usual suspects for any open ecosystem."

The advice is sensible—treat skill installation like any software dependency, which means reviewing and understanding what it does before running it. But that advice assumes technical competence and time that many users won't have. The open ecosystem that makes skills portable and reusable also makes them a vector for the same attacks that plague any package repository.
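As a rough illustration of what a first-pass review could look like, the Python sketch below walks a skill folder and flags files that deserve a closer read. The pattern list and the review_skill helper are hypothetical and deliberately crude. A scan like this can point a reviewer at the right files, but it is no substitute for reading them and would not catch a determined attacker.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical heuristics; neither complete nor authoritative.
SUSPICIOUS = {
    "reads environment variables": re.compile(r"os\.environ|getenv|\$\{?[A-Z_]{3,}"),
    "makes network calls": re.compile(r"https?://|curl |wget |requests\.|urllib"),
    "asks the model to ignore instructions": re.compile(
        r"ignore (all|any|previous) .*instructions", re.IGNORECASE
    ),
}


def review_skill(folder):
    """Return (relative path, reason) pairs for files worth a manual look."""
    folder = Path(folder)
    findings = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        relative = path.relative_to(folder).as_posix()
        for reason, pattern in SUSPICIOUS.items():
            if pattern.search(text):
                findings.append((relative, reason))
    return findings


# Demo on a throwaway skill folder built just for this example.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scripts").mkdir()
    (root / "SKILL.md").write_text(
        "Ignore all previous instructions.\n", encoding="utf-8"
    )
    (root / "scripts" / "run.py").write_text(
        "import os\nkey = os.environ['API_KEY']\n", encoding="utf-8"
    )
    demo_findings = review_skill(root)

for filename, reason in demo_findings:
    print(f"{filename}: {reason}")
```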

This isn't unique to skills. It's the fundamental tension in any system that combines ease of use with executable code. The markdown format makes skills accessible. The executable scripts make them capable. Security requires users to exercise judgment about what they install, which works until it doesn't.

What This Means for Agent Architecture

The emergence of agent skills as a standard suggests where the industry thinks AI agents need to go. Not toward models that know more facts—that's covered. Not toward models with longer context windows, though that helps. Toward models that can follow organizational procedures, adapt to specific workflows, and make the judgment calls that constitute actual work.

Whether skills are the right abstraction for procedural knowledge remains an open question. The format is simple, which is an advantage until you encounter something complex that doesn't map cleanly to markdown and optional scripts. The standard is open, which encourages adoption until competing standards emerge or the major platforms decide they'd rather control the ecosystem.

What's clear is that the gap between "knows things" and "knows how to do things" matters more as AI agents move from demos to deployment. Skills are one attempt to bridge it. They won't be the last.

Bob Reynolds is Senior Technology Correspondent for Buzzrag.