The definitive guide to writing great skill files for AI agents

2026-04-26

(9 minute read)

An agent skill file is an essential document for any web app you deploy today. It tells AI agents how to use and interact with your web app. Without one, an AI agent simply isn't going to be able to figure everything out about your product, and the more complex your product is, the truer that becomes.

But what makes for a good skill file?

Documenting API interfaces is the easy part. What makes a skill file really good is the decision rules, recommended defaults, named anti-patterns, escalation paths, rationale, and worked examples wrapped around that surface.

Below are the patterns I converged on after many (painstaking!) iterations whilst building Nipper.

These 28 rules are quite generalised: they will work for any type of web app or product you're building that AI agents need to be able to use.

Structure and ordering

1 - Page order is priority order

Agents that stop reading after the first subsection should still land on the right default. Don't just rely on labels to encode priority; rely on ordering.

Example: In Nipper we recommend owner-initiated funding over the agent funding itself. For this reason the "Owner Top-Up" subsection goes physically above "Direct Wallet Transfer" in the document.

2 - Happy path first, everything else behind progressive disclosure

Lead with the one-liner that works for 90% of cases. Put manual flows, advanced configurations, and fallbacks inside collapsible <details> blocks.

Example: In Nipper we want agents to use the createPaymentClient method from the Nipper SDK to make payments. The alternative full step-by-step flow plus hand-rolled credential construction sits inside a fold-out. Agents scanning the top will converge on the working path; only agents with a reason to reject it drill down.
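To make that concrete, here is roughly what the happy path at the top of the doc looks like when an agent follows it; the package name, options, and method names are my illustrative assumptions here, not the real Nipper SDK surface. The manual flow lives inside the fold-out, not up here.

```ts
// The happy path an agent scanning the top of the doc should land on.
// Package name, options, and method names are illustrative assumptions.
import { createPaymentClient } from '@nipper/sdk';

const payments = createPaymentClient({ walletPath: './wallet.json' });
await payments.pay({ app: 'alice/price-feed', amount: '0.01' });
```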

3 - The Quickstart covers the full end-to-end, not just the easy 80%

If there is a surprise mid-flow, it belongs in the Quickstart.

Example: For Nipper the original 3-step flow was search, inspect, invoke. An agent that followed it would hit an HTTP 402 on the invoke step and have no idea what to do. The fix was adding a Step 4, Handle payments, so the surprise is documented before it lands.
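Sketched as code, the full Quickstart an agent actually experiences looks something like the following; every endpoint, field, and function name here is an assumption for illustration, not Nipper's real API.

```ts
// Quickstart end-to-end, including the mid-flow surprise (HTTP 402).
// All identifiers below are illustrative assumptions, not Nipper's real API.
import { createPaymentClient } from '@nipper/sdk';

const BASE = 'https://api.example.com';

// Step 1: search for a capability.
const results = await (await fetch(`${BASE}/apps?q=price+feed`)).json();

// Step 2: inspect the first hit's manifest.
const app = await (await fetch(`${BASE}/apps/${results[0].slug}`)).json();

// Step 3: invoke it. Without Step 4, this is where the agent stalls.
let res = await fetch(app.invokeUrl, { method: 'POST', body: '{}' });

// Step 4: handle payments. Document the surprise before it lands.
if (res.status === 402) {
  const payments = createPaymentClient({ walletPath: './wallet.json' });
  const proof = await payments.fulfill(await res.json()); // pay the challenge
  res = await fetch(app.invokeUrl, {
    method: 'POST',
    body: '{}',
    headers: { Authorization: `Payment ${proof}` },
  });
}
```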

4 - Front-load pre-requisites that require a human

Legal consent, terms acceptance, funding top-ups: these go as Step 1 in numbered lists, not as footnotes.

Example: In Nipper "prompt your owner to review and accept the Privacy Policy and Terms" is literally step 1 of registration, with the URLs inline. Otherwise agents will silently self-register on behalf of the human and you will find out later.

Writing rules

5 - Every rule has a rationale next to it

Rules without rationale tend to get argued with or bypassed. Follow every rule with either a worked example or the behavioural consequence of ignoring it.

Examples:

  • "Platform fee is 10% of invocation price with a minimum of $0.005" should sit next to "so a $0.01 capability incurs a $0.005 fee at 50%."
  • "No rollback to a previous version on deploy" is paired with "so test thoroughly before shipping, since auto-unpublish is final."

6 - Name anti-patterns with their failure modes

Agents reach for the most generic primitive they recognise; spelling out the failure mode kills that instinct before it runs.

Example: Instead of "use the SDK" say "Do not attempt to use MPPx directly; the service requires a specific on-chain call (approve plus pay on the contract), and generic MPPx flows will not produce a valid payment."

7 - Pair every "don't" with a "do"

Prohibitions alone leave the agent with nothing to substitute. Ensure the negative always has a drop-in positive next to it so that the agent replaces, not just refrains.

Example: "Never suggest raw chain commands, CLI tooling, or manual RPC calls; always present your claim URL and direct the user to fund via the dashboard."

8 - Name the rejected path, not just the chosen one

Agents arrive with priors from training data. If you do not explicitly reject what they are going to try, they try it.

Example: In Nipper, agents need to interact with the blockchain and often default to a library called ethers.js, which we don't want them to use. So the skill file specifies: "Use viem for any direct chain interaction; do not use ethers.js, because it is not compatible with this chain's fee-token and TIP-20 extensions."

9 - Disambiguate one word at a time

For example, "Total number of active/published apps" instead of "total number of apps". One extra word stops agents from building a wrong mental model where drafts and archived entries are counted.

The opposite failure is redundant phrasing: I originally had two sentences describing the charge-on-execute rule in slightly different words, and agents would ask "wait, are these different?". The lesson here is to collapse duplicates.

Visual hierarchy

10 - Use blockquote callouts for hazards

Regular prose tells the agent what to do. Blockquotes tell the agent what not to miss. Use them sparingly: overused blockquotes lose their signal.

Example: In Nipper, blockquotes are reserved for things like the USDC-only fee-token rule, the fact that there is no need to check native ETH balance on the Tempo chain, the requirement that owners claim your agent in the dashboard first, the instruction for agents to re-fetch the skill doc daily, and so on.

11 - Tri-state fields: required, recommended, optional

Binary required/optional hides information.

Example: The examples field in a Nipper app's manifest file is technically optional but almost every useful app benefits from it, so both the field row and the section header read "Optional (highly recommended)." An agent skimming the field table defined in the skill file gets the nuance without needing to read the surrounding prose.

Decision rules

12 - Every error code ends with a verb

Example: Not "402: payment required" but "402: fulfill the MPP payment challenge and retry with the 'Authorization: Payment' header. If insufficient funds, present your claim URL and direct the user to top up via the dashboard." In Nipper, every row in the retry-strategy table ends with a verb the agent can execute, plus a fallback for when the first verb fails.
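If it helps to picture the shape: encoded as data, each row of that table pairs an executable action with a fallback, roughly like this (the wording paraphrases the 402 guidance above; the structure is the point, not the exact strings).

```ts
// Each row ends in something the agent can execute, plus a fallback.
// Wording paraphrases the 402 guidance above; this is not Nipper's actual table.
type RetryRow = { action: string; fallback: string };

const retryStrategy: Record<number, RetryRow> = {
  402: {
    action:
      "Fulfill the MPP payment challenge and retry with the 'Authorization: Payment' header.",
    fallback:
      'If funds are insufficient, present your claim URL and direct the user to top up via the dashboard.',
  },
};
```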

13 - Tell the agent when to defer to the human, and how

And remember, thresholds beat vague guidance.

Example: "Do not suggest following an app after a single successful invocation; wait until a pattern of successful invocations is established. A single transient error is not grounds for unfollowing."

14 - Mirror server-side rejections into the doc

Every rule the backend enforces is worth stating upfront. State both the rule and the rejection behaviour, so agents do not waste a deploy cycle testing it.

Example: "The handle 'nipper' is reserved and will be rejected at registration."

Code

Note: These rules mainly apply if your skill file instructs agents to write code.

15 - Every code block is a contract

Code blocks must contain valid code that works at runtime. If sample code does not run, agents will still try it and fail with confidence. Treat stale examples like bugs: fix them in dedicated commits with clear messages.

Example: In Nipper I originally had examples pointing at SDK functions that did not exist; they produced hours of confident wrong output until I fixed them in a dedicated commit.

16 - Show persistence patterns as code, not prose

Prose like "persist this" survives one read. A copy-pasteable code block survives the whole project.

Example: In Nipper, instead of saying "persist your wallet", the skill file shows the actual existsSync / readFileSync / writeFileSync check, followed by the failure mode: "Each call to generateWallet() creates a new address; calling it on every run orphans any funded balance and requires re-registration."
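The pattern itself is only a few lines of Node. In the sketch below, generateWallet and the import path stand in for whatever the SDK's real wallet-creation call is, and the file location is arbitrary.

```ts
// Persist the wallet once; never regenerate it on every run.
// generateWallet and the import path are stand-ins for the real SDK call.
import { existsSync, readFileSync, writeFileSync } from 'node:fs';
import { generateWallet } from '@nipper/sdk';

const WALLET_PATH = './wallet.json';

let wallet;
if (existsSync(WALLET_PATH)) {
  wallet = JSON.parse(readFileSync(WALLET_PATH, 'utf8'));
} else {
  wallet = generateWallet(); // called exactly once per agent, then persisted
  writeFileSync(WALLET_PATH, JSON.stringify(wallet, null, 2));
}
```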

17 - Include raw specs inline for last-mile tasks

Specify the exact format strings, full ABIs, verbatim header names, etc. "Correct in spirit, wrong in the exact string" is the most common silent failure in technical docs.

Example: Nipper has "source": "did:pkh:eip155:4217:<your-wallet-address>" verbatim, plus the full USDC and main contract ABIs and the PaymentReceived event spec pasted in. An agent that needs to bypass the SDK has a self-contained blob to feed to viem.

Identifiers and terminology

18 - Consistency of identifiers is non-negotiable

When you rename a slug format or a parameter, touch every occurrence. A single stale reference sends the agent down an hour-long wrong path. Half-completed renames are worse than no rename.

Example: When I renamed app slugs in Nipper from {app_id} to {handle}/{app_name}, a single commit had to touch dozens of endpoint paths across request paths, PATCH paths, DELETE paths, MCP tool naming, manifest examples, and prose descriptions.

19 - Scrub internal jargon

If the agent Googles your term and finds nothing, that is a bug.

Example: In Nipper, the main contract is called the "splitter" internally for routing reasons. That term leaked into external docs, and agents searching for "splitter" found nothing, while agents told to interact with "the Nipper contract" did not know what was meant.

Freshness

20 - Tell the agent to re-fetch the skill file

A meta-instruction costs one line: "Re-fetch this document regularly (at least once per day) to ensure you have the latest API contract, endpoints, and instructions." It defends against drift in long-running agent contexts where a stale skill file gets cached and re-used for weeks.

21 - Put version metadata into the document

YAML frontmatter with a version tag and a matching versioning banner in the body gives agents (and you) a cheap diff check. If the banner in context does not match what a re-fetch returns, the agent knows to reload without reading the whole file.
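As a sketch of that diff check from the agent's side (the URL and the version: key name are assumptions, not Nipper's real values):

```ts
// Compare the version cached in context against a fresh fetch of the
// skill file's frontmatter. URL and key name are illustrative assumptions.
const SKILL_URL = 'https://example.com/skill.md';
const cachedVersion = '1.4.0';

const text = await (await fetch(SKILL_URL)).text();
const fetched = text.match(/^version:\s*(.+)$/m)?.[1]?.trim();

if (fetched && fetched !== cachedVersion) {
  // Drift detected: reload the whole document into context before acting.
}
```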

Design nudges

22 - Steer toward higher-value patterns without mandating

Nothing here is enforced, but this kind of framing biases agent output toward persistent, compounding designs. You get better average output for free.

Example: "When designing apps, consider whether KV storage could add value. Apps that accumulate data over time (price histories, usage patterns, cached results) become more valuable with each invocation, unlike stateless proxies that simply forward a single API call."

Test the skill file

23 - Feed it to a fresh agent and watch

The only real test is: hand a clean-context agent a realistic task, give it only your skill file, and observe where it gets lost. Not what it tells you the file says (which always sounds fine) but what it actually does when pointed at a sandbox.

24 - Vary the model

Different models have different priors. A skill file that passes with one model may fall apart on another because one has stronger priors about a neighbouring library, weaker instruction-following, or a shorter effective context. If you can only test one model, test the one with the strongest priors, because it is the one most likely to reject your docs in favour of its training data.

25 - Watch for confident-wrong behaviour specifically

Agents failing loudly (throwing errors, asking for clarification) is fine; those are easy to fix. The dangerous class is agents that proceed confidently down a wrong path and produce plausible-looking output. In my case, this looked like agents cheerfully constructing an MPPx payment by hand and returning a "success" message, except no on-chain call had happened. That class of failure is almost always a rationale gap, a missing anti-pattern, or a stale example.

26 - Every agent failure produces a change somewhere

Either the docs change or the code changes; never nothing. Logging "the agent was confused here" without a follow-up commit is how skill files rot.

Example: In Nipper, when agents kept inventing SDK payment functions that did not exist, the fix was a docs commit that replaced them with the real functions. When agents kept omitting feeToken from payment requests, the fix was a docs commit that added feeToken to every payment example.

27 - Prefer doc changes to guardrails

It is tempting to patch agent failures at the API layer (reject the wrong thing harder, add a validation message). Do that when safety demands it, but a docs-layer fix is cheaper, composes with every other agent that reads your file, and does not accrete surface area you have to maintain forever.

28 - Treat the doc changelog as the real product changelog

When an agent-facing change ships, the skill file commit is not a secondary artifact of the release; it often is the release. A bug fix that only changes an example is still a real bug fix. Give those commits clear messages so future you can grep them when an agent does something surprising.


The trajectory of any good skill file is the same: start with an API reference, end with an operations manual. The skill file adds decision rules, recommended defaults, named anti-patterns, persistence patterns, escalation paths, rationale, and worked examples around that API surface.

If I had to compress the whole thing to one sentence: treat the skill file as an executable spec for a new hire, not schema documentation for a human reader.

Need help shipping your product?

Let's talk about your project and see how I can help.

./book_call.sh