Thariq Shihipar, an engineer on the Claude Code team, just published the team's internal playbook on Skills: how Anthropic categorizes them, what makes them work, and when to share them.
The short version: Skills are more powerful than most people realize, and most people are using them wrong.
Skills Are Folders, Not Files
The most common misconception: a Skill is just a Markdown file with some instructions.
It's not. A Skill is a directory. Inside that directory, you can have scripts, data files, templates, configuration, examples — anything the agent might need to do its job. The SKILL.md file is just the entry point.
This distinction matters enormously. A Skill that only has a prompt is a hint. A Skill that includes helper scripts, example inputs/outputs, and supporting data is a system. One is fragile. The other actually works at 2am when you're not watching.
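To make the folder-not-file point concrete, here is a hypothetical layout. Only the SKILL.md entry point is part of the format; every other file and directory name below is illustrative:

```
report-skill/
├── SKILL.md           # entry point: when and how to use this Skill
├── scripts/
│   └── extract.py     # helper the agent can run directly
├── examples/
│   ├── input.csv
│   └── expected.md    # known-good output to imitate
└── reference/
    └── notes.md       # supporting context loaded on demand
```

The scripts and examples are what turn a hint into a system: the agent can execute the helpers and pattern-match against the known-good outputs instead of improvising from a prompt alone.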
The 9 Types Anthropic Uses
They break their internal Skills into roughly nine categories:
- Code standards — enforce your team's specific conventions (linting rules, naming patterns, commit format)
- Project context — architecture decisions, why things were built the way they were
- Tool wrappers — specialized knowledge for a specific library or internal API
- Workflow pipelines — multi-step processes codified into repeatable sequences
- Documentation generators — templates and patterns for writing consistent docs
- Review checklists — quality gates that run before output is accepted
- Environment setup — onboarding scripts, dependency installation, config scaffolding
- Debug playbooks — known failure modes and how to fix them
- Domain knowledge — business logic, industry terminology, product-specific context
Project Skills Beat Global Skills
One clear finding: project-level Skills consistently outperform global Skills.
A global Skill for "writing tests" is okay. A project Skill that knows your specific test framework, your mock patterns, your fixture conventions, and has examples from your actual codebase — that's 3-5x more effective.
The closer a Skill is to the actual work, the better it performs.
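As a sketch of what that proximity looks like, a project-level testing Skill might open like this. The frontmatter convention (a `name` and `description`) follows the SKILL.md format; all of the repo-specific paths, tools, and file names are hypothetical:

```markdown
---
name: project-tests
description: Writing tests that follow this repo's conventions
---

# Writing tests in this repo

- Framework: pytest, using the shared fixtures in `tests/conftest.py`
- Mocks: stub HTTP at the client boundary; never patch internals
- Imitate `examples/test_payments.py`, a known-good reference test
```

A global "writing tests" Skill cannot name any of these specifics, which is exactly the gap the project Skill closes.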
The Right Way to Test a Skill
Most people write a Skill, run it a few times, see decent output, and ship it. Then they're confused when it fails 30% of the time.
The right approach: define a quality standard before writing the Skill. What does "good output" look like? What does "bad output" look like? Write those criteria down, then run the Skill against them — repeatedly, automatically, with the agent itself doing the evaluation.
One method that works well: write a test harness that runs your Skill overnight against 100+ input variations, scores each output, and feeds the failures back to the Skill for self-improvement. Anthropic teams have taken Skills from a 56% pass rate to 92% using this approach.
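A minimal sketch of such a harness in Python. `run_skill` is a hypothetical stand-in for invoking your agent with the Skill loaded (here stubbed so the scoring loop itself is runnable), and the criteria are illustrative; the point is that the rubric is written down as code before the Skill is trusted:

```python
# Sketch of an overnight Skill evaluation harness (assumed workflow,
# not an official Anthropic tool). run_skill() is a placeholder for
# calling your agent with the Skill loaded.

def run_skill(prompt: str) -> str:
    """Placeholder: invoke the agent with the Skill and return its output."""
    return f"# Summary\n{prompt}\nDone."

# Quality criteria, written down *before* the Skill: each is a named
# predicate over the raw output.
CRITERIA = {
    "has_heading": lambda out: out.startswith("#"),
    "multi_line": lambda out: len(out.splitlines()) >= 2,
    "non_empty": lambda out: bool(out.strip()),
}

def evaluate(prompt: str) -> dict:
    """Run one variation and score it against every criterion."""
    out = run_skill(prompt)
    results = {name: check(out) for name, check in CRITERIA.items()}
    return {"prompt": prompt, "passed": all(results.values()), "results": results}

def harness(variations: list[str]) -> float:
    """Score all variations and return the overall pass rate."""
    runs = [evaluate(p) for p in variations]
    failures = [r for r in runs if not r["passed"]]
    # In the real loop, failures would be fed back to the Skill
    # (e.g. appended to SKILL.md as counter-examples) and re-run.
    return 1 - len(failures) / len(runs)

if __name__ == "__main__":
    variations = [f"case {i}" for i in range(100)]
    print(f"pass rate: {harness(variations):.0%}")
```

In practice the agent itself can play the role of the predicates, grading each output against the written rubric; the loop structure (generate, score, feed failures back) stays the same.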
When to Share vs. Keep Private
Skills that encode your specific institutional knowledge — your architecture, your conventions, your domain — are private assets. Don't publish those.
Skills that solve generic, widely-shared problems are worth making public. The community benefits, and you get feedback that makes them better.
A good rule: if someone at a completely different company could use this Skill without modification, consider open-sourcing it.
30+ agent tools (Claude Code, Gemini CLI, Cursor, and others) have now standardized on the same SKILL.md format. The format war is over. The question now is whether your Skills are actually good — and that's a craft problem, not a specification problem.