Thariq Shihipar, an engineer on the Claude Code team, just published the team's internal playbook on Skills: how Anthropic categorizes them, what makes them work, and when to share them.
The short version: Skills are more powerful than most people realize, and most people are using them wrong.
Skills Are Folders, Not Files
The most common misconception: a Skill is just a Markdown file with some instructions.
It's not. A Skill is a directory. Inside that directory, you can have scripts, data files, templates, configuration, examples — anything the agent might need to do its job. The SKILL.md file is just the entry point.
This distinction matters enormously. A Skill that only has a prompt is a hint. A Skill that includes helper scripts, example inputs/outputs, and supporting data is a system. One is fragile. The other actually works at 2am when you're not watching.
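To make the folder-not-file point concrete, here is a hypothetical layout. Only the SKILL.md entry point is part of the format; every other file and directory name below is illustrative:

```
report-skill/
├── SKILL.md           # entry point: when and how to use this Skill
├── scripts/
│   └── extract.py     # helper the agent can run directly
├── examples/
│   ├── input.csv
│   └── expected.md    # known-good output to imitate
└── reference/
    └── notes.md       # supporting context loaded on demand
```

The scripts and examples are what turn a hint into a system: the agent can execute the helpers and pattern-match against the known-good outputs instead of improvising from a prompt alone.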
The 9 Types Anthropic Uses
They break their internal Skills into roughly nine categories:
- Code standards — enforce your team's specific conventions (linting rules, naming patterns, commit format)
- Project context — architecture decisions, why things were built the way they were
- Tool wrappers — specialized knowledge for a specific library or internal API
- Workflow pipelines — multi-step processes codified into repeatable sequences
- Documentation generators — templates and patterns for writing consistent docs
- Review checklists — quality gates that run before output is accepted
- Environment setup — onboarding scripts, dependency installation, config scaffolding
- Debug playbooks — known failure modes and how to fix them
- Domain knowledge — business logic, industry terminology, product-specific context
Project Skills Beat Global Skills
One clear finding: project-level Skills consistently outperform global Skills.
A global Skill for "writing tests" is okay. A project Skill that knows your specific test framework, your mock patterns, your fixture conventions, and has examples from your actual codebase — that's 3-5x more effective.
The closer a Skill is to the actual work, the better it performs.
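As a sketch of what that proximity looks like, a project-level testing Skill might open like this. The frontmatter convention (a `name` and `description`) follows the SKILL.md format; all of the repo-specific paths, tools, and file names are hypothetical:

```markdown
---
name: project-tests
description: Writing tests that follow this repo's conventions
---

# Writing tests in this repo

- Framework: pytest, using the shared fixtures in `tests/conftest.py`
- Mocks: stub HTTP at the client boundary; never patch internals
- Imitate `examples/test_payments.py`, a known-good reference test
```

A global "writing tests" Skill cannot name any of these specifics, which is exactly the gap the project Skill closes.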
The Right Way to Test a Skill
Most people write a Skill, run it a few times, see decent output, and ship it. Then they're confused when it fails 30% of the time.
The right approach: define a quality standard before writing the Skill. What does "good output" look like? What does "bad output" look like? Write those criteria down, then run the Skill against them — repeatedly, automatically, with the agent itself doing the evaluation.
One method that works well: write a test harness that runs your Skill overnight against 100+ input variations, scores each output, and feeds the failures back to the Skill for self-improvement. Anthropic teams have taken Skills from a 56% pass rate to 92% using this approach.
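A minimal sketch of such a harness in Python. `run_skill` is a hypothetical stand-in for invoking your agent with the Skill loaded (here stubbed so the scoring loop itself is runnable), and the criteria are illustrative; the point is that the rubric is written down as code before the Skill is trusted:

```python
# Sketch of an overnight Skill evaluation harness (assumed workflow,
# not an official Anthropic tool). run_skill() is a placeholder for
# calling your agent with the Skill loaded.

def run_skill(prompt: str) -> str:
    """Placeholder: invoke the agent with the Skill and return its output."""
    return f"# Summary\n{prompt}\nDone."

# Quality criteria, written down *before* the Skill: each is a named
# predicate over the raw output.
CRITERIA = {
    "has_heading": lambda out: out.startswith("#"),
    "multi_line": lambda out: len(out.splitlines()) >= 2,
    "non_empty": lambda out: bool(out.strip()),
}

def evaluate(prompt: str) -> dict:
    """Run one variation and score it against every criterion."""
    out = run_skill(prompt)
    results = {name: check(out) for name, check in CRITERIA.items()}
    return {"prompt": prompt, "passed": all(results.values()), "results": results}

def harness(variations: list[str]) -> float:
    """Score all variations and return the overall pass rate."""
    runs = [evaluate(p) for p in variations]
    failures = [r for r in runs if not r["passed"]]
    # In the real loop, failures would be fed back to the Skill
    # (e.g. appended to SKILL.md as counter-examples) and re-run.
    return 1 - len(failures) / len(runs)

if __name__ == "__main__":
    variations = [f"case {i}" for i in range(100)]
    print(f"pass rate: {harness(variations):.0%}")
```

In practice the agent itself can play the role of the predicates, grading each output against the written rubric; the loop structure (generate, score, feed failures back) stays the same.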
When to Share vs. Keep Private
Skills that encode your specific institutional knowledge — your architecture, your conventions, your domain — are private assets. Don't publish those.
Skills that solve generic, widely-shared problems are worth making public. The community benefits, and you get feedback that makes them better.
A good rule: if someone at a completely different company could use this Skill without modification, consider open-sourcing it.
30+ agent tools (Claude Code, Gemini CLI, Cursor, and others) have now standardized on the same SKILL.md format. The format war is over. The question now is whether your Skills are actually good — and that's a craft problem, not a specification problem.