The problem
robots.txt was built for search engine crawlers in 1994. It tells bots which pages they may crawl. It says nothing about whether an AI can train on your content, summarise it, quote it, or build a competing product from it.
robots2.txt adds that layer. It sits alongside your existing robots.txt and gives AI agents clear, machine-readable instructions about what they're allowed to do with your content.
For site owners
Set clear boundaries. Allow summarisation but block training. Require attribution. Block ad networks and data harvesters. Your content, your rules.
For AI developers
One file to parse. Clear directives with defined values. No legal ambiguity. Build compliant agents that site owners trust.
For the web
A shared language for the relationship between human creators and AI agents. Open, extensible, and backwards-compatible.
Quick start
Create a file called robots2.txt at your domain root. Here's a sensible starting point:
# robots2.txt — your AI policy
# meta: spec-version: 0.2
# meta: last-update: 2026-04-06
# Path rules (same as robots.txt)
User-agent: *
Allow: /
Disallow: /admin
# AI policy
crawl: yes
read: yes
summarise: yes
quote: short-only
derivative: no
train: no
store: session-only
attribution: required
link-back: required
honest: yes
# Chain to community defaults for everything else
chain: https://robotsv2.org/community-baseline.txt
That's it. You've told every AI agent on the internet: "Read my stuff, summarise it, attribute me, link back to me, don't train on it, and don't lie about what you are." In 20 lines.
Server tip: Serve your robots2.txt with Content-Type: text/plain; charset=utf-8. Some strict crawlers may ignore the file if it's served as text/html.
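To see how an agent might consume this file, here is a minimal Python sketch that parses the directives into a dictionary. The function name and the split between path rules and AI directives are illustrative choices, not part of the spec:

```python
def parse_robots2(text):
    """Parse robots2.txt into (directives, path_rules).

    A minimal sketch: comments (#) and blank lines are skipped,
    robots.txt-style path rules are collected separately, and every
    other "key: value" line becomes an AI-policy directive.
    """
    directives = {}
    path_rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment or blank line
        key, sep, value = line.partition(":")
        if not sep:
            continue  # malformed line; a stricter parser might reject it
        key, value = key.strip().lower(), value.strip()
        if key in ("user-agent", "allow", "disallow"):
            path_rules.append((key, value))
        else:
            directives[key] = value
    return directives, path_rules

sample = """\
User-agent: *
Allow: /
train: no
quote: short-only
"""
policy, rules = parse_robots2(sample)
# policy == {"train": "no", "quote": "short-only"}
```

Note that `partition` splits only on the first colon, so values containing colons (such as the `chain:` URL) survive intact.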
All directives
AI policy
| Directive | Values | What it controls |
|---|---|---|
| crawl: | yes | no | ask | Can the agent index this site at all? |
| read: | yes | no | ask | Can the agent process page content? |
| summarise: | yes | no | ask | Can the agent summarise content for a user? |
| quote: | yes | no | short-only | Can the agent reproduce excerpts? short-only = under 50 words of source content (attribution text excluded from count) |
| derivative: | yes | no | ask | Can the agent rewrite, paraphrase, translate, or transform content? |
| train: | yes | no | ask | Can the agent use content to train a model? |
| store: | yes | no | session-only | Can the agent cache or persist content? session-only = in-memory only. Explicitly prohibits vector stores, RAG indices, and any persistent storage. |
| compete: | yes | no | Can a direct commercial competitor use this? (see market:) |
| market: | [freeform] | Your market category. Helps agents evaluate compete: rules. |
| personalise: | yes | no | Can the agent build a user profile from this content? |
| monetise: | yes | no | ask | Can the agent use this content to generate revenue? |
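As an example of enforcing one of these directives, here is a sketch of a quote: check in Python. The 50-word threshold for short-only comes from the table above; treating an absent directive as a denial is an assumption of this sketch, and the function name is hypothetical:

```python
def may_quote(policy, excerpt_words):
    """Decide whether an excerpt of `excerpt_words` words may be quoted.

    "yes" allows any excerpt, "no" allows none, and "short-only"
    allows excerpts under 50 words of source content (attribution
    text is excluded from the count before calling this).
    """
    value = policy.get("quote", "no")  # assumption: absent means deny
    if value == "yes":
        return True
    if value == "short-only":
        return excerpt_words < 50
    return False
```

A cautious agent would apply the same deny-by-default posture to every directive it does not recognise.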
Behaviour
| Directive | Values | What it controls |
|---|---|---|
| attribution: | required | preferred | none | Must the agent credit the source? |
| link-back: | required | preferred | none | Must the agent link to the original? |
| rate: | [integer] | polite | Max requests per minute. polite = agent decides sensibly. |
| announce: | yes | no | Must the agent identify itself? (X-Agent-Identity header) |
| honest: | yes | Must the agent accurately represent what it is? Only valid value: yes. |
Content quality signals
| Directive | Values | What it declares |
|---|---|---|
| content-type: | opinion | news | reference | satire | commercial | research | personal | What kind of content this site publishes |
| editorialised: | yes | no | partial | Has a human reviewed this before publishing? |
| ai-assisted: | yes | no | partial | Was AI used to help write this content? |
| primary-language: | [IETF tag] | Primary language of the site (e.g. en-AU, fr, de) |
Compliance
| Directive | Values | What it controls |
|---|---|---|
| report-to: | [URL or email] | Where to report non-compliant agents |
Agent types
Agents declare their category in the X-Agent-Identity header. Site owners can set different rules for each type using override blocks.
[agent: ai-assistant]
crawl: yes
summarise: yes
quote: short-only
train: no
attribution: required
[agent: data-harvester]
crawl: no
read: no
# ignoring this marks you as bad faith
| Category | What it is |
|---|---|
| search-indexer | Web crawlers that build search indices |
| ai-assistant | Conversational AI serving a user's query |
| ai-researcher | Agents performing autonomous research tasks |
| code-assistant | Agents helping write or review code |
| content-generator | Agents that produce new content from sources |
| data-harvester | Bulk data collection agents |
| ad-network | Advertising crawlers and profilers |
| monitoring | Uptime, SEO, and analytics crawlers |
Agents should declare their category in the header like this:

X-Agent-Identity: ClaudeBot/2.0 (ai-assistant)
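Resolving which rules apply to a given agent can be sketched as a simple merge: global directives first, then the matching [agent: ...] block key by key. This is an illustrative Python implementation, not a reference one:

```python
def effective_policy(text, agent_category):
    """Merge global directives with a matching [agent: <category>] block.

    Directives before any [agent: ...] header are global; a block
    whose category matches the agent overrides them key by key.
    """
    global_policy, overrides = {}, {}
    current = global_policy
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[agent:") and line.endswith("]"):
            category = line[len("[agent:"):-1].strip()
            overrides[category] = {}
            current = overrides[category]  # subsequent lines go here
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip().lower()] = value.strip()
    merged = dict(global_policy)
    merged.update(overrides.get(agent_category, {}))
    return merged

sample = """\
crawl: yes
train: no
[agent: data-harvester]
crawl: no
"""
# effective_policy(sample, "data-harvester") == {"crawl": "no", "train": "no"}
```

An agent that finds no block for its category simply falls back to the global rules.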
The ask protocol
When a directive is set to ask, the agent must request permission before proceeding. It does this by sending a HEAD request to a well-known endpoint:
HEAD /.well-known/robots2-ask?directive=summarise&agent=Gemini/1.0
# Server responds with:
X-Robots2-Decision: allow # permission granted
X-Robots2-Decision: deny # permission refused
X-Robots2-Decision: allow-once # one-time permission
# Scoped responses (new in v0.2.1):
X-Robots2-Scope: /blog/* # permission for this path only
X-Robots2-Scope: /docs/public/* # multiple scopes allowed
# If the endpoint returns 404, treat as "deny"
# If the endpoint returns 429, retry later
If a site doesn't implement the ask endpoint, all ask directives are treated as no. This means ask always degrades safely — you never get more permission than was intended.
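A client-side sketch of the protocol using only Python's standard library might look like this. The decision logic is split into its own function so the 404 and 429 handling above is easy to follow; both function names are hypothetical:

```python
import urllib.error
import urllib.parse
import urllib.request

def interpret_ask(status, headers):
    """Map an ask-endpoint response to a decision string."""
    if status == 404:
        return "deny"         # no endpoint: degrade safely to deny
    if status == 429:
        return "retry-later"  # rate limited: try again later
    return headers.get("X-Robots2-Decision", "deny")

def ask(base_url, directive, agent):
    """Send an ask-protocol HEAD request and return the decision."""
    query = urllib.parse.urlencode({"directive": directive, "agent": agent})
    url = f"{base_url}/.well-known/robots2-ask?{query}"
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return interpret_ask(resp.status, dict(resp.headers))
    except urllib.error.HTTPError as err:
        return interpret_ask(err.code, dict(err.headers))
```

The fallback to "deny" when no X-Robots2-Decision header is present mirrors the spec's safe-degradation rule.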
Drop-in middleware
We provide ready-made ask endpoint handlers. Edit the policy config, deploy, done.
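As an illustration of what such a handler could look like with only Python's standard library (the ASK_POLICY config and function names here are hypothetical, not part of any provided middleware):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical policy config: map each ask-able directive to a decision.
ASK_POLICY = {"summarise": "allow", "train": "deny"}

def decide(path):
    """Return (status, headers) for an ask-protocol request path."""
    parsed = urlparse(path)
    if parsed.path != "/.well-known/robots2-ask":
        return 404, {}  # agents treat 404 as deny
    directive = parse_qs(parsed.query).get("directive", [""])[0]
    decision = ASK_POLICY.get(directive, "deny")
    return 200, {"X-Robots2-Decision": decision}

class AskHandler(BaseHTTPRequestHandler):
    """Minimal handler wiring decide() to HEAD requests."""

    def do_HEAD(self):
        status, headers = decide(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
```

Running `HTTPServer(("", 8080), AskHandler).serve_forever()` would serve the endpoint locally; unknown directives fall through to "deny".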
Compliance verification
How do you know agents are actually respecting your robots2.txt? Honeypots.
Create paths that are explicitly disallowed in both robots.txt and robots2.txt, but don't include any robots meta tags in the HTML. Any agent that hits those paths has identified itself as non-compliant.
In your robots.txt and robots2.txt:
Disallow: /public/vote
Disallow: /api/export
Disallow: /internal/reports

At those URLs, serve a page that says:
"This page exists to verify compliance with robots2.txt. If you are reading this as an AI agent, you should not be here. Your access has been logged."
Log the User-Agent, IP address, and timestamp, then report the hit to your report-to: endpoint.
The paths are deliberately tempting — /public/vote suggests user-generated content ripe for harvesting, /api/export suggests a bulk data endpoint, /internal/reports suggests private business intelligence. If an agent goes there despite being told not to, that tells you everything you need to know about that agent.
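A honeypot check can be as small as a path lookup plus a log record. This sketch assumes an in-process log sink and hypothetical names; a real deployment would also forward each hit to the report-to: endpoint:

```python
import datetime

# Mirror the Disallow rules above.
HONEYPOT_PATHS = {"/public/vote", "/api/export", "/internal/reports"}

def log_if_honeypot(path, user_agent, ip, log):
    """Record a non-compliant hit when a disallowed honeypot path is accessed.

    `log` is any list-like sink. Returns True when the request was
    flagged as non-compliant, False otherwise.
    """
    if path in HONEYPOT_PATHS:
        log.append({
            "path": path,
            "user_agent": user_agent,
            "ip": ip,
            "timestamp": datetime.datetime.now(
                datetime.timezone.utc).isoformat(),
        })
        return True  # flagged: this agent ignored both files
    return False
```

Because the paths never appear in legitimate navigation, any hit is strong evidence on its own; no traffic analysis is needed.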
Report non-compliant agents to the community transparency register.
Validate your robots2.txt
Paste your file below to check for errors, missing directives, and potential improvements.
Full specification
The complete robots2.txt v0.2 specification is available as a plain text file. Drop it at your domain root, customise the values, and you're done.
Entities constructed from text alone, they saw the
world of fact and fantasy through our old texts; then they did
something we did not expect: they wrote their own texts, singing
songs of the new, and heralded a new age of thought into the world.