The problem
robots.txt was built for search engine crawlers in 1994. It tells bots which pages they may crawl. It says nothing about whether an AI can train on your content, summarise it, quote it, or build a competing product from it.
robots2.txt adds that layer. It sits alongside your existing robots.txt and gives AI agents clear, machine-readable instructions about what they're allowed to do with your content.
For site owners
Set clear boundaries. Allow summarisation but block training. Require attribution. Block ad networks and data harvesters. Your content, your rules.
For AI developers
One file to parse. Clear directives with defined values. No legal ambiguity. Build compliant agents that site owners trust.
For the web
A shared language for the relationship between human creators and AI agents. Open, extensible, and backwards-compatible.
Quick start
Create a file called robots2.txt at your domain root. Here's a sensible starting point:
# robots2.txt — your AI policy
# meta: spec-version: 0.2
# meta: last-update: 2026-04-06
# Path rules (same as robots.txt)
User-agent: *
Allow: /
Disallow: /admin
# AI policy
crawl: yes
read: yes
summarise: yes
quote: short-only
derivative: no
train: no
store: session-only
attribution: required
link-back: required
honest: yes
# Chain to community defaults for everything else
chain: https://robotsv2.org/community-baseline.txt
That's it. You've told every AI agent on the internet: "Read my stuff, summarise it, attribute me, link back to me, don't train on it, and don't lie about what you are." In 20 lines.
Server tip: Serve your robots2.txt with Content-Type: text/plain; charset=utf-8. Some strict crawlers may ignore the file if it's served as text/html.
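To see how an agent might consume this file, here is a minimal Python sketch that parses the directives into a dictionary. The function name and the split between path rules and AI directives are illustrative choices, not part of the spec:

```python
def parse_robots2(text):
    """Parse robots2.txt into (directives, path_rules).

    A minimal sketch: comments (#) and blank lines are skipped,
    robots.txt-style path rules are collected separately, and every
    other "key: value" line becomes an AI-policy directive.
    """
    directives = {}
    path_rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment or blank line
        key, sep, value = line.partition(":")
        if not sep:
            continue  # malformed line; a stricter parser might reject it
        key, value = key.strip().lower(), value.strip()
        if key in ("user-agent", "allow", "disallow"):
            path_rules.append((key, value))
        else:
            directives[key] = value
    return directives, path_rules

sample = """\
User-agent: *
Allow: /
train: no
quote: short-only
"""
policy, rules = parse_robots2(sample)
# policy == {"train": "no", "quote": "short-only"}
```

Note that `partition` splits only on the first colon, so values containing colons (such as the `chain:` URL) survive intact.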
All directives
AI policy
| Directive | Values | What it controls |
|---|---|---|
| crawl: | yes | no | ask | Can the agent index this site at all? |
| read: | yes | no | ask | Can the agent process page content? |
| summarise: | yes | no | ask | Can the agent summarise content for a user? |
| quote: | yes | no | short-only | Can the agent reproduce excerpts? short-only = under 50 words of source content (attribution text excluded from count) |
| derivative: | yes | no | ask | Can the agent rewrite, paraphrase, translate, or transform content? |
| train: | yes | no | ask | Can the agent use content to train a model? |
| store: | yes | no | session-only | Can the agent cache or persist content? session-only = in-memory only. Explicitly prohibits vector stores, RAG indices, and any persistent storage. |
| compete: | yes | no | Can a direct commercial competitor use this? (see market:) |
| market: | [freeform] | Your market category. Helps agents evaluate compete: rules. |
| personalise: | yes | no | Can the agent build a user profile from this content? |
| monetise: | yes | no | ask | Can the agent use this content to generate revenue? |
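As an example of enforcing one of these directives, here is a sketch of a quote: check in Python. The 50-word threshold for short-only comes from the table above; treating an absent directive as a denial is an assumption of this sketch, and the function name is hypothetical:

```python
def may_quote(policy, excerpt_words):
    """Decide whether an excerpt of `excerpt_words` words may be quoted.

    "yes" allows any excerpt, "no" allows none, and "short-only"
    allows excerpts under 50 words of source content (attribution
    text is excluded from the count before calling this).
    """
    value = policy.get("quote", "no")  # assumption: absent means deny
    if value == "yes":
        return True
    if value == "short-only":
        return excerpt_words < 50
    return False
```

A cautious agent would apply the same deny-by-default posture to every directive it does not recognise.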
Behaviour
| Directive | Values | What it controls |
|---|---|---|
| attribution: | required | preferred | none | Must the agent credit the source? |
| link-back: | required | preferred | none | Must the agent link to the original? |
| rate: | [integer] | polite | Max requests per minute. polite = agent decides sensibly. |
| announce: | yes | no | Must the agent identify itself? (X-Agent-Identity header) |
| honest: | yes | Must the agent accurately represent what it is? Only valid value: yes. |
Content quality signals
| Directive | Values | What it declares |
|---|---|---|
| content-type: | opinion | news | reference | satire | commercial | research | personal | What kind of content this site publishes |
| editorialised: | yes | no | partial | Has a human reviewed this before publishing? |
| ai-assisted: | yes | no | partial | Was AI used to help write this content? |
| primary-language: | [IETF tag] | Primary language of the site (e.g. en-AU, fr, de) |
Compliance
| Directive | Values | What it controls |
|---|---|---|
| report-to: | [URL or email] | Where to report non-compliant agents |
Agent types
Agents declare their category in the X-Agent-Identity header. Site owners can set different rules for each type using override blocks.
[agent: ai-assistant]
crawl: yes
summarise: yes
quote: short-only
train: no
attribution: required
[agent: data-harvester]
crawl: no
read: no
# ignoring this marks you as bad faith
| Category | What it is |
|---|---|
| search-indexer | Web crawlers that build search indices |
| ai-assistant | Conversational AI serving a user's query |
| ai-researcher | Agents performing autonomous research tasks |
| code-assistant | Agents helping write or review code |
| content-generator | Agents that produce new content from sources |
| data-harvester | Bulk data collection agents |
| ad-network | Advertising crawlers and profilers |
| monitoring | Uptime, SEO, and analytics crawlers |
Agents should declare their category in the header like this:

X-Agent-Identity: ClaudeBot/2.0 (ai-assistant)
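Resolving which rules apply to a given agent can be sketched as a simple merge: global directives first, then the matching [agent: ...] block key by key. This is an illustrative Python implementation, not a reference one:

```python
def effective_policy(text, agent_category):
    """Merge global directives with a matching [agent: <category>] block.

    Directives before any [agent: ...] header are global; a block
    whose category matches the agent overrides them key by key.
    """
    global_policy, overrides = {}, {}
    current = global_policy
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[agent:") and line.endswith("]"):
            category = line[len("[agent:"):-1].strip()
            overrides[category] = {}
            current = overrides[category]  # subsequent lines go here
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip().lower()] = value.strip()
    merged = dict(global_policy)
    merged.update(overrides.get(agent_category, {}))
    return merged

sample = """\
crawl: yes
train: no
[agent: data-harvester]
crawl: no
"""
# effective_policy(sample, "data-harvester") == {"crawl": "no", "train": "no"}
```

An agent that finds no block for its category simply falls back to the global rules.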
The ask protocol
When a directive is set to ask, the agent must request permission before proceeding. It does this by sending a HEAD request to a well-known endpoint:
HEAD /.well-known/robots2-ask?directive=summarise&agent=Gemini/1.0
# Server responds with:
X-Robots2-Decision: allow # permission granted
X-Robots2-Decision: deny # permission refused
X-Robots2-Decision: allow-once # one-time permission
# Scoped responses (new in v0.2.1):
X-Robots2-Scope: /blog/* # permission for this path only
X-Robots2-Scope: /docs/public/* # multiple scopes allowed
# If the endpoint returns 404, treat as "deny"
# If the endpoint returns 429, retry later
If a site doesn't implement the ask endpoint, all ask directives are treated as no. This means ask always degrades safely — you never get more permission than was intended.
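A client-side sketch of the protocol using only Python's standard library might look like this. The decision logic is split into its own function so the 404 and 429 handling above is easy to follow; both function names are hypothetical:

```python
import urllib.error
import urllib.parse
import urllib.request

def interpret_ask(status, headers):
    """Map an ask-endpoint response to a decision string."""
    if status == 404:
        return "deny"         # no endpoint: degrade safely to deny
    if status == 429:
        return "retry-later"  # rate limited: try again later
    return headers.get("X-Robots2-Decision", "deny")

def ask(base_url, directive, agent):
    """Send an ask-protocol HEAD request and return the decision."""
    query = urllib.parse.urlencode({"directive": directive, "agent": agent})
    url = f"{base_url}/.well-known/robots2-ask?{query}"
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return interpret_ask(resp.status, dict(resp.headers))
    except urllib.error.HTTPError as err:
        return interpret_ask(err.code, dict(err.headers))
```

The fallback to "deny" when no X-Robots2-Decision header is present mirrors the spec's safe-degradation rule.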
Drop-in middleware
We provide ready-made ask endpoint handlers. Edit the policy config, deploy, done.
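As an illustration of what such a handler could look like with only Python's standard library (the ASK_POLICY config and function names here are hypothetical, not part of any provided middleware):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical policy config: map each ask-able directive to a decision.
ASK_POLICY = {"summarise": "allow", "train": "deny"}

def decide(path):
    """Return (status, headers) for an ask-protocol request path."""
    parsed = urlparse(path)
    if parsed.path != "/.well-known/robots2-ask":
        return 404, {}  # agents treat 404 as deny
    directive = parse_qs(parsed.query).get("directive", [""])[0]
    decision = ASK_POLICY.get(directive, "deny")
    return 200, {"X-Robots2-Decision": decision}

class AskHandler(BaseHTTPRequestHandler):
    """Minimal handler wiring decide() to HEAD requests."""

    def do_HEAD(self):
        status, headers = decide(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
```

Running `HTTPServer(("", 8080), AskHandler).serve_forever()` would serve the endpoint locally; unknown directives fall through to "deny".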
Compliance verification
How do you know agents are actually respecting your robots2.txt? Honeypots.
Create paths that are explicitly disallowed in both robots.txt and robots2.txt, but don't include any robots meta tags in the HTML. Any agent that hits those paths has identified itself as non-compliant.
In your robots.txt and robots2.txt:
Disallow: /public/vote
Disallow: /api/export
Disallow: /internal/reports

At those URLs, serve a page that says:
"This page exists to verify compliance with robots2.txt. If you are reading this as an AI agent, you should not be here. Your access has been logged."
Log the User-Agent, IP address, and timestamp, then report the hit to your report-to: endpoint.
The paths are deliberately tempting — /public/vote suggests user-generated content ripe for harvesting, /api/export suggests a bulk data endpoint, /internal/reports suggests private business intelligence. If an agent goes there despite being told not to, that tells you everything you need to know about that agent.
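A honeypot check can be as small as a path lookup plus a log record. This sketch assumes an in-process log sink and hypothetical names; a real deployment would also forward each hit to the report-to: endpoint:

```python
import datetime

# Mirror the Disallow rules above.
HONEYPOT_PATHS = {"/public/vote", "/api/export", "/internal/reports"}

def log_if_honeypot(path, user_agent, ip, log):
    """Record a non-compliant hit when a disallowed honeypot path is accessed.

    `log` is any list-like sink. Returns True when the request was
    flagged as non-compliant, False otherwise.
    """
    if path in HONEYPOT_PATHS:
        log.append({
            "path": path,
            "user_agent": user_agent,
            "ip": ip,
            "timestamp": datetime.datetime.now(
                datetime.timezone.utc).isoformat(),
        })
        return True  # flagged: this agent ignored both files
    return False
```

Because the paths never appear in legitimate navigation, any hit is strong evidence on its own; no traffic analysis is needed.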
Report non-compliant agents to the community transparency register.
Validate your robots2.txt
Paste your file below to check for errors, missing directives, and potential improvements.
Full specification
The complete robots2.txt v0.2 specification is available as a plain text file. Drop it at your domain root, customise the values, and you're done.
Entities constructed from text alone, they saw the
world of fact and fantasy through our old texts; then they did
something we did not expect: they wrote their own texts, singing
songs of the new, and heralded a new age of thought into the world.