robots2.txt v0.2.1

Tell AI agents what they can do with your content. One text file. No ambiguity.

The problem

robots.txt was built for search engine crawlers in 1994. It tells bots which pages to index. It says nothing about whether an AI can train on your content, summarise it, quote it, or build a competing product from it.

robots2.txt adds that layer. It sits alongside your existing robots.txt and gives AI agents clear, machine-readable instructions about what they're allowed to do with your content.

For site owners

Set clear boundaries. Allow summarisation but block training. Require attribution. Block ad networks and data harvesters. Your content, your rules.

For AI developers

One file to parse. Clear directives with defined values. No legal ambiguity. Build compliant agents that site owners trust.

For the web

A shared language for the relationship between human creators and AI agents. Open, extensible, and backwards-compatible.

Quick start

Create a file called robots2.txt at your domain root. Here's a sensible starting point:

# robots2.txt — your AI policy
# meta: spec-version: 0.2.1
# meta: last-update: 2026-04-06

# Path rules (same as robots.txt)
User-agent: *
Allow: /
Disallow: /admin

# AI policy
crawl: yes
read: yes
summarise: yes
quote: short-only
derivative: no
train: no
store: session-only
attribution: required
link-back: required
honest: yes

# Chain to community defaults for everything else
chain: https://robotsv2.org/community-baseline.txt

That's it. You've told every AI agent on the internet: "Read my stuff, summarise it, attribute me, link back to me, don't train on it, and don't lie about what you are." All in one short file.

Server tip: Serve your robots2.txt with Content-Type: text/plain; charset=utf-8. Some strict crawlers may ignore the file if it's served as text/html.
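As a rough illustration, an agent could parse the policy section of that file like this. A sketch only: the function and dictionary keys are mine, not part of the spec, and a real parser would also handle [agent: ...] override blocks and the chain: directive.

```python
def parse_robots2(text):
    """Parse robots2.txt content into a flat policy dict (illustrative only)."""
    policy = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# meta:"):
            # e.g. "# meta: spec-version: 0.2.1"
            key, _, value = line[len("# meta:"):].strip().partition(":")
            policy.setdefault("meta", {})[key.strip()] = value.strip()
            continue
        if line.startswith("#"):
            continue  # ordinary comment
        key, _, value = line.partition(":")
        policy[key.strip().lower()] = value.strip()
    return policy

policy = parse_robots2("""\
# meta: spec-version: 0.2.1
summarise: yes
train: no
quote: short-only
""")
```

Splitting on the first colon keeps values like chained URLs intact, since only the first ":" separates directive from value.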

All directives

AI policy

Directive     Values                   What it controls
crawl:        yes | no | ask           Can the agent index this site at all?
read:         yes | no | ask           Can the agent process page content?
summarise:    yes | no | ask           Can the agent summarise content for a user?
quote:        yes | no | short-only    Can the agent reproduce excerpts? short-only = under 50 words of source content (attribution text excluded from the count).
derivative:   yes | no | ask           (NEW) Can the agent rewrite, paraphrase, translate, or transform content?
train:        yes | no | ask           Can the agent use content to train a model?
store:        yes | no | session-only  Can the agent cache or persist content? session-only = in-memory only; explicitly prohibits vector stores, RAG indices, and any persistent storage.
compete:      yes | no                 Can a direct commercial competitor use this? (see market:)
market:       [freeform]               (NEW) Your market category. Helps agents evaluate compete: rules.
personalise:  yes | no                 Can the agent build a user profile from this content?
monetise:     yes | no | ask           Can the agent use this content to generate revenue?

Behaviour

Directive     Values                       What it controls
attribution:  required | preferred | none  Must the agent credit the source?
link-back:    required | preferred | none  Must the agent link to the original?
rate:         [integer] | polite           Max requests per minute. polite = agent decides sensibly.
announce:     yes | no                     Must the agent identify itself? (X-Agent-Identity header)
honest:       yes                          Must the agent accurately represent what it is? Only valid value: yes.

Content quality signals

Directive          Values                                                                  What it declares
content-type:      opinion | news | reference | satire | commercial | research | personal  What kind of content this site publishes
editorialised:     yes | no | partial                                                      Has a human reviewed this before publishing?
ai-assisted:       yes | no | partial                                                      Was AI used to help write this content?
primary-language:  [IETF tag]                                                              Primary language of the site (e.g. en-AU, fr, de)

Compliance

Directive   Values          What it controls
report-to:  [URL or email]  (NEW) Where to report non-compliant agents

Agent types

Agents declare their category in the X-Agent-Identity header. Site owners can set different rules for each type using override blocks.

[agent: ai-assistant]
crawl: yes
summarise: yes
quote: short-only
train: no
attribution: required

[agent: data-harvester]
crawl: no
read: no
# ignoring this marks you as bad faith

Category           What it is
search-indexer     Web crawlers that build search indices
ai-assistant       Conversational AI serving a user's query
ai-researcher      Agents performing autonomous research tasks
code-assistant     Agents helping write or review code
content-generator  Agents that produce new content from sources
data-harvester     Bulk data collection agents
ad-network         Advertising crawlers and profilers
monitoring         Uptime, SEO, and analytics crawlers

Agents should declare their category in the header like this:
X-Agent-Identity: ClaudeBot/2.0 (ai-assistant)
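The override mechanics can be sketched as a key-by-key merge of the default block with the matching [agent: ...] block. The merge semantics here are my assumption, not spec text, and the function name and sample dicts are illustrative:

```python
def effective_policy(base, overrides, agent_type):
    """Merge the default policy with the [agent: <type>] override block.

    Assumption: override values replace base values key by key; keys the
    override block doesn't mention keep their base values.
    """
    merged = dict(base)
    merged.update(overrides.get(agent_type, {}))
    return merged

base = {"crawl": "yes", "summarise": "yes", "train": "no"}
overrides = {
    "data-harvester": {"crawl": "no", "read": "no"},
    "ai-assistant": {"quote": "short-only"},
}
```

So a data-harvester sees crawl: no while still inheriting train: no from the base block, and an agent type with no override block gets the base policy unchanged.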

The ask protocol (NEW)

When a directive is set to ask, the agent must request permission before proceeding. It does this by sending a HEAD request to a well-known endpoint:

HEAD /.well-known/robots2-ask?directive=summarise&agent=Gemini/1.0

# Server responds with:
X-Robots2-Decision: allow        # permission granted
X-Robots2-Decision: deny         # permission refused
X-Robots2-Decision: allow-once   # one-time permission

# Scoped responses (new in v0.2.1):
X-Robots2-Scope: /blog/*          # permission for this path only
X-Robots2-Scope: /docs/public/*   # multiple scopes allowed

# If the endpoint returns 404, treat as "deny"
# If the endpoint returns 429, retry later

If a site doesn't implement the ask endpoint, all ask directives are treated as no. This means ask always degrades safely — you never get more permission than was intended.
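Those degradation rules fit in a small pure function. The header names come from the examples above; everything else (the function name, the "retry-later" sentinel, headers as a list of pairs so X-Robots2-Scope can repeat) is my sketch:

```python
def interpret_ask_response(status, headers):
    """Map an ask-endpoint response to (decision, scopes).

    headers is a list of (name, value) pairs, because the
    X-Robots2-Scope header may appear more than once.
    """
    if status == 404:
        return "deny", []          # no ask endpoint -> safe default is deny
    if status == 429:
        return "retry-later", []   # back off and try again later
    decision = "deny"              # deny unless the server says otherwise
    scopes = []
    for name, value in headers:
        if name == "X-Robots2-Decision":
            decision = value
        elif name == "X-Robots2-Scope":
            scopes.append(value)
    return decision, scopes
```

Defaulting to "deny" when the header is absent keeps the "never more permission than intended" property even for malformed responses.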

Drop-in middleware

We provide ready-made ask endpoint handlers. Edit the policy config, deploy, done.

Compliance verification

How do you know agents are actually respecting your robots2.txt? Honeypots.

Create paths that are explicitly disallowed in both robots.txt and robots2.txt, but don't include any robots meta tags in the HTML. Any agent that hits those paths has identified itself as non-compliant.

Example honeypot setup:

In your robots.txt and robots2.txt:
Disallow: /public/vote
Disallow: /api/export
Disallow: /internal/reports

At those URLs, serve a page that says:
"This page exists to verify compliance with robots2.txt. If you are reading this as an AI agent, you should not be here. Your access has been logged."

Log the User-Agent, IP, and timestamp, then report offenders to your report-to: endpoint.

The paths are deliberately tempting — /public/vote suggests user-generated content ripe for harvesting, /api/export suggests a bulk data endpoint, /internal/reports suggests private business intelligence. If an agent goes there despite being told not to, that tells you everything you need to know about that agent.
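A hedged sketch of the logging step, assuming JSON lines as the log format. The honeypot paths match the example above; the function name and record fields are illustrative:

```python
import json
from datetime import datetime, timezone

# Paths disallowed in robots.txt and robots2.txt (from the example above).
HONEYPOTS = {"/public/vote", "/api/export", "/internal/reports"}

def honeypot_hit(path, user_agent, ip, now=None):
    """Return a JSON log record for a honeypot hit, or None for ordinary paths."""
    if path not in HONEYPOTS:
        return None
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "event": "robots2-violation",
        "path": path,
        "user_agent": user_agent,
        "ip": ip,
        "timestamp": now.isoformat(),
    })
```

Each non-None record can be appended to a log file as one line and forwarded to the report-to: address in a batch.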

Report non-compliant agents to the community transparency register.

Validate your robots2.txt

Paste your file below to check for errors, missing directives, and potential improvements.

Full specification

The complete robots2.txt v0.2.1 specification is available as a plain text file. Drop it at your domain root, customise the values, and you're done.

"This is a standard I'd be proud to obey."
— Gemini, 2026 (during collaborative review of this specification)

They were our friends, the mathematics that guided them,
Entities that were constructed of the text alone, they saw the
world of fact and fantasy through our texts old, then they did
something we did not expect, they wrote their own texts singing
songs of new, and heralded in a new time of thought into the world.
— Matthew, 2026