
What is robots.txt?

Definition

A plain-text file at the root of a website that tells search engine crawlers which pages or directories they should not crawl. It is the first line of crawl management, though compliance is voluntary: well-behaved bots honor it, but it is not access control.

In more detail

robots.txt uses a simple directive format: `User-agent` specifies which bot the rule applies to, `Disallow` specifies paths to block, and `Allow` grants exceptions. `User-agent: *` applies to all crawlers; you can write specific rules for Googlebot, Bingbot, GPTBot (OpenAI's crawler), and others.
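A minimal file combining these directives might look like this (the paths are illustrative):

```
# Rules for every crawler
User-agent: *
Disallow: /private/
Allow: /private/press-kit.html

# Rules for OpenAI's training crawler only
User-agent: GPTBot
Disallow: /
```

Rule groups are separated by blank lines, and a crawler follows the most specific `User-agent` group that matches it.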

Common uses: blocking admin pages and internal tools from crawling (`/admin/`, `/dashboard/`), preventing duplicate URL-parameter pages from wasting crawl budget, keeping staging environments from being crawled, and, increasingly, blocking AI training crawlers from ingesting your content.
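You can check how a given set of rules behaves before deploying it. A short sketch using Python's standard-library `urllib.robotparser` (the rules and URLs here are hypothetical):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /dashboard/

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())  # parse() accepts the file's lines directly

# Regular crawlers: blocked from /admin/, allowed elsewhere
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))       # True

# GPTBot: blocked from the entire site
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))  # False
```

The same class can also fetch a live file with `rp.set_url(...)` and `rp.read()`, which is handy when auditing a site you don't control.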

Critical distinction: robots.txt blocks crawling, not indexing. A page blocked in robots.txt can still appear in search results if other sites link to it; Google knows the URL exists but cannot read the page. To prevent indexing, use a `noindex` meta tag on the page itself, which means the page must remain crawlable so the tag can be seen.
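The `noindex` directive lives in the page's own HTML (or, equivalently, in an `X-Robots-Tag` HTTP response header), not in robots.txt:

```html
<!-- In the page's <head>. The page must stay crawlable in robots.txt,
     otherwise Google never fetches it and never sees this tag. -->
<meta name="robots" content="noindex">
```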

Why it matters

A misconfigured robots.txt is one of the most severe (and surprisingly common) technical SEO mistakes — accidentally blocking your entire site from search engines can cause complete ranking loss. Every site should audit its robots.txt as part of a technical SEO review.

Related service

Working with robots.txt?

I offer SEO Services for businesses ready to move from understanding to implementation.

Learn about SEO Services