SEO & GEO
What is robots.txt?
Definition
A plain-text file at the root of a website that instructs search engine crawlers which pages or directories they should not crawl — the first line of crawl management.
In more detail
robots.txt uses a simple directive format: `User-agent` specifies which bot the rule applies to, `Disallow` specifies paths to block, and `Allow` grants exceptions. `User-agent: *` applies to all crawlers; you can write specific rules for Googlebot, Bingbot, GPTBot (OpenAI's crawler), and others.
Common uses: blocking admin pages and internal tools from being crawled (`/admin/`, `/dashboard/`), preventing duplicate URL-parameter pages from wasting crawl budget, keeping staging environments out of search engines, and — increasingly — blocking AI training crawlers from crawling your content.
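The directives and uses above come together in a file like the following sketch — the paths and sitemap URL are placeholders for your own site's structure, and note that wildcard (`*`) path matching is a Google/Bing extension rather than part of the core standard:

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /dashboard/
Disallow: /*?sort=
# Exception to the /admin/ block above
Allow: /admin/public/

# Block OpenAI's training crawler from the whole site
User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Crawlers read rules top to bottom within the group that matches their user-agent string, so GPTBot follows only its own block here, while every other bot follows the `*` group.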
Critical distinction: robots.txt blocks crawling, not indexing. A page blocked in robots.txt can still appear in search results if other sites link to it — Google knows it exists; it just can't read it. To prevent indexing, use a `noindex` meta tag on the page itself (which means the page must remain crawlable).
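To illustrate the distinction, the `noindex` directive lives in the page's HTML head, not in robots.txt:

```html
<!-- In the <head> of the page to exclude from search results.
     The page must NOT be blocked in robots.txt, or crawlers
     will never fetch it and never see this tag. -->
<meta name="robots" content="noindex">
```

The same directive can also be sent as an `X-Robots-Tag` HTTP response header, which is useful for non-HTML files such as PDFs.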
Why it matters
A misconfigured robots.txt is one of the most severe (and surprisingly common) technical SEO mistakes — accidentally blocking your entire site from search engines can cause complete ranking loss. Every site should audit its robots.txt as part of a technical SEO review.
Related service
Working with robots.txt?
I offer SEO Services for businesses ready to move from understanding to implementation.
Learn about SEO Services →