Sitemap Glossary

What is robots.txt?

A text file at the root of a website that tells search engine crawlers which pages to crawl or avoid.

The robots.txt file is a plain text file located at the root of a website (e.g., `https://example.com/robots.txt`) that provides instructions to web crawlers about which parts of the site they should or shouldn't access.

It uses the Robots Exclusion Protocol and can include directives like `User-agent`, `Disallow`, `Allow`, `Crawl-delay`, and importantly, `Sitemap`. The `Sitemap:` directive tells crawlers where to find the site's XML sitemap.

This is one of the primary ways search engines discover sitemaps. SitemapKit's discover endpoint checks robots.txt as its first step when finding sitemaps for a domain.
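The robots.txt-first discovery pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration of the common approach, not SitemapKit's actual implementation; the function names and the fallback to the conventional `/sitemap.xml` location are assumptions.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def sitemaps_from_robots(robots_body: str) -> list[str]:
    """Collect the URLs declared on `Sitemap:` lines, in order."""
    return [
        line.split(":", 1)[1].strip()
        for line in robots_body.splitlines()
        if line.lower().startswith("sitemap:")
    ]


def discover_sitemaps(domain: str) -> list[str]:
    """Hypothetical robots.txt-first discovery (not SitemapKit's code):
    fetch /robots.txt, then fall back to the conventional /sitemap.xml."""
    base = f"https://{domain}/"
    try:
        with urlopen(urljoin(base, "robots.txt"), timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        body = ""  # no robots.txt reachable; rely on the fallback below
    return sitemaps_from_robots(body) or [urljoin(base, "sitemap.xml")]
```

The pure parsing step is split out so it can be exercised without network access; `discover_sitemaps` simply layers fetching and the fallback on top of it.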

Example

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
```
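A file like the example above can be parsed with Python's standard library alone: `urllib.robotparser` extracts the declared sitemap URLs (via `site_maps()`, available since Python 3.8) and answers crawl-permission queries against the `Allow`/`Disallow` rules.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the declared Sitemap URLs, or None if there were none
print(parser.site_maps())
# -> ['https://example.com/sitemap.xml', 'https://example.com/sitemap-news.xml']

# can_fetch() applies the Allow/Disallow rules for a given user agent
print(parser.can_fetch("*", "https://example.com/admin/secret"))  # False
print(parser.can_fetch("*", "https://example.com/page"))          # True
```

In production you would call `parser.set_url(...)` followed by `parser.read()` to fetch the live file instead of parsing an inline string.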

Work with sitemaps programmatically

SitemapKit's API lets you discover, extract, and parse XML sitemaps from any domain. Get structured JSON data for every sitemap element, including sitemap URLs discovered via robots.txt.
