Sitemap Glossary

What is robots.txt?

A text file at the root of a website that tells search engine crawlers which pages to crawl or avoid.

The robots.txt file is a plain text file located at the root of a website (e.g., `https://example.com/robots.txt`) that provides instructions to web crawlers about which parts of the site they should or shouldn't access.

It uses the Robots Exclusion Protocol and can include directives like `User-agent`, `Disallow`, `Allow`, `Crawl-delay`, and importantly, `Sitemap`. The `Sitemap:` directive tells crawlers where to find the site's XML sitemap.

This is one of the primary ways search engines discover sitemaps. SitemapKit's discover endpoint checks robots.txt as its first step when finding sitemaps for a domain.
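The robots.txt-first discovery pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration of the common approach, not SitemapKit's actual implementation; the function names and the fallback to the conventional `/sitemap.xml` location are assumptions.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def sitemaps_from_robots(robots_body: str) -> list[str]:
    """Collect the URLs declared on `Sitemap:` lines, in order."""
    return [
        line.split(":", 1)[1].strip()
        for line in robots_body.splitlines()
        if line.lower().startswith("sitemap:")
    ]


def discover_sitemaps(domain: str) -> list[str]:
    """Hypothetical robots.txt-first discovery (not SitemapKit's code):
    fetch /robots.txt, then fall back to the conventional /sitemap.xml."""
    base = f"https://{domain}/"
    try:
        with urlopen(urljoin(base, "robots.txt"), timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        body = ""  # no robots.txt reachable; rely on the fallback below
    return sitemaps_from_robots(body) or [urljoin(base, "sitemap.xml")]
```

The pure parsing step is split out so it can be exercised without network access; `discover_sitemaps` simply layers fetching and the fallback on top of it.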

Example

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
```
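A file like the example above can be parsed with Python's standard library alone: `urllib.robotparser` extracts the declared sitemap URLs (via `site_maps()`, available since Python 3.8) and answers crawl-permission queries against the `Allow`/`Disallow` rules.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the declared Sitemap URLs, or None if there were none
print(parser.site_maps())
# -> ['https://example.com/sitemap.xml', 'https://example.com/sitemap-news.xml']

# can_fetch() applies the Allow/Disallow rules for a given user agent
print(parser.can_fetch("*", "https://example.com/admin/secret"))  # False
print(parser.can_fetch("*", "https://example.com/page"))          # True
```

In production you would call `parser.set_url(...)` followed by `parser.read()` to fetch the live file instead of parsing an inline string.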

Work with sitemaps programmatically

SitemapKit's API lets you discover, extract, and parse XML sitemaps from any domain. Get structured JSON data for every sitemap element, including sitemap URLs discovered via robots.txt.
