
Sitemap API for LLM Agents & RAG Pipelines

Feed URL lists to AI agents and RAG pipelines

LLM agents need to discover all pages on a domain before crawling them for content. SitemapKit provides a clean URL list that agents can use as seed URLs for Crawl4AI, Firecrawl, or custom crawlers.

Why use SitemapKit?

  • Get all URLs on a domain without expensive full-site crawling
  • Perfect seed URLs for RAG indexing pipelines
  • Works with LangChain, LlamaIndex, CrewAI, and custom agents
  • llms.txt compatible — agents can discover the API automatically

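The llms.txt convention mentioned above is a plain markdown index of links that agents can fetch and parse. A minimal sketch of how an agent might extract endpoints from one (the sample content below is illustrative, not SitemapKit's actual llms.txt file):

```python
import re

def parse_llms_txt(text):
    """Extract (title, url) pairs from markdown-style links in an llms.txt file."""
    return re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", text)

# Illustrative sample only — check the live /llms.txt for the real contents
sample = """# SitemapKit
> Sitemap extraction API

## API
- [Full sitemap endpoint](https://sitemapkit.com/api/v1/sitemap/full): returns all URLs for a domain
"""

links = parse_llms_txt(sample)
```

An agent can then match link titles against its task ("sitemap", "URLs") to pick the endpoint to call.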
Example

import requests

# Step 1: Get all URLs from sitemap
response = requests.post(
    "https://sitemapkit.com/api/v1/sitemap/full",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={"url": "docs.example.com"},
)
response.raise_for_status()  # fail fast on auth or quota errors
urls = [u["loc"] for u in response.json()["urls"]]

# Step 2: Feed to your RAG pipeline
for url in urls:
    content = crawl_page(url)  # your crawler: Crawl4AI, Firecrawl, etc.
    index.add(content)         # your vector store
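Before feeding the list to a crawler, it often pays to deduplicate and drop non-HTML assets so you don't spend crawl budget on PDFs or images. A small helper along these lines (hypothetical, not part of SitemapKit's API) could sit between the two steps above:

```python
from urllib.parse import urlparse

def filter_urls(urls, path_prefix="/", skip_extensions=(".pdf", ".zip", ".png", ".jpg")):
    """Dedupe a URL list and keep only crawlable pages under a path prefix."""
    seen = set()
    kept = []
    for url in urls:
        path = urlparse(url).path
        if not path.startswith(path_prefix):
            continue  # outside the section we want to index
        if path.lower().endswith(skip_extensions):
            continue  # binary asset, skip crawling
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept

sample_urls = [
    "https://docs.example.com/guide",
    "https://docs.example.com/guide",        # duplicate
    "https://docs.example.com/assets/a.pdf", # binary asset
]
clean = filter_urls(sample_urls)
```

The same helper can scope indexing to a subsection, e.g. `filter_urls(urls, path_prefix="/docs/")`.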

Start using SitemapKit for LLM agents & RAG pipelines

Free tier includes 100 API calls/month. No credit card required.
