Feed sitemap URLs into LlamaIndex for building knowledge bases. Discover all pages on a domain, then index them for RAG-based question answering.
import requests
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
# Get all URLs from sitemap
resp = requests.post(
    "https://sitemapkit.com/api/v1/sitemap/full",
    headers={"x-api-key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json={"url": "docs.example.com"},
)
resp.raise_for_status()  # stop early if the API returned an HTTP error
urls = [u["loc"] for u in resp.json()["urls"]]
# Load and index the first 100 pages
documents = SimpleWebPageReader(html_to_text=True).load_data(urls[:100])
index = VectorStoreIndex.from_documents(documents)
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("How do I configure authentication?")
print(response)

Authenticate with your sk_live_* API key.
Use the /api/v1/sitemap/full endpoint to discover and extract all sitemaps from a domain in one call.

POST /api/v1/sitemap/discover: Find all sitemaps on a domain
POST /api/v1/sitemap/extract: Parse a sitemap URL and extract all URLs
POST /api/v1/sitemap/full: Discover + extract in one call (recommended)

100 free API calls/month. No credit card required.
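The example above indexes whatever the first 100 sitemap entries happen to be. A minimal sketch of choosing those URLs more deliberately follows. It assumes the response has the shape the example already relies on (a "urls" list whose entries carry a "loc" field). The helper name select_urls and its limit and path_prefix parameters are hypothetical and not part of SitemapKit or LlamaIndex.

```python
from urllib.parse import urlparse


def select_urls(payload, limit=100, path_prefix=""):
    """Return up to `limit` unique http(s) URLs from a sitemap response.

    Assumes `payload` looks like {"urls": [{"loc": "..."}, ...]}, the shape
    used by the example above. Entries without a "loc" are skipped.
    """
    seen = set()
    selected = []
    for entry in payload.get("urls", []):
        if len(selected) >= limit:
            break
        loc = entry.get("loc") if isinstance(entry, dict) else None
        if not loc or loc in seen:
            continue  # skip malformed entries and duplicates
        parsed = urlparse(loc)
        if parsed.scheme not in ("http", "https"):
            continue  # skip anything a web page reader cannot fetch
        if not parsed.path.startswith(path_prefix):
            continue  # keep only the requested section of the site
        seen.add(loc)
        selected.append(loc)
    return selected


# Hand-written sample payload, used here instead of a live API call.
sample = {
    "urls": [
        {"loc": "https://docs.example.com/auth"},
        {"loc": "https://docs.example.com/auth"},
        {"loc": "mailto:team@example.com"},
        {"loc": "https://docs.example.com/blog/launch"},
        {},
    ]
}
print(select_urls(sample))
```

In the example above, the result of select_urls(resp.json()) could be passed to load_data in place of urls[:100], so duplicates and non-web links do not use up the page budget.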