Use SitemapKit as a tool in LangChain agents or as a URL source for document loaders. Get all URLs from a domain to build RAG pipelines.
from langchain.tools import tool
from langchain_community.document_loaders import WebBaseLoader
import requests
@tool
def get_sitemap_urls(domain: str) -> list[str]:
    """Get all URLs from a domain's sitemap using the SitemapKit API.

    Args:
        domain: The domain (or URL) whose sitemaps should be discovered
            and extracted, e.g. "docs.example.com".

    Returns:
        A list of page URLs ("loc" entries) from every sitemap found.

    Raises:
        requests.HTTPError: If the SitemapKit API responds with an error status.
    """
    resp = requests.post(
        "https://sitemapkit.com/api/v1/sitemap/full",
        headers={"x-api-key": "YOUR_API_KEY", "Content-Type": "application/json"},
        json={"url": domain},
        timeout=30,  # requests has no default timeout; avoid hanging forever
    )
    # Fail loudly on HTTP errors instead of a confusing KeyError on "urls" below.
    resp.raise_for_status()
    return [u["loc"] for u in resp.json()["urls"]]
# Use in a RAG pipeline: fetch the sitemap URLs, load the pages, then index them.
# NOTE(review): .invoke() is the LangChain tool entry point; assumes YOUR_API_KEY
# above has been replaced with a real SitemapKit key — confirm before running.
urls = get_sitemap_urls.invoke("docs.example.com")
loader = WebBaseLoader(urls[:50])  # Load first 50 pages to cap fetch time/cost
docs = loader.load()  # fetches each URL and returns them as LangChain Documents
# Now index docs into your vector store
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

Authenticate with your sk_live_* API key. Use the /api/v1/sitemap/full endpoint to discover and extract all sitemaps from a domain in one call.

- POST /api/v1/sitemap/discover — Find all sitemaps on a domain
- POST /api/v1/sitemap/extract — Parse a sitemap URL and extract all URLs
- POST /api/v1/sitemap/full — Discover + extract in one call (recommended)

100 free API calls/month. No credit card required.