# OpenCrawl

> OpenCrawl is a free, no-signup web toolkit for marketers, SEO consultants, and AI engineers who need to understand any website quickly. It bundles six tools into one URL-paste interface: sitemap discovery, single-page content extraction, on-page SEO audit, generative-engine-optimization (GEO) readiness check, design system extraction, and structured-field extraction. Built for the AI-search era: every tool outputs LLM-friendly Markdown or JSON.

## Platform overview

OpenCrawl runs every tool on the same fast backend (Crawl4AI + Scrapling), shares a unified page cache across single-page tools (Crawl, SEO, Clone) so running any one of them on a URL makes the others sub-200ms, and exposes the same functionality via REST API and the UI.

## Tools

### Sitemap: https://opencrawl.opensora.store/sitemap

Find every URL a website has shipped, even the ones their sitemap.xml hides.

**Output:** URL list. **Scope:** whole domain.

### Crawl: https://opencrawl.opensora.store/crawl

Get the page's actual content: clean Markdown + image URLs, ready for AI ingestion.

**Output:** CONTENT (text + images). **Scope:** single URL.

### SEO: https://opencrawl.opensora.store/seo

Audit every SEO signal on a page: title, meta, headings, schema, link graph.

**Output:** AUDIT (title / meta / schema / links). **Scope:** single URL.

### GEO: https://opencrawl.opensora.store/geo

Win citations in ChatGPT, Perplexity, and Google's AI Overview, measured, not guessed.

**Output:** AI-search readiness score. **Scope:** single URL.

### Clone: https://opencrawl.opensora.store/clone

Reverse-engineer any site's design system: tokens, CSS, React-ready snapshots.

**Output:** design system (colors / fonts / CSS). **Scope:** single URL.

### Extract: https://opencrawl.opensora.store/extract

Pull structured fields from any page: pick a template or write CSS selectors, get JSON.

**Output:** structured JSON (rows + fields). **Scope:** single URL.

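
As a rough illustration of the Extract tool, the sketch below builds a request for the `/api/extract` endpoint listed in the API section. It assumes each entry of `fields[]` is a `{name, selector}` pair and that the body is sent as JSON; the documented contract confirms neither. The target page, the selectors, and the `build_extract_request` helper are hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.request

API_BASE = "https://opencrawl.opensora.store/api"


def build_extract_request(token, url, item_selector, fields):
    """Build (but do not send) a POST request for the Extract endpoint."""
    body = {
        "url": url,
        "item_selector": item_selector,
        # Assumption: each field is a {"name", "selector"} pair. The
        # documented contract only says `fields[]`.
        "fields": [{"name": n, "selector": s} for n, s in fields.items()],
    }
    return urllib.request.Request(
        f"{API_BASE}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical product listing page and selectors.
req = build_extract_request(
    token="YOUR_API_TOKEN",
    url="https://example.com/products",
    item_selector=".product-card",
    fields={"title": "h2", "price": ".price", "rating": ".rating"},
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.
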
## Use cases

- **Competitive content audit**: feed a competitor URL to Crawl + SEO, get their content as Markdown plus their meta-tag strategy in one pass.
- **AI search visibility check (GEO)**: see whether ChatGPT, Perplexity, or Google AI Overviews can extract clean facts from a page; identify missing JSON-LD, FAQ schema, definition-first paragraphs.
- **RAG ingestion**: Crawl a marketing or docs site to clean Markdown for vector-database ingestion. MHTML offline snapshots include images for multimodal RAG.
- **Design system extraction**: Clone any production site to its core colors, fonts, and CSS variables; export as Tailwind config or CSS variables.
- **Structured data scraping**: Extract product cards, search results, or Reddit threads as JSON via CSS selectors. Built-in templates for Hacker News, GitHub Trending, Product Hunt, Reddit, generic blogs.
- **Sitemap discovery for migrations**: find every URL on a legacy site before a redesign so no page is lost to 404s.

## Pricing

OpenCrawl is currently free during development. No signup required for sitemap discovery; the other tools accept a free API token retrieved via Google sign-in.

## FAQ

### What is GEO (Generative Engine Optimization)?

GEO is the practice of structuring web content so AI search engines (ChatGPT, Claude, Perplexity, Google AI Overviews, Bing Copilot) can extract, summarize, and cite it accurately. It overlaps SEO but emphasizes machine-readable signals: JSON-LD schema (FAQPage, HowTo, Article), llms.txt presence, definition-first paragraphs, numbered HowTo steps, and tabular data over prose.

### What is the difference between Crawl and SEO in OpenCrawl?

Crawl returns the **page's content** (clean Markdown body plus image URLs) for AI ingestion. SEO returns an **audit report** (title length, meta description, headings hierarchy, schema types, link graph) for diagnosis. They share a backend cache: running either populates the other instantly.

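
Because Crawl and SEO share a page cache, one natural pattern is to call both on the same URL back to back. The following is a minimal sketch using the `fetch_one` and `seo` endpoints from the API section. The `build_page_request` helper and the example URL are hypothetical, and both the JSON content type and the reading of `refresh` as a cache bypass are assumptions rather than documented behavior.

```python
import json
import urllib.request

API_BASE = "https://opencrawl.opensora.store/api"


def build_page_request(tool, url, token, refresh=None):
    """Build (but do not send) a POST request for a single-page tool."""
    if tool not in ("fetch_one", "seo"):
        raise ValueError(f"unsupported tool: {tool}")
    body = {"url": url}
    if refresh is not None:
        # Assumption: refresh=True asks the backend to bypass its cache.
        body["refresh"] = refresh
    return urllib.request.Request(
        f"{API_BASE}/{tool}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


page = "https://example.com/pricing"  # hypothetical target page
# Crawl first; the SEO call on the same URL can then reuse the cached page.
crawl_req = build_page_request("fetch_one", page, "YOUR_API_TOKEN")
seo_req = build_page_request("seo", page, "YOUR_API_TOKEN")
# Each request can be sent with urllib.request.urlopen(...).
```
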
### What is the difference between Crawl and Extract?

Crawl gives you the **whole page as Markdown** (one large text blob). Extract gives you **structured JSON rows** by applying CSS selectors to repeating items, e.g., 30 product cards as `[{title, price, rating}, ...]`. Use Crawl for content; Extract for tabular data.

### What is llms.txt?

llms.txt is an emerging convention (https://llmstxt.org): a Markdown file at `/llms.txt` that gives LLM crawlers structured context about a site. OpenCrawl's GEO tool flags missing llms.txt as a fixable issue.

### Can I use OpenCrawl from Claude Code or Cursor?

Yes. Fetch `https://opencrawl.opensora.store/skills/opencrawl/SKILL.md` (Claude Code skill format) for a single-file integration. It includes the API token flow, the six endpoint specs, and example requests for each tool.

## API

- POST `https://opencrawl.opensora.store/api/sitemap?domain=X`: discover URLs
- POST `https://opencrawl.opensora.store/api/fetch_one` body `{url, refresh?}`: Crawl single page
- POST `https://opencrawl.opensora.store/api/seo` body `{url, refresh?}`: SEO audit
- POST `https://opencrawl.opensora.store/api/clone` body `{url, refresh?}`: design system extract
- POST `https://opencrawl.opensora.store/api/extract` body `{url, item_selector, fields[]}`: structured fields
- All endpoints require `Authorization: Bearer <token>` (free, get via sign-in)

## Links

- Home: https://opencrawl.opensora.store
- Claude Code skill: https://opencrawl.opensora.store/skills/opencrawl/SKILL.md
- Sitemap: https://opencrawl.opensora.store/sitemap.xml
- robots.txt: https://opencrawl.opensora.store/robots.txt

_Last updated: 2026-05-16._