# OpenCrawl

> OpenCrawl is a free, no-signup web toolkit for marketers, SEO consultants, and AI engineers who need to understand any website quickly. It bundles six tools — sitemap discovery, single-page content extraction, on-page SEO audit, generative-engine-optimization (GEO) readiness check, design system extraction, and structured-field extraction — into one URL-paste interface. Built for the AI-search era: every tool outputs LLM-friendly Markdown or JSON.

## Platform overview

OpenCrawl runs every tool on the same fast backend (Crawl4AI + Scrapling) and
shares a unified page cache across the single-page tools (Crawl, SEO, Clone):
once any one of them has fetched a URL, the others respond for that URL in
under 200 ms. The same functionality is available through both the REST API
and the UI.

## Tools

### Sitemap — https://opencrawl.opensora.store/sitemap

Find every URL a website has shipped, even the ones its sitemap.xml omits.

**Output:** URL list.  
**Scope:** whole domain.

### Crawl — https://opencrawl.opensora.store/crawl

Get the page's actual content — clean Markdown + image URLs, ready for AI ingestion.

**Output:** CONTENT — text + images.  
**Scope:** single URL.

### SEO — https://opencrawl.opensora.store/seo

Audit every SEO signal on a page — title, meta, headings, schema, link graph.

**Output:** AUDIT — title / meta / schema / links.  
**Scope:** single URL.

### GEO — https://opencrawl.opensora.store/geo

Win citations in ChatGPT, Perplexity, and Google's AI Overview — measured, not guessed.

**Output:** AI-search readiness score.  
**Scope:** single URL.

### Clone — https://opencrawl.opensora.store/clone

Reverse-engineer any site's design system — tokens, CSS, React-ready snapshots.

**Output:** design system (colors / fonts / CSS).  
**Scope:** single URL.

### Extract — https://opencrawl.opensora.store/extract

Pull structured fields from any page — pick a template or write CSS selectors, get JSON.

**Output:** structured JSON (rows + fields).  
**Scope:** single URL.

## Use cases

- **Competitive content audit** — feed a competitor URL to Crawl + SEO, get their content as Markdown plus their meta-tag strategy in one pass.
- **AI search visibility check (GEO)** — see whether ChatGPT, Perplexity, or Google AI Overviews can extract clean facts from a page; identify missing JSON-LD, FAQ schema, definition-first paragraphs.
- **RAG ingestion** — Crawl a marketing or docs site to clean Markdown for vector-database ingestion. MHTML offline snapshots include images for multimodal RAG.
- **Design system extraction** — Clone any production site to its core colors, fonts, and CSS variables; export as Tailwind config or CSS variables.
- **Structured data scraping** — Extract product cards, search results, or Reddit threads as JSON via CSS selectors. Built-in templates for Hacker News, GitHub Trending, Product Hunt, Reddit, generic blogs.
- **Sitemap discovery for migrations** — find every URL on a legacy site before a redesign so no page is lost to 404s.
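
For the migration use case, one way to use the discovered URL list is to diff it against the URLs of the redesigned site. The sketch below is an illustration only: `normalize` and `missing_after_migration` are hypothetical helper names, and it assumes both URL lists are already available as plain Python lists (for example, from two Sitemap runs). It compares paths only, ignoring host, query string, and trailing slashes.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to its path, without a trailing slash, for comparison."""
    path = urlsplit(url).path or "/"
    return path.rstrip("/") or "/"


def missing_after_migration(legacy_urls, new_urls):
    """Return legacy URLs whose path has no counterpart on the new site."""
    new_paths = {normalize(u) for u in new_urls}
    return sorted(u for u in legacy_urls if normalize(u) not in new_paths)


legacy = [
    "https://old.example.com/",
    "https://old.example.com/about/",
    "https://old.example.com/pricing",
]
new = ["https://example.com/", "https://example.com/about"]

# Each URL printed here would need a redirect or a replacement page.
for url in missing_after_migration(legacy, new):
    print(url)
```

Matching on path alone is a deliberate simplification; a real migration with renamed slugs would need an explicit redirect map instead.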

## Pricing

OpenCrawl is currently free during development. No signup required for sitemap discovery; the other tools accept a free API token retrieved via Google sign-in.

## FAQ

### What is GEO (Generative Engine Optimization)?

GEO is the practice of structuring web content so AI search engines — ChatGPT, Claude, Perplexity, Google AI Overviews, Bing Copilot — can extract, summarize, and cite it accurately. It overlaps SEO but emphasizes machine-readable signals: JSON-LD schema (FAQPage, HowTo, Article), llms.txt presence, definition-first paragraphs, numbered HowTo steps, and tabular data over prose.
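
As an illustration of one of the machine-readable signals listed above, here is a minimal sketch that builds schema.org `FAQPage` JSON-LD from question/answer pairs. The helper names `faq_jsonld` and `as_script_tag` are hypothetical and are not part of OpenCrawl; the sketch shows the markup a page might publish, not how the GEO tool scores it.

```python
import json


def faq_jsonld(qa_pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def as_script_tag(data):
    """Serialize JSON-LD for embedding in an HTML page."""
    # Escape "</" so answer text cannot close the script element early.
    body = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


print(as_script_tag(faq_jsonld([
    ("What is llms.txt?", "A Markdown file at /llms.txt for LLM crawlers."),
])))
```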

### What is the difference between Crawl and SEO in OpenCrawl?

Crawl returns the **page's content** (clean Markdown body plus image URLs) for AI ingestion. SEO returns an **audit report** (title length, meta description, heading hierarchy, schema types, link graph) for diagnosis. They share a backend cache, so running either one on a URL makes the other near-instant for that URL.

### What is the difference between Crawl and Extract?

Crawl gives you the **whole page as Markdown** (one large text blob). Extract gives you **structured JSON rows** by applying CSS selectors to repeating items — e.g., 30 product cards as `[{title, price, rating}, ...]`. Use Crawl for content; Extract for tabular data.
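
To make the Extract side concrete, here is a hedged sketch of assembling a request body in the `{url, item_selector, fields[]}` shape for the product-card example. The layout of each `fields` entry (a `name` plus a `selector`) is an assumption made for illustration, as are the CSS selectors and the helper name `build_extract_payload`; check the endpoint spec for the real field format.

```python
import json


def build_extract_payload(page_url, item_selector, field_selectors):
    """Assemble an Extract request body.

    ASSUMPTION: each entry in `fields` is {"name": ..., "selector": ...}.
    Only the top-level keys (url, item_selector, fields) come from the docs.
    """
    fields = [
        {"name": name, "selector": selector}
        for name, selector in field_selectors.items()
    ]
    return {"url": page_url, "item_selector": item_selector, "fields": fields}


payload = build_extract_payload(
    "https://example.com/products",   # placeholder page
    "div.product-card",               # hypothetical repeating-item selector
    {"title": "h2", "price": ".price", "rating": ".rating"},
)
print(json.dumps(payload, indent=2))
```

The item selector picks each repeating element, and the per-field selectors are evaluated inside it, which is what turns one page into many rows.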

### What is llms.txt?

llms.txt is an emerging convention (https://llmstxt.org) — a Markdown file at `/llms.txt` that gives LLM crawlers structured context about a site. OpenCrawl's GEO tool flags missing llms.txt as a fixable issue.
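
Because the file lives at a fixed path, the location to probe for any page can be derived from the page URL alone. The following is a minimal sketch; `llms_txt_url` is a hypothetical helper, and it does not describe how OpenCrawl's GEO tool performs its own check.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url):
    """Return the /llms.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Keep scheme and host; drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


print(llms_txt_url("https://example.com/blog/post?ref=home#intro"))
```

A plain GET against the resulting URL, treating any non-200 response as "missing", is enough for a first-pass presence check.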

### Can I use OpenCrawl from Claude Code or Cursor?

Yes. Fetch `https://opencrawl.opensora.store/skills/opencrawl/SKILL.md` (Claude Code skill format) for a single-file integration. The file includes the API token flow, the six endpoint specs, and example requests for each tool.

## API

- POST `https://opencrawl.opensora.store/api/sitemap?domain=X` — discover URLs
- POST `https://opencrawl.opensora.store/api/fetch_one` body `{url, refresh?}` — Crawl single page
- POST `https://opencrawl.opensora.store/api/seo` body `{url, refresh?}` — SEO audit
- POST `https://opencrawl.opensora.store/api/clone` body `{url, refresh?}` — design system extract
- POST `https://opencrawl.opensora.store/api/extract` body `{url, item_selector, fields[]}` — structured fields
- All endpoints require an `Authorization: Bearer <token>` header (the token is free and obtained via Google sign-in)
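
The endpoint list above can be exercised with nothing but the Python standard library. The sketch below builds the requests without sending them. It assumes request bodies are JSON sent with `Content-Type: application/json` and that `refresh` is a boolean; neither is stated above, and the helper names (`build_request`, `sitemap_request`, `crawl_request`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://opencrawl.opensora.store"


def build_request(path, token, body=None, query=None):
    """Build (but do not send) an authenticated POST request."""
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # ASSUMPTION: bodies are JSON-encoded.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def sitemap_request(domain, token):
    """POST /api/sitemap?domain=X"""
    return build_request("/api/sitemap", token, query={"domain": domain})


def crawl_request(page_url, token, refresh=False):
    """POST /api/fetch_one with body {url, refresh?}"""
    # ASSUMPTION: refresh is a boolean flag.
    body = {"url": page_url, "refresh": refresh}
    return build_request("/api/fetch_one", token, body=body)


req = crawl_request("https://example.com/", token="YOUR_TOKEN")
print(req.get_method(), req.full_url)
```

To send a request, pass it to `urllib.request.urlopen(req)` and decode the response body; the response schema is not documented in this file, so treat it as opaque JSON until you have checked the endpoint spec. The `/api/seo` and `/api/clone` endpoints take the same `{url, refresh?}` body and can reuse `build_request` with a different path.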

## Links

- Home: https://opencrawl.opensora.store
- Claude Code skill: https://opencrawl.opensora.store/skills/opencrawl/SKILL.md
- Sitemap: https://opencrawl.opensora.store/sitemap.xml
- robots.txt: https://opencrawl.opensora.store/robots.txt

_Last updated: 2026-05-16._