SEO Crawler Test Site
A comprehensive test site with intentional SEO issues across every category. Use this to validate your SEO crawler against known issues.
Cross-Reference Testing Guide
Pages grouped by testing theme rather than issue category. Your crawler should discover these cross-category relationships through link analysis.
Critical Missing Elements
Pages where essential HTML elements are completely absent.
- No title tag — crawlers must flag this immediately
- No H1 heading — primary heading is absent
- No meta description — SERP snippet affected
- No image alt text — accessibility + SEO failure
- No viewport meta — mobile rendering broken
- No canonical URL — duplicate risk
- No Open Graph tags — social sharing fails
- No language attribute — i18n + a11y issue
- No charset declaration — encoding ambiguity
- No image dimensions — CLS triggered
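As a rough illustration of how a crawler might flag several of the absent elements listed above, here is a minimal sketch using Python's standard `html.parser`. The names `ElementAudit`, `REQUIRED`, and `missing_elements` are hypothetical, and the checks cover only part of the list (image and Open Graph checks are left out for brevity).

```python
from html.parser import HTMLParser


class ElementAudit(HTMLParser):
    """Records which essential elements appear while parsing a page."""

    def __init__(self):
        super().__init__()
        self.found = set()
        self._in_title = False
        self._title_text = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.found.add("h1")
        elif tag == "html" and a.get("lang"):
            self.found.add("lang")
        elif tag == "meta":
            name = (a.get("name") or "").lower()
            if name == "description" and a.get("content"):
                self.found.add("description")
            elif name == "viewport":
                self.found.add("viewport")
            if "charset" in a:
                self.found.add("charset")
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.found.add("canonical")

    def handle_data(self, data):
        if self._in_title:
            self._title_text += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            # A title that is present but blank is treated as missing.
            if self._title_text.strip():
                self.found.add("title")


# Assumed set of checks for this sketch; extend as needed.
REQUIRED = ["title", "h1", "description", "viewport", "canonical", "lang", "charset"]


def missing_elements(html):
    """Return the names from REQUIRED that were not found in `html`."""
    audit = ElementAudit()
    audit.feed(html)
    audit.close()
    return [name for name in REQUIRED if name not in audit.found]
```

Calling `missing_elements(page_source)` on a control page should return an empty list, while each test page in this group should report at least the element it omits.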
Duplicate & Conflict Signals
Pages sending contradictory or duplicate signals to search engines.
- Duplicate title (page A) + (page B)
- Duplicate description (page A) + (page B)
- Duplicate content (page A) + (page B)
- Multiple conflicting canonicals
- Noindex but in sitemap — contradictory directives
- Canonical pointing elsewhere
- Uppercase URL — case sensitivity duplicate risk
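One possible way to surface the signals above from already-crawled data is sketched below. It assumes a hypothetical in-memory structure in which each URL maps to a dict with optional `title`, `description`, `canonicals`, and `noindex` keys; none of these names come from the test site itself.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_duplicates(pages, field):
    """Group URLs whose `field` value matches after whitespace/case folding."""
    groups = defaultdict(list)
    for url, data in pages.items():
        value = " ".join((data.get(field) or "").split()).lower()
        if value:
            groups[value].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def conflicting_signals(url, data, sitemap_urls):
    """Return a list of contradictory or duplicate-risk signals for one page."""
    issues = []
    if len(set(data.get("canonicals", []))) > 1:
        issues.append("multiple conflicting canonicals")
    if data.get("noindex") and url in sitemap_urls:
        issues.append("noindex but in sitemap")
    if any(ch.isupper() for ch in urlsplit(url).path):
        issues.append("uppercase characters in URL path")
    return issues
```

For example, `find_duplicates(pages, "title")` would group the two duplicate-title pages together, and the same call with `"description"` covers the duplicate-description pair.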
Crawlability Obstacles
Pages that challenge a crawler's ability to discover and process content.
- Redirect chain (3 hops) — follow through to the end
- Redirect loop — infinite cycle detection
- Meta refresh redirect — client-side redirect
- Page-level nofollow — link equity blocked
- Blocked by robots.txt — respect crawl rules
- Deeply nested pages — 5+ click depth
- JavaScript hrefs — non-crawlable links
- IFrame embedded content — cross-frame discovery
- CSS-hidden text — visible vs. rendered content
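The redirect chain and redirect loop cases above can be handled with the same traversal. The sketch below works on a hypothetical `redirects` mapping (source URL to target URL) rather than live HTTP requests, and the `max_hops` default of 10 is an assumption, not a value defined by the test site.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Follow redirects from `start`.

    Returns (last_url, hops, status) where status is "ok", "loop",
    or "too_many_hops".
    """
    seen = [start]
    current = start
    while current in redirects:
        target = redirects[current]
        if target in seen:
            # The next hop revisits a URL already on this path: a cycle.
            return target, len(seen), "loop"
        seen.append(target)
        current = target
        if len(seen) - 1 >= max_hops:
            return current, len(seen) - 1, "too_many_hops"
    return current, len(seen) - 1, "ok"
```

A three-hop chain ends with status `"ok"` and a hop count of 3, while a cycle is reported as `"loop"` instead of being followed indefinitely.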
Content Quality Signals
Pages with various content quality issues that affect rankings.
- Thin content — insufficient word count
- Keyword stuffing — spam signal
- Lorem ipsum placeholder — non-meaningful content
- Skipped heading levels — H1 → H3 gap
- Overly long H1 — heading length issue
- Empty H1 element — heading exists but is blank
- Multiple H1 tags — heading hierarchy confusion
- Soft 404 — 200 status but "not found" content
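The heading and soft 404 checks in this group lend themselves to small pure functions. The following is a sketch under stated assumptions: headings are supplied as `(level, text)` tuples in document order, and the phrases in `NOT_FOUND_PHRASES` are illustrative guesses rather than wording taken from the test pages.

```python
# Assumed phrases; adjust to the wording the soft 404 page actually uses.
NOT_FOUND_PHRASES = ("not found", "page does not exist")


def heading_issues(headings):
    """Check a list of (level, text) tuples for common heading problems."""
    issues = []
    h1_texts = [text for level, text in headings if level == 1]
    if len(h1_texts) > 1:
        issues.append("multiple H1 tags")
    if any(not text.strip() for text in h1_texts):
        issues.append("empty H1 element")
    previous = 0
    for level, _ in headings:
        # A jump of more than one level downward skips a heading level.
        if previous and level > previous + 1:
            issues.append(f"skipped heading level: H{previous} to H{level}")
        previous = level
    return issues


def looks_like_soft_404(status, text):
    """True when a 200 response carries 'not found' style content."""
    lowered = text.lower()
    return status == 200 and any(p in lowered for p in NOT_FOUND_PHRASES)
```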
Link Equity & Authority Flow
Test how link signals propagate through the site architecture.
- Authority hub page — links to 15+ pages across categories
- Nofollow / sponsored / UGC links — equity blockers
- Broken internal links — equity lost to 404s
- Broken external links — outbound link rot
- Excessive links (250+) — link dilution
- Blog post with contextual links — natural linking
- Canonical to 404 — wasted canonical signal
- HTTP canonical on HTTPS — protocol leak
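Two of the link signals above reduce to simple attribute and URL comparisons. This is a minimal sketch; the helper names `blocking_rels` and `is_protocol_leak` are hypothetical.

```python
from urllib.parse import urlsplit

# The three rel values named in the list above.
EQUITY_BLOCKING = {"nofollow", "sponsored", "ugc"}


def blocking_rels(rel_value):
    """Return the equity-blocking tokens found in a link's rel attribute."""
    tokens = set((rel_value or "").lower().split())
    return sorted(tokens & EQUITY_BLOCKING)


def is_protocol_leak(page_url, canonical_url):
    """True when an HTTPS page declares an HTTP canonical URL."""
    return (
        urlsplit(page_url).scheme == "https"
        and urlsplit(canonical_url).scheme == "http"
    )
```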
Technical & Structured Data
Schema markup, security, and international targeting issues.
- Valid JSON-LD — correctly implemented schema
- Invalid JSON-LD syntax — parsing will fail
- Incomplete schema — required fields absent
- Wrong schema type — type mismatch
- Mixed content (HTTP on HTTPS)
- Invalid hreflang codes
- Missing hreflang return tags
- Excessive inline CSS — bloated HTML
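For the JSON-LD and hreflang cases above, a crawler could start with checks along these lines. This is a sketch only: `REQUIRED_FIELDS` is a hypothetical table made up for illustration and does not reflect schema.org or any search engine's actual requirements, and the `hreflang` argument is an assumed mapping of page URL to `{language_code: alternate_url}`.

```python
import json

# Hypothetical required fields per type, for illustration only.
REQUIRED_FIELDS = {"Article": {"headline", "author"}, "Product": {"name"}}


def check_json_ld(raw):
    """Return a list of problems found in one JSON-LD script body."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["invalid JSON-LD syntax"]
    if not isinstance(data, dict):
        return ["unsupported top-level JSON-LD value"]
    schema_type = data.get("@type")
    if not isinstance(schema_type, str):
        return ["missing @type"]
    required = REQUIRED_FIELDS.get(schema_type, set())
    missing = sorted(required - set(data))
    return [f"incomplete schema: missing {name}" for name in missing]


def missing_return_tags(hreflang):
    """Return (page, target) pairs where target does not link back to page."""
    missing = []
    for page, alternates in hreflang.items():
        for target in alternates.values():
            if target == page:
                continue  # self-references need no return tag
            if page not in hreflang.get(target, {}).values():
                missing.append((page, target))
    return sorted(missing)
```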
Length & Size Issues
Elements that are too long, too short, or incorrectly sized.
- Title too long — gets truncated in SERPs
- Title too short — insufficient signal
- Description too long — SERP truncation
- Description too short — wasted opportunity
- H1 too long — heading overflow
- Excessively long alt text
- Very long URL
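Length checks like those above are usually threshold comparisons. The sketch below uses placeholder limits; the numbers in `LIMITS` are assumptions chosen for illustration, since the test site does not state which thresholds it targets.

```python
# Assumed (min, max) character limits; tune these to your crawler's rules.
LIMITS = {
    "title": (30, 60),
    "description": (70, 160),
}


def length_issue(field, text):
    """Return "too short", "too long", or None for the given field."""
    low, high = LIMITS[field]
    length = len(text.strip())
    if length < low:
        return "too short"
    if length > high:
        return "too long"
    return None
```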
Pagination & Multi-Part
Paginated content and multi-page sequences.
- Pagination page 1 → page 2 → page 3
- Incomplete social tags on paginated content
- Duplicate content risk in paginated series
- Canonical missing on paginated pages
Browse by Category
Good / Control Pages
Pages with proper SEO as a baseline for comparison.
Meta Tag Issues
Title, description, canonical, OG, and viewport problems.
- Missing title
- Duplicate title (page 1)
- Duplicate title (page 2)
- Title too long
- Title too short
- Missing description
- Description too long
- Description too short
- Duplicate description (1)
- Duplicate description (2)
- Missing viewport
- Missing canonical
- Wrong canonical (404 target)
- Multiple canonicals
- Missing Open Graph
- Incomplete Open Graph
- Noindex in sitemap
- Page-level nofollow
- Meta refresh redirect
- Missing lang attribute
- Missing charset
Content Issues
Heading problems, thin content, duplicates, and spam signals.
Link Issues
Broken links, redirects, nofollow, and problematic href patterns.
Image Issues
Missing alt text, broken sources, and dimension problems.
Deep Page Structure
Pages nested 5+ levels deep to test crawl depth.
- Level 1 (2 clicks from the home page)
- Levels 2-5: reachable only by following links from the level above
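Click depth is typically measured with a breadth-first search over the internal link graph. Here is a minimal sketch, assuming a hypothetical `links` mapping from each page URL to the list of URLs it links to.

```python
from collections import deque


def click_depths(links, start):
    """Return the minimum number of clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                # First visit in BFS order is the shortest path.
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

Pages whose depth is 5 or more can then be reported as deeply nested, and any page missing from the result was not reachable by links at all.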
Structured Data / Schema
Valid, invalid, and malformed JSON-LD structured data.
Hreflang Issues
International targeting with broken language tags.
Security / Protocol
Mixed content and HTTPS issues.
Miscellaneous Issues
Soft 404s, iframes, blocked pages, and more.
Orphaned Pages (Not Linked Here)
Three pages exist at /orphan/lost-page-1, /orphan/lost-page-2, and /orphan/deep-orphan but are intentionally NOT linked from anywhere. They are excluded from the sitemap and blocked by robots.txt. Your crawler should detect their absence from the link graph.
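Orphan detection comes down to comparing a list of known URLs against the set of link targets. The sketch below assumes the known URLs come from some out-of-band source (for example the paths listed above or server logs), since these pages are absent from the sitemap; `orphaned` and its parameters are hypothetical names.

```python
def orphaned(known_urls, links, start):
    """Return known URLs that no crawled page links to.

    `links` maps each crawled page to the URLs it links to; `start` is
    the crawl entry point, which is never counted as an orphan.
    """
    linked = {start}
    for targets in links.values():
        linked.update(targets)
    return sorted(set(known_urls) - linked)
```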