TL;DR

Site readiness is the technical layer of GEO. It's what determines whether AI crawlers can reach your content, parse it cleanly, quote it accurately, and understand who you are as an entity.

BeCited runs 14 checks before any prompts are captured. The result is a letter grade with check-by-check pass/warn/fail signals, weighted contributions, and prioritized recommendations. The 14 checks are grouped into five tiers, mapped to the three pillars of GEO (retrievability, citability, recognizability).

The full list with weights

Each check carries a default weight on a roughly 100-point scale (the defaults sum to 94), and each business profile can override individual weights. Setting a weight to 0 excludes the check from the grade denominator entirely — the check still runs and reports, but doesn't affect the grade. Local services, for instance, zero out agentic readiness; SaaS bumps it from 5 to 10 because MCP and agent-discovery signals are critical for SaaS visibility.

Check                    | Tier                | Weight
-------------------------|---------------------|-------
robots.txt               | Crawlability        | 8
llms.txt / llms-full.txt | Crawlability        | 4
sitemap.xml              | Crawlability        | 5
JSON-LD schema.org       | Structured metadata | 12
OpenGraph & meta tags    | Structured metadata | 8
Heading structure        | Structured metadata | 5
Quotable claims          | Content extraction  | 8
Semantic HTML            | Content extraction  | 7
FAQ content format       | Content extraction  | 3
E-E-A-T signals          | Content extraction  | 5
Content freshness        | Content quality     | 8
Quotability score        | Content quality     | 8
Entity readiness         | Entity & agent      | 8
Agentic readiness        | Entity & agent      | 5
Total                    |                     | 94

The default weights sum to 94, with profile-specific overrides bringing each effective grade denominator to its own total. The grade itself is a percentage of points earned over points available, mapped to a letter (A–F).

Tier 1: Crawlability & discovery

If AI crawlers can't reach your content, nothing else matters. This tier is the foundation of retrievability.

1. robots.txt — AI crawler classification

8 pts

BeCited classifies AI bots as training (GPTBot, Google-Extended, ClaudeBot, CCBot) versus retrieval (OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot). Blocking training bots is a defensible policy choice. Blocking retrieval bots makes you invisible to AI search.

The most common anti-pattern: brands block GPTBot to "protect" content but accidentally block OAI-SearchBot too. Per a BuzzStream analysis, 71% of sites have this misconfiguration.
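A policy that blocks training bots without catching retrieval bots might look like the sketch below. The bot names come from the classification above; the domain is a placeholder, and current user-agent strings should be verified against each vendor's crawler documentation:

```txt
# Block training crawlers (a defensible policy choice)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Everything else — including retrieval bots like
# OAI-SearchBot and PerplexityBot — inherits the default
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

The key design point is that retrieval bots are never named in a Disallow group, so they fall through to the permissive default.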

2. llms.txt & llms-full.txt

4 pts

An emerging standard for telling LLMs which content is most quotable, structured for fast ingestion. Adoption is still low and the spec is informal, but the upside is real: a clean llms.txt gives engines a curated index of your most authoritative pages.
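Under the informal llms.txt proposal, the file is plain markdown served at /llms.txt: an H1 with the site name, a blockquote summary, then H2 sections listing the pages worth ingesting. A minimal hypothetical example (names and URLs are placeholders):

```markdown
# Example Co
> One-sentence summary of what Example Co does and for whom.

## Docs
- [Product overview](https://example.com/product): what the product does
- [Pricing](https://example.com/pricing): plans, limits, and billing

## Optional
- [Blog](https://example.com/blog): long-form articles
```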

3. sitemap.xml

5 pts

The classic sitemap is still load-bearing for AI crawlers. We check that it exists, that it's reachable from robots.txt, and that lastmod values are populated — freshness signals feed directly into the freshness check below.
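A minimal sitemap fragment with the lastmod field populated, per the Sitemaps 0.9 protocol (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
</urlset>
```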

Tier 2: Structured metadata

Once a crawler has your page, structured metadata tells it what the page is about. This is the highest-weighted single tier in the audit.

4. JSON-LD schema.org

12 pts

The biggest single technical lever in the audit. Independent studies put the citation uplift from proper structured data at 1.8–3.2x. We check for type-appropriate schema (LocalBusiness or Service for local; SoftwareApplication for SaaS; Product for consumer goods) plus Organization, FAQPage, AggregateRating, and Review where relevant.
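As one illustration, a type-appropriate JSON-LD block for a local business might look like this (all values are placeholders; validate real markup against schema.org and a rich-results testing tool):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Plumbing Co",
  "url": "https://example.com",
  "telephone": "+1-555-0100",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Austin",
    "addressRegion": "TX"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "212"
  }
}
</script>
```

A SaaS site would swap LocalBusiness for SoftwareApplication; a consumer-goods site for Product.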

5. OpenGraph & meta tags

8 pts

Title, description, og:title, og:description, og:image. AI engines often quote meta description verbatim when the page is cited. A weak description is a wasted billboard.
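A head section carrying all five signals, with a description written to be quoted (content is illustrative):

```html
<title>Emergency Plumbing in Austin | Example Plumbing Co</title>
<meta name="description"
      content="Licensed Austin plumbers with 24/7 emergency service and average response times under 60 minutes.">
<meta property="og:title" content="Emergency Plumbing in Austin | Example Plumbing Co">
<meta property="og:description" content="Licensed Austin plumbers with 24/7 emergency service.">
<meta property="og:image" content="https://example.com/og-image.jpg">
```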

6. Heading structure

5 pts

One H1 per page, descriptive H2s, and a consistent hierarchy. Headings phrased as questions tend to earn 40% more citations because they map directly to user prompts.
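In practice that means a hierarchy like the following, with question-phrased H2s (content is illustrative):

```html
<h1>Emergency Plumbing in Austin</h1>           <!-- exactly one H1 -->
<h2>How fast can a plumber reach me?</h2>       <!-- question-phrased H2 -->
<h2>What does an emergency call-out cost?</h2>
<h3>Weekday vs. weekend rates</h3>              <!-- H3 nested under the cost H2 -->
```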

Tier 3: Content extraction signals

This tier is research-backed against the Princeton GEO paper and LSEO content extraction studies. It governs whether your content is actually quotable, not just present.

7. Quotable claims

8 pts

Self-contained 50–150-word chunks with answer-first structure. AI engines pull blocks, not paragraphs. A page full of conversational prose with no extractable claims will lose to a competitor with one well-formed answer block.

8. Semantic HTML

7 pts

main, article, section, lists, tables — not div soup. Engines parse semantic elements faster and trust them more. We count semantic tags vs unstructured div containers and flag the imbalance.
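The imbalance check can be sketched with the standard library alone. This is an illustrative simplification, not BeCited's implementation; the tag set and the ratio formula are assumptions:

```python
from html.parser import HTMLParser

# Container tags treated as "semantic" for this sketch (an assumption)
SEMANTIC = {"main", "article", "section", "nav", "aside",
            "header", "footer", "ul", "ol", "table", "figure"}

class TagCounter(HTMLParser):
    """Counts semantic container tags vs. generic divs."""
    def __init__(self):
        super().__init__()
        self.semantic = 0
        self.divs = 0

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC:
            self.semantic += 1
        elif tag == "div":
            self.divs += 1

def semantic_ratio(html: str) -> float:
    """Share of container tags that are semantic (1.0 = no div soup)."""
    counter = TagCounter()
    counter.feed(html)
    total = counter.semantic + counter.divs
    return counter.semantic / total if total else 1.0
```

A page wrapped in main/article but stuffed with nested divs lands around 0.5; a clean semantic layout approaches 1.0.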

9. FAQ content format

3 pts

FAQPage schema, native HTML <details> elements, or Q&A-style headings. FAQs are over-represented in AI citations because their structure aligns with the prompt-and-answer pattern of generative search.
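The native-HTML variant is the simplest of the three; pairing it with FAQPage schema where appropriate covers both signals (content is illustrative):

```html
<section>
  <h2>Frequently asked questions</h2>
  <details>
    <summary>Do you offer same-day service?</summary>
    <p>Yes. Calls received before 2pm are scheduled the same day.</p>
  </details>
</section>
```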

10. E-E-A-T signals

5 pts

Person/author schema, byline markup, credential language ("certified", "licensed", "N years of experience"). Per AI Overview research, 96% of cited pages have strong E-E-A-T signals. A position-6 page with E-E-A-T markup beats a position-1 page without.

Tier 4: Content quality

Two checks that govern the quality of the content itself, beyond formatting.

11. Content freshness audit

8 pts

Last-Modified HTTP headers, JSON-LD dateModified, and sitemap lastmod. Perplexity weights content under 30 days old at roughly 3.2x. Half of all AI citations come from content less than 11 months old. A site that refreshes nothing gets less of the answer.
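The age bands can be sketched as a simple bucketing function. The thresholds here are illustrative assumptions loosely tied to the figures above (a strong boost under ~30 days, most citations under ~11 months), not the audit's exact cutoffs:

```python
from datetime import date

def freshness_signal(last_modified: date, today: date) -> str:
    """Bucket a page's age into illustrative freshness bands."""
    age_days = (today - last_modified).days
    if age_days <= 30:
        return "fresh"      # strongest retrieval weighting
    if age_days <= 335:     # roughly 11 months
        return "recent"
    return "stale"
```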

12. Quotability score

8 pts

A composite of paragraph length distribution, answer-first pattern detection, statistic density, and self-contained chunk count. Engines reward content that's easy to lift; this score quantifies how lift-friendly your pages are.
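A stripped-down sketch of such a composite, using just two of the four components (the equal weighting and the digit-based statistic test are assumptions for illustration; the real score is more involved):

```python
import re

def quotability_score(paragraphs: list[str]) -> float:
    """Illustrative composite in [0, 1]:
    - share of paragraphs in the liftable 50-150 word range
    - share of paragraphs containing a number (proxy for statistic density)
    """
    if not paragraphs:
        return 0.0
    n = len(paragraphs)
    liftable = sum(1 for p in paragraphs
                   if 50 <= len(p.split()) <= 150) / n
    stats = sum(1 for p in paragraphs if re.search(r"\d", p)) / n
    return (liftable + stats) / 2
```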

Tier 5: Entity & agent signals

This tier covers whether engines understand who you are at the entity level — and, for SaaS, whether agentic systems can interact with your product.

13. Entity readiness

8 pts

Wikipedia presence, Wikidata entry, Organization schema with sameAs links to authoritative profiles, and consistent brand naming across the web. AI engines are reluctant to recommend entities they can't disambiguate.
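The sameAs linking looks like this in practice — Organization schema pointing at the profiles engines already trust (all URLs below are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q000000",
    "https://www.linkedin.com/company/example-co",
    "https://github.com/example-co"
  ]
}
</script>
```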

14. Agentic readiness (SaaS only)

5 pts

AGENTS.md, OpenAPI spec, public API documentation, and MCP manifest. As Anthropic's Model Context Protocol, Google's UCP, and Visa's Agentic Ready standards mature, this signals to AI engines that your product can be invoked, not just described. Local service businesses zero out this check; SaaS profiles bump it to 10.

How the grade is calculated

The grade is a weighted percentage, not a checklist score. Each check returns pass / warn / fail. Pass earns full weight, warn earns half, fail earns zero. Total points earned divided by sum of effective weights gives a 0–100 score, which maps to A (85+), B (70–84), C (55–69), D (40–54), or F (under 40).

Profile overrides change the denominator, not the grading curve. If a check is set to weight 0, it's excluded entirely from the grade calculation; the check still runs and surfaces in the report, but doesn't affect the letter.
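The whole calculation fits in a few lines. This sketch follows the rules stated above (pass = full weight, warn = half, fail = zero, weight-0 checks excluded from the denominator); the function and dict names are ours, not BeCited's:

```python
GRADES = [(85, "A"), (70, "B"), (55, "C"), (40, "D"), (0, "F")]
POINTS = {"pass": 1.0, "warn": 0.5, "fail": 0.0}

def site_readiness_grade(results: dict[str, str],
                         weights: dict[str, float]) -> tuple[float, str]:
    """results maps check -> "pass" | "warn" | "fail";
    weights holds the (possibly profile-overridden) weights."""
    # Weight-0 checks still run elsewhere, but drop out of the denominator
    active = {c: w for c, w in weights.items() if w > 0}
    available = sum(active.values())
    earned = sum(POINTS[results[c]] * w for c, w in active.items())
    score = 100 * earned / available if available else 0.0
    letter = next(g for cutoff, g in GRADES if score >= cutoff)
    return score, letter
```

For example, a profile that zeroes out agentic readiness grades only over the remaining weights, so the same pass/warn results can produce a different letter for a local service than for a SaaS.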

The output of all 14 checks lives in site-readiness.json alongside the prompt-based audit data, so technical fixes and content fixes are graded together. A brand can have an A+ GEO Score on prompts but a D on site readiness if their robots.txt blocks retrieval bots; we'd still flag that as the top action item.

Where most brands fail first

Across audits we've run, three patterns recur:

  1. Robots.txt over-blocks. Either the site blocks all AI bots indiscriminately, or (more often) it blocks the wrong subset. A retrieval-bot block is a self-inflicted invisibility cloak.
  2. Schema is missing or wrong-typed. Many sites have JSON-LD for WebSite or BreadcrumbList but nothing for the entity that matters — LocalBusiness, SoftwareApplication, or Product. AI engines can't ground claims to a missing entity.
  3. Quotability is low. Long meandering paragraphs, no answer-first structure, no statistics. The competitor with shorter, denser blocks gets quoted; the long-form site gets ignored even when its content is better.

None of the 14 checks are theoretical. Each one corresponds to a measurable change in citation behavior we (and the broader research community) can document. Fixing the highest-weighted, lowest-effort checks first is usually the fastest way to move a GEO Score in 60 days.

About BeCited

We measure what AI says about your business

Every BeCited audit runs all 14 site readiness checks plus 100–300 buying-intent prompts across ChatGPT, Gemini, Perplexity, and Claude. Results are scored with a calibrated rubric (Cohen's κ = 0.722) and 95% confidence intervals, then translated into a prioritized action plan delivered in one week.