TL;DR

Site readiness is the technical layer of GEO. It's what determines whether AI crawlers can reach your content, parse it cleanly, quote it accurately, and understand who you are as an entity.

BeCited runs 14 checks before any prompts are captured. The result is a letter grade with check-by-check pass/warn/fail signals, weighted contributions, and prioritized recommendations. The 14 checks are grouped into five tiers, mapped to the three pillars of GEO (retrievability, citability, recognizability).

The full list with weights

Each check carries a default weight on a roughly 100-point scale (the defaults sum to 94), and each business profile can override individual weights. Setting a weight to 0 excludes the check from the grade denominator entirely — the check still runs and reports, but doesn't affect the grade. Local services, for instance, zero out agentic readiness; SaaS bumps it from 5 to 10 because MCP and agent-discovery signals are critical for SaaS visibility.

Check                    | Tier                | Weight
-------------------------|---------------------|-------
robots.txt               | Crawlability        | 8
llms.txt / llms-full.txt | Crawlability        | 4
sitemap.xml              | Crawlability        | 5
JSON-LD schema.org       | Structured metadata | 12
OpenGraph & meta tags    | Structured metadata | 8
Heading structure        | Structured metadata | 5
Quotable claims          | Content extraction  | 8
Semantic HTML            | Content extraction  | 7
FAQ content format       | Content extraction  | 3
E-E-A-T signals          | Content extraction  | 5
Content freshness        | Content quality     | 8
Quotability score        | Content quality     | 8
Entity readiness         | Entity & agent      | 8
Agentic readiness        | Entity & agent      | 5
Total                    |                     | 94

The default weights sum to 94, with profile-specific overrides bringing each effective grade denominator to its own total. The grade itself is a percentage of points earned over points available, mapped to a letter (A–F).

Tier 1: Crawlability & discovery

If AI crawlers can't reach your content, nothing else matters. This tier is the foundation of retrievability.

1. robots.txt — AI crawler classification

8 pts

BeCited classifies AI bots as training (GPTBot, Google-Extended, ClaudeBot, CCBot) versus retrieval (OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot). Blocking training bots is a defensible policy choice. Blocking retrieval bots makes you invisible to AI search.

The most common anti-pattern: brands block GPTBot to "protect" content but accidentally block OAI-SearchBot too. Per a BuzzStream analysis, 71% of sites have this misconfiguration.
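A policy that blocks training bots without catching retrieval bots might look like the sketch below. The bot names come from the classification above; the domain is a placeholder, and current user-agent strings should be verified against each vendor's crawler documentation:

```txt
# Block training crawlers (a defensible policy choice)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Everything else — including retrieval bots like
# OAI-SearchBot and PerplexityBot — inherits the default
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

The key design point is that retrieval bots are never named in a Disallow group, so they fall through to the permissive default.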

2. llms.txt & llms-full.txt

4 pts

An emerging standard for telling LLMs which content is most quotable, structured for fast ingestion. Adoption is still low and the spec is informal, but the upside is real: a clean llms.txt gives engines a curated index of your most authoritative pages.
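Under the informal llms.txt proposal, the file is plain markdown served at /llms.txt: an H1 with the site name, a blockquote summary, then H2 sections listing the pages worth ingesting. A minimal hypothetical example (names and URLs are placeholders):

```markdown
# Example Co
> One-sentence summary of what Example Co does and for whom.

## Docs
- [Product overview](https://example.com/product): what the product does
- [Pricing](https://example.com/pricing): plans, limits, and billing

## Optional
- [Blog](https://example.com/blog): long-form articles
```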

3. sitemap.xml

5 pts

The classic sitemap is still load-bearing for AI crawlers. We check that it exists, that it's reachable from robots.txt, and that lastmod values are populated — freshness signals feed directly into the freshness check below.
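A minimal sitemap fragment with the lastmod field populated, per the Sitemaps 0.9 protocol (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
</urlset>
```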

Tier 2: Structured metadata

Once a crawler has your page, structured metadata tells it what the page is about. This is the highest-weighted single tier in the audit.

4. JSON-LD schema.org

12 pts

The biggest single technical lever in the audit. Independent studies put the citation uplift from proper structured data at 1.8–3.2x. We check for type-appropriate schema (LocalBusiness or Service for local; SoftwareApplication for SaaS; Product for consumer goods) plus Organization, FAQPage, AggregateRating, and Review where relevant.
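As one illustration, a type-appropriate JSON-LD block for a local business might look like this (all values are placeholders; validate real markup against schema.org and a rich-results testing tool):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Plumbing Co",
  "url": "https://example.com",
  "telephone": "+1-555-0100",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Austin",
    "addressRegion": "TX"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "212"
  }
}
</script>
```

A SaaS site would swap LocalBusiness for SoftwareApplication; a consumer-goods site for Product.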

5. OpenGraph & meta tags

8 pts

Title, description, og:title, og:description, og:image. AI engines often quote meta description verbatim when the page is cited. A weak description is a wasted billboard.
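A head section carrying all five signals, with a description written to be quoted (content is illustrative):

```html
<title>Emergency Plumbing in Austin | Example Plumbing Co</title>
<meta name="description"
      content="Licensed Austin plumbers with 24/7 emergency service and average response times under 60 minutes.">
<meta property="og:title" content="Emergency Plumbing in Austin | Example Plumbing Co">
<meta property="og:description" content="Licensed Austin plumbers with 24/7 emergency service.">
<meta property="og:image" content="https://example.com/og-image.jpg">
```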

6. Heading structure

5 pts

One H1 per page, descriptive H2s, and a consistent hierarchy. Headings phrased as questions tend to earn 40% more citations because they map directly to user prompts.
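In practice that means a hierarchy like the following, with question-phrased H2s (content is illustrative):

```html
<h1>Emergency Plumbing in Austin</h1>           <!-- exactly one H1 -->
<h2>How fast can a plumber reach me?</h2>       <!-- question-phrased H2 -->
<h2>What does an emergency call-out cost?</h2>
<h3>Weekday vs. weekend rates</h3>              <!-- H3 nested under the cost H2 -->
```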

Tier 3: Content extraction signals

This tier is research-backed against the Princeton GEO paper and LSEO content extraction studies. It governs whether your content is actually quotable, not just present.

7. Quotable claims

8 pts

Self-contained 50–150-word chunks with answer-first structure. AI engines pull blocks, not paragraphs. A page full of conversational prose with no extractable claims will lose to a competitor with one well-formed answer block.

8. Semantic HTML

7 pts

main, article, section, lists, tables — not div soup. Engines parse semantic elements faster and trust them more. We count semantic tags vs unstructured div containers and flag the imbalance.
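The imbalance check can be sketched with the standard library alone. This is an illustrative simplification, not BeCited's implementation; the tag set and the ratio formula are assumptions:

```python
from html.parser import HTMLParser

# Container tags treated as "semantic" for this sketch (an assumption)
SEMANTIC = {"main", "article", "section", "nav", "aside",
            "header", "footer", "ul", "ol", "table", "figure"}

class TagCounter(HTMLParser):
    """Counts semantic container tags vs. generic divs."""
    def __init__(self):
        super().__init__()
        self.semantic = 0
        self.divs = 0

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC:
            self.semantic += 1
        elif tag == "div":
            self.divs += 1

def semantic_ratio(html: str) -> float:
    """Share of container tags that are semantic (1.0 = no div soup)."""
    counter = TagCounter()
    counter.feed(html)
    total = counter.semantic + counter.divs
    return counter.semantic / total if total else 1.0
```

A page wrapped in main/article but stuffed with nested divs lands around 0.5; a clean semantic layout approaches 1.0.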

9. FAQ content format

3 pts

FAQPage schema, native HTML <details> elements, or Q&A-style headings. FAQs are over-represented in AI citations because their structure aligns with the prompt-and-answer pattern of generative search.
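The native-HTML variant is the simplest of the three; pairing it with FAQPage schema where appropriate covers both signals (content is illustrative):

```html
<section>
  <h2>Frequently asked questions</h2>
  <details>
    <summary>Do you offer same-day service?</summary>
    <p>Yes. Calls received before 2pm are scheduled the same day.</p>
  </details>
</section>
```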

10. E-E-A-T signals

5 pts

Person/author schema, byline markup, credential language ("certified", "licensed", "N years of experience"). Per AI Overview research, 96% of cited pages have strong E-E-A-T signals. A position-6 page with E-E-A-T markup beats a position-1 page without.

Tier 4: Content quality

Two checks that govern the quality of the content itself, beyond formatting.

11. Content freshness audit

8 pts

Last-Modified HTTP headers, JSON-LD dateModified, and sitemap lastmod. Perplexity weights content under 30 days old at roughly 3.2x. Half of all AI citations come from content less than 11 months old. A site that refreshes nothing gets less of the answer.
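The age bands can be sketched as a simple bucketing function. The thresholds here are illustrative assumptions loosely tied to the figures above (a strong boost under ~30 days, most citations under ~11 months), not the audit's exact cutoffs:

```python
from datetime import date

def freshness_signal(last_modified: date, today: date) -> str:
    """Bucket a page's age into illustrative freshness bands."""
    age_days = (today - last_modified).days
    if age_days <= 30:
        return "fresh"      # strongest retrieval weighting
    if age_days <= 335:     # roughly 11 months
        return "recent"
    return "stale"
```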

12. Quotability score

8 pts

A composite of paragraph length distribution, answer-first pattern detection, statistic density, and self-contained chunk count. Engines reward content that's easy to lift; this score quantifies how lift-friendly your pages are.
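A stripped-down sketch of such a composite, using just two of the four components (the equal weighting and the digit-based statistic test are assumptions for illustration; the real score is more involved):

```python
import re

def quotability_score(paragraphs: list[str]) -> float:
    """Illustrative composite in [0, 1]:
    - share of paragraphs in the liftable 50-150 word range
    - share of paragraphs containing a number (proxy for statistic density)
    """
    if not paragraphs:
        return 0.0
    n = len(paragraphs)
    liftable = sum(1 for p in paragraphs
                   if 50 <= len(p.split()) <= 150) / n
    stats = sum(1 for p in paragraphs if re.search(r"\d", p)) / n
    return (liftable + stats) / 2
```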

Tier 5: Entity & agent signals

This tier covers whether engines understand who you are at the entity level — and, for SaaS, whether agentic systems can interact with your product.

13. Entity readiness

8 pts

Wikipedia presence, Wikidata entry, Organization schema with sameAs links to authoritative profiles, and consistent brand naming across the web. AI engines are reluctant to recommend entities they can't disambiguate.
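The sameAs linking looks like this in practice — Organization schema pointing at the profiles engines already trust (all URLs below are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q000000",
    "https://www.linkedin.com/company/example-co",
    "https://github.com/example-co"
  ]
}
</script>
```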

14. Agentic readiness (SaaS only)

5 pts

AGENTS.md, OpenAPI spec, public API documentation, and MCP manifest. As Anthropic's Model Context Protocol, Google's UCP, and Visa's Agentic Ready standards mature, this signals to AI engines that your product can be invoked, not just described. Local service businesses zero out this check; SaaS profiles bump it to 10.

How the grade is calculated

The grade is a weighted percentage, not a checklist score. Each check returns pass / warn / fail. Pass earns full weight, warn earns half, fail earns zero. Total points earned divided by sum of effective weights gives a 0–100 score, which maps to A (85+), B (70–84), C (55–69), D (40–54), or F (under 40).

Profile overrides change the denominator, not the grading curve. If a check is set to weight 0, it's excluded entirely from the grade calculation; the check still runs and surfaces in the report, but doesn't affect the letter.
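The whole calculation fits in a few lines. This sketch follows the rules stated above (pass = full weight, warn = half, fail = zero, weight-0 checks excluded from the denominator); the function and dict names are ours, not BeCited's:

```python
GRADES = [(85, "A"), (70, "B"), (55, "C"), (40, "D"), (0, "F")]
POINTS = {"pass": 1.0, "warn": 0.5, "fail": 0.0}

def site_readiness_grade(results: dict[str, str],
                         weights: dict[str, float]) -> tuple[float, str]:
    """results maps check -> "pass" | "warn" | "fail";
    weights holds the (possibly profile-overridden) weights."""
    # Weight-0 checks still run elsewhere, but drop out of the denominator
    active = {c: w for c, w in weights.items() if w > 0}
    available = sum(active.values())
    earned = sum(POINTS[results[c]] * w for c, w in active.items())
    score = 100 * earned / available if available else 0.0
    letter = next(g for cutoff, g in GRADES if score >= cutoff)
    return score, letter
```

For example, a profile that zeroes out agentic readiness grades only over the remaining weights, so the same pass/warn results can produce a different letter for a local service than for a SaaS.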

The output of all 14 checks lives in site-readiness.json alongside the prompt-based audit data, so technical fixes and content fixes are graded together. A brand can have an A+ GEO Score on prompts but a D on site readiness if their robots.txt blocks retrieval bots; we'd still flag that as the top action item.

Where most brands fail first

Across audits we've run, three patterns recur:

  1. Robots.txt over-blocks. Either the site blocks all AI bots indiscriminately, or (more often) it blocks the wrong subset. A retrieval-bot block is a self-inflicted invisibility cloak.
  2. Schema is missing or wrong-typed. Many sites have JSON-LD for WebSite or BreadcrumbList but nothing for the entity that matters — LocalBusiness, SoftwareApplication, or Product. AI engines can't ground claims to a missing entity.
  3. Quotability is low. Long meandering paragraphs, no answer-first structure, no statistics. The competitor with shorter, denser blocks gets quoted; the long-form site gets ignored even when its content is better.

None of the 14 checks are theoretical. Each one corresponds to a measurable change in citation behavior we (and the broader research community) can document. Fixing the highest-weighted, lowest-effort checks first is usually the fastest way to move a GEO Score in 60 days.

About BeCited

We measure what AI says about your business

Every BeCited audit runs all 14 site readiness checks plus 100–300 buying-intent prompts across ChatGPT, Gemini, Perplexity, and Claude. Results are scored with a calibrated rubric (Cohen's κ = 0.722) and 95% confidence intervals, then translated into a prioritized action plan delivered in one week.