- Google AI Mode is built on at least six published patents that, read together, describe a coherent system: fan-out, retrieval, ranking, reasoning, personalization, and stateful chat.
- One user query becomes roughly nine synthetic subqueries before retrieval runs. Optimizing only for the literal query you targeted leaves most of the surface area uncovered.
- Pairwise LLM ranking (patent US20250124067A1) makes "given these two passages, which is better?" the decisive comparison. Keyword density does not win this comparison. Clear, complete passages do.
- User embeddings (patent WO2025102041A1) personalize the entire pipeline. Two users get different answers to identical queries. Logged-out rank tracking misses the personalization signal entirely.
- Mike King's four content pillars (fit the reasoning target, fan-out compatible, citation-worthy, composition-friendly) are the operating model for content that survives this pipeline.
Google has not published a complete description of how AI Mode works. What it has published is a string of patents that, taken together, leave very little to the imagination. Mike King's "How AI Mode Actually Works" assembles them into a coherent technical story, and the story has direct implications for what kind of content gets cited.
The short version: a single user query is rewritten into many synthetic subqueries, each subquery retrieves passages by vector similarity, the passages get ranked pairwise by a language model, a reasoning chain assembles them, the user's embedding personalizes the whole thing, and the response gets composed with citations selected for the answer rather than the strongest source overall.
The detailed version is what this article is for.
The patent stack
King names six patents in the piece. Treating them as one architecture rather than six unrelated documents is the main intellectual move.
WO2024064249A1
Systems and methods for prompt-based query generation for diverse retrieval. The mechanism that turns one query into many.
US20240289407A1
Search with stateful chat. Ambient memory, contextual persistence, memory-aware responses across a session.
US20240362093A1
Query response from custom corpus. The plumbing for grounded retrieval against a defined source set, useful for site-scoped answers.
WO2025102041A1
User embedding models for personalization of sequence-processing models. Persistent dense vectors per user, injected at four pipeline stages.
US20240256965A1
Instruction fine-tuning machine-learned models using intermediate reasoning steps. Multi-stage thinking that scaffolds the final answer.
US20250124067A1
Method for text ranking with pairwise ranking prompting. The "which of these two is better?" comparison that drives passage selection.
None of these patents on their own would change SEO practice. Stacked together, they describe a retrieval-and-synthesis pipeline that does not look like classical search.
6
Public Google patents that together describe AI Mode: fan-out, dense retrieval, pairwise ranking, reasoning chains, user embeddings, and stateful chat.
Mike King, iPullRank
Fan-out: one query becomes many
Patent WO2024064249A1 describes the fan-out mechanism. A user types one query. The system generates synthetic subqueries through chain-of-thought prompting that does three things: identifies the user's achievement goal, isolates expandable or ambiguous aspects of the query, and reframes the query across multiple semantic zones.
The subqueries are not random. King's piece names seven categories that the system actively filters for, optimizing for diversity across query type, content type, and semantic zone.
| Type | Trigger | Example expansion |
|---|---|---|
| Related | Co-occurrence patterns in query logs | "top rated electric crossovers" |
| Implicit | Intent classifier infers unstated need | "EVs with longest range" |
| Comparative | Decision classifier expects a choice | "Rivian R1S vs. Tesla Model X" |
| Recent | Search history within the session | (context-dependent) |
| Personalized | Long-term behavioral signal | "EVs eligible for CA rebate" |
| Reformulation | LLM-based rewriting | "which electric SUV is the best" |
| Entity-expanded | Knowledge Graph adjacency | "Model Y reviews" |
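To make the mechanism concrete, here is a minimal sketch of fan-out generation under the patent's framing. Everything here is illustrative: `llm` stands in for any chat-completion client, and the prompt wording is our own, not Google's.

```python
# Hedged sketch of query fan-out in the spirit of WO2024064249A1.
# `llm` is any callable that takes a prompt string and returns text.

FANOUT_CATEGORIES = [
    "related", "implicit", "comparative", "recent",
    "personalized", "reformulation", "entity_expanded",
]

def build_fanout_prompt(query: str, n_per_category: int = 2) -> str:
    """Chain-of-thought prompt: infer the goal, isolate expandable
    aspects, then emit subqueries across the seven categories."""
    categories = ", ".join(FANOUT_CATEGORIES)
    return (
        f"User query: {query!r}\n"
        "1. State the user's underlying goal in one sentence.\n"
        "2. List ambiguous or expandable aspects of the query.\n"
        f"3. For each category ({categories}), write {n_per_category} "
        "synthetic subqueries, one per line, prefixed with the category "
        "name and a colon."
    )

def fan_out(query: str, llm) -> dict[str, list[str]]:
    """Parse the model's response into {category: [subqueries]}."""
    out: dict[str, list[str]] = {c: [] for c in FANOUT_CATEGORIES}
    for line in llm(build_fanout_prompt(query)).splitlines():
        category, _, subquery = line.partition(":")
        key = category.strip().lower().replace("-", "_").replace(" ", "_")
        if key in out and subquery.strip():
            out[key].append(subquery.strip())
    return out
```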
The single most important consequence is that ranking #1 for a core query stops being a guarantee of AI visibility. Per ZipTie's data referenced in King's piece, ranking #1 for the literal query produces roughly a 25% likelihood of appearing in AI Overviews. The remaining 75% of the visibility surface is decided by how well your content matches the synthetic subqueries the system generated in the background.
25%
Likelihood that a page ranked #1 for a core query actually appears in the corresponding AI Overview — fan-out decides the other 75%.
ZipTie, via iPullRank
Coverage is now multi-query, not single-query. A page that owns one phrase can lose to a page that addresses related, comparative, and entity-expanded variants of the same intent inside a single passage. Plan content for the cluster, not the head term.
Pairwise LLM ranking
Patent US20250124067A1 describes pairwise ranking. The system gives the language model two passages and asks: given this query, which is better? The output is a relative ordering, not an absolute score.
"Given this query, which of these two passages is better?"
— iPullRank summary of US20250124067A1
This shift matters because it changes which signals win. Absolute scoring rewards keyword overlap, link counts, and the heuristics that BM25 and PageRank were built around. Pairwise comparison rewards passage clarity, semantic completeness, and framing the model can endorse. King's framing: the unit being compared is the passage, and the scoring function is "model-mediated probabilistic relevance."
What this looks like in practice: two passages on the same query, one keyword-stuffed but vague, the other clean and answer-first with explicit numbers. The pairwise comparison consistently picks the second. Even without literal keyword match, the second passage offers something the first does not: a complete answer the model can extract.
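A hedged sketch of what ranking by a pairwise comparator looks like. `llm` is any chat-completion callable and the prompt is our illustration, not the patent's; the point is that the output is an ordering, never a score.

```python
# Sorting passages with an LLM comparator, in the spirit of US20250124067A1.
from functools import cmp_to_key

def prefer(query: str, a: str, b: str, llm) -> int:
    """Ask the model which passage better answers the query; -1 keeps A first."""
    prompt = (
        f"Query: {query}\n\n"
        f"Passage A:\n{a}\n\n"
        f"Passage B:\n{b}\n\n"
        "Which passage answers the query better? Reply with exactly A or B."
    )
    return -1 if llm(prompt).strip().upper().startswith("A") else 1

def rank_passages(query: str, passages: list[str], llm) -> list[str]:
    # A comparison sort over head-to-head judgments: no absolute score
    # is ever computed, only a relative ordering.
    return sorted(passages, key=cmp_to_key(lambda a, b: prefer(query, a, b, llm)))
```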
User embeddings make rank tracking obsolete
Patent WO2025102041A1 covers user embeddings. The user is represented as a persistent dense vector generated from prior queries, click patterns, content interests, location, temporal data, and Google ecosystem behavior (Gmail, YouTube, Shopping, Maps).
The vector is injected at four points in the pipeline:
- Query interpretation. The same query string is interpreted differently depending on who is typing.
- Synthetic query generation. Fan-out adapts to user history. A user who frequently searches comparison queries gets more comparative variants.
- Passage retrieval. Candidate passages are scored against query similarity and user-embedding similarity together (sketched after this list).
- Response synthesis. Final answers can be reframed for the user. Two users see different citations, sometimes different conclusions.
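For the retrieval stage, a minimal sketch of what a blended score could look like, assuming the simplest possible linear mix. The `alpha` weight is our assumption for illustration, not a value from the patent.

```python
# Hedged sketch: blending query relevance with user-embedding affinity
# at retrieval time, per WO2025102041A1's framing.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def retrieval_score(passage_vec: np.ndarray, query_vec: np.ndarray,
                    user_vec: np.ndarray, alpha: float = 0.7) -> float:
    """alpha * query similarity + (1 - alpha) * user similarity.
    The same passage scores differently for different users."""
    return (alpha * cosine(passage_vec, query_vec)
            + (1 - alpha) * cosine(passage_vec, user_vec))
```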
"Two users asking the same query may see different citations or receive different answers, not because of ambiguity in the query, but because of who they are."
— Mike King, iPullRank
The cross-surface implication is the one most teams underweight. The same user embedding informs personalization across Search, Gemini, YouTube, Shopping, and Gmail. Optimizing for one surface and ignoring the others gives the model less consistent signal to work with.
Logged-out rank tracking is functionally meaningless. The personalization signal is the dominant ranking factor for many AI Mode answers, and a logged-out crawler captures none of it. Rank tracking still answers an SEO question; it does not answer a GEO question.
Dense retrieval, multimodal, and the BM25 gap
AI Mode's primary retrieval layer is dense. Every query, subquery, document, and passage gets converted into a vector embedding, and ranking happens by cosine similarity. There is no static scoring function. There is no global rank for a document. There is similarity to the query at retrieval time.
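In code, dense retrieval reduces to a few lines. `embed` stands in for any sentence-embedding model; the thing to notice is that the score is computed per query, at query time, so there is nothing like a standing position to track.

```python
# Minimal dense-retrieval sketch: rank passages by cosine similarity
# to the query. `embed` is any callable returning a 1-D numpy array.
import numpy as np

def rank_by_similarity(query: str, passages: list[str], embed):
    q = embed(query)
    q = q / np.linalg.norm(q)
    scored = []
    for passage in passages:
        v = embed(passage)
        # Similarity to *this* query is the only score that exists.
        scored.append((float(q @ v / np.linalg.norm(v)), passage))
    return sorted(scored, reverse=True)  # highest similarity first
```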
"Most SEO software still operates on sparse retrieval models rather than dense retrieval models. We don't have tools that parse and embed content passages."
— Mike King
The retrieval is also multimodal. The MUM model lets the system synthesize across text, video, audio, transcripts, imagery, diagrams, and dynamic visualizations, including across languages. A YouTube transcript, an image caption, or a diagram parsed into structured facts can all be retrieved and cited. Content strategy that ignores non-text modalities is leaving signal sources unbuilt.
Stateful chat: the conversation is the query
Patent US20240289407A1 covers stateful chat. The system maintains ambient memory of the conversation as aggregated embeddings, provides contextual persistence so session state informs subsequent queries, and shapes responses by historical user behavior.
The practical change is that the user's third query in a session is not interpreted in isolation. It is interpreted given the first two queries and the model's responses. A passage that answers the literal third query but contradicts the established session context is less likely to be cited than a passage that fits the conversation's drift.
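One way to picture ambient memory, assuming the simplest possible aggregation: a running mean of turn embeddings blended into the current query before retrieval. The blend weight `beta` is our assumption; the patent describes the framing, not these numbers.

```python
# Hedged sketch of session state as aggregated embeddings,
# per US20240289407A1's framing.
import numpy as np

class SessionState:
    def __init__(self, dim: int):
        self.memory = np.zeros(dim)  # aggregated embedding of prior turns
        self.turns = 0

    def update(self, turn_vec: np.ndarray) -> None:
        # Incremental mean over all turns so far.
        self.turns += 1
        self.memory += (turn_vec - self.memory) / self.turns

    def contextualize(self, query_vec: np.ndarray, beta: float = 0.3) -> np.ndarray:
        """The third query in a session is interpreted given the first two:
        blend the raw query with the session memory, then renormalize."""
        if self.turns == 0:
            return query_vec
        v = (1 - beta) * query_vec + beta * self.memory
        return v / np.linalg.norm(v)
```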
The four strategic content pillars
King distills the operating model into four pillars. Each addresses a different stage of the pipeline.
Fit the reasoning target
Passages must be semantically complete in isolation, articulate comparisons and tradeoffs explicitly, and survive pairwise ranking against well-argued alternatives. A self-contained 200-word passage that includes the comparison, the tradeoff, and the conclusion will beat a 1,200-word page that leaves the reader to assemble the answer.
Be fan-out compatible
Content has to be entity-rich, Knowledge Graph-aligned, and address evaluation, comparison, and constraint intents (not just informational ones). If a piece does not surface the entities and the comparisons fan-out is generating subqueries around, it will not match those subqueries.
Be citation-worthy
Factual, attributable, verifiable. Quantitative data with named sources. Specific numbers beat round numbers. Named studies beat vague references to "research." The model selects citations that are easy to defend, not citations that are stylistically pleasant.
Be composition-friendly
Scannable, modular, with answer-first phrasing, FAQs, TL;DRs, and semantic markup. The model is composing an answer; it picks passages that compose easily into one. Buried answers do not get extracted.
The 10 software gaps King names
King lists ten capabilities standard SEO platforms do not yet support. Each one is a gap between what the AI Mode pipeline does and what tools can measure.
1. AI-specific Google Search Console reporting (impressions/clicks per AI surface)
2. Logged-in rank tracking with user personas (no current tool simulates personalization)
3. Vector embedding explorer (visualize how passages cluster in retrieval space)
4. Matrixed semantic content editors (optimize a passage for many subqueries at once)
5. Query journey mapping (chains of related queries within a session)
6. Personalized retrieval simulations (what does this audience persona retrieve?)
7. Query classification (intent type per subquery)
8. Query expansion simulation (Qforia partially fills this)
9. Clickstream data integration (tying retrieval to downstream behavior)
10. Reasoning chain simulation (which intermediate steps would the model take?)
The list is also a roadmap. Vendors building any of the ten cleanly are building for the next half-decade.
Qforia and the simulation gap
Qforia is King's own answer to gap #8. It uses Gemini 2.5 Pro to simulate fan-out for a target query. Inputs: target query, Gemini API key, surface (AIO vs AI Mode). Outputs: synthetic query list, intent classification, reasoning chains, exportable data.
The practical use is not academic. Run a query through Qforia. Read the synthetic queries it generates. Find the ones your content does not address. Build passages targeting those gaps. Repeat for the queries your business depends on.
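That loop is easy to approximate in code. A hedged sketch, assuming an `embed` model and an illustrative similarity threshold; this is not Qforia's implementation, just the gap-finding logic it enables.

```python
# Flag simulated subqueries that no site passage covers closely enough.
import numpy as np

def coverage_gaps(subqueries: list[str], passages: list[str], embed,
                  threshold: float = 0.55) -> list[str]:
    """Return subqueries whose best-matching passage falls below
    `threshold` cosine similarity. The threshold is an assumption;
    calibrate it against queries you know you cover."""
    pvecs = []
    for p in passages:
        v = embed(p)
        pvecs.append(v / np.linalg.norm(v))
    gaps = []
    for sq in subqueries:
        q = embed(sq)
        q = q / np.linalg.norm(q)
        best = max((float(q @ v) for v in pvecs), default=0.0)
        if best < threshold:
            gaps.append(sq)  # no passage covers this subquery: build one
    return gaps
```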
What this means for content strategy
The pipeline rewards different things than classical SEO did. The shifts compound:
- Plan for clusters, not phrases. Fan-out turns a single query into many. Optimize for the cluster of subqueries, not the head term.
- Write passages, not pages. Pairwise ranking compares chunks. Self-contained passages with explicit answers win comparisons.
- Make claims attributable. Citations get selected for defensibility. Numbers, dates, and named sources are the currency.
- Cover modalities. Multimodal retrieval pulls from video, audio, transcripts, and images. Text-only content covers a fraction of the signal.
- Stop relying on logged-out rank. Personalization is the dominant ranking factor for many queries. Track citations and recommendations across engines, not positions on logged-out SERPs.
"The user doesn't care where content comes from as long as they get viable answers."
— James Cadwallader, CEO of Profound, quoted in King's piece
How BeCited measures against this pipeline
Our audit pipeline is built around the gaps King names. Specifically:
- Multi-engine capture. Every audit captures across ChatGPT, Gemini, Perplexity, and Claude in parallel. Different engines run different retrieval pipelines, and the same passage performs differently across them.
- Query fan-out approximation. Each profile generates 25-50 prompts across intent tiers (high/medium/low) so coverage spans the synthetic-query categories King names, not just head terms.
- Position-weighted recommendation strength. Where you appear in a list is weighted (1.25x for first, 0.85x for third or later). Pairwise ranking rewards winning head-to-head comparisons; position weighting is our approximation of that ordering (sketched after this list).
- Source tier classification. Each cited source is classified primary/secondary/tertiary by domain match against the profile's source map. The ratio is more diagnostic than total mention count.
- Confidence intervals on every dimension. 95% binomial CIs on every score. Small visibility differences are flagged as not statistically distinguishable, so teams do not over-react to noise.
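Two of those mechanics are concrete enough to sketch. The position weights below use the stated 1.25x/0.85x values (the 1.0 for second place is our assumed middle value, since only first and third-or-later are stated above), and the Wilson score interval is one standard construction of a 95% binomial CI.

```python
import math

def position_weight(rank: int) -> float:
    """1.25x for first mention, 0.85x for third or later;
    1.0 for second is an assumption for illustration."""
    if rank == 1:
        return 1.25
    if rank == 2:
        return 1.0
    return 0.85

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (z = 1.96).
    Flags when two visibility scores are not statistically distinguishable."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, center - half), min(1.0, center + half))
```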
Frequently asked questions
What is query fan-out?
Query fan-out is the process by which AI Mode rewrites a single user query into multiple synthetic subqueries before retrieval. The mechanism is described in Google patent WO2024064249A1. A query like "best electric SUV" fans out into related queries, implicit queries, comparative queries, recent queries, personalized queries, reformulation queries, and entity-expanded queries. The implication: ranking #1 for a single core query gives roughly a 25% likelihood of appearing in AI Overviews per ZipTie data.
What is pairwise ranking in AI search?
Pairwise ranking, described in Google patent US20250124067A1, is a method where the language model is given two passages and asked: "Given this query, which of these is better?" The output is a relative ordering rather than an absolute score. This shifts the optimization target from keyword density to passage clarity, semantic completeness, and model-preferred framing.
What are user embeddings and why do they matter for SEO?
User embeddings are persistent dense vector representations of an individual user, generated from prior queries, click patterns, content interests, ecosystem behavior, location, and temporal data. The relevant patent is WO2025102041A1. They get injected at query interpretation, synthetic query generation, passage retrieval, and response synthesis. The implication: two users asking the same query may receive different citations and different answers based on who they are.
What are the four strategic content pillars for AI Mode?
Mike King describes four pillars for passage-level optimization. Fit the reasoning target: passages must be semantically complete in isolation and survive pairwise ranking. Be fan-out compatible: entity-rich, Knowledge Graph-aligned, addressing evaluation, comparison, and constraint intents. Be citation-worthy: factual, attributable, verifiable, with quantitative data and named sources. Be composition-friendly: scannable, modular, with answer-first phrasing, FAQs, TL;DRs, and semantic markup.
What is dense retrieval?
Dense retrieval converts queries, subqueries, documents, and passages into numeric vector embeddings, then ranks them by cosine similarity. This contrasts with sparse retrieval (TF-IDF, BM25), which scores keyword overlap. AI Mode uses dense retrieval as its primary scoring layer. The practical consequence: synonyms, paraphrases, and semantically equivalent phrases are recognized without exact keyword match.
What is Qforia?
Qforia is a free tool built by Mike King at iPullRank that simulates query fan-out using Gemini 2.5 Pro. Inputs: a target query, a Gemini API key, and the surface to simulate (AI Overviews or AI Mode). Outputs: a generated list of synthetic queries with intent classification, reasoning chains, and exportable data. Practical use: identify the synthetic queries your content is missing coverage on, then build passages targeting those gaps.
See how your brand fares across the real retrieval pipeline.
BeCited audits run 25-50 prompts per audit across ChatGPT, Gemini, Perplexity, and Claude, weighted by intent tier and position in the answer. Closer to the pipeline than logged-out rank tracking can get.
Sources cited. The patent stack, query-fan-out mechanism, pairwise ranking, user-embedding pipeline, four strategic pillars, ten software gaps, and Qforia framing are drawn from Mike King's iPullRank piece How AI Mode Actually Works. Patent identifiers reference Google's published filings (WO2024064249A1, US20250124067A1, WO2025102041A1, US20240289407A1, US20240362093A1, US20240256965A1). The James Cadwallader quote appears in the same iPullRank piece. ZipTie's 25% AI Overview likelihood figure is referenced via King.