When OpenAI switched default models on March 4, the number of websites cited per response dropped by a fifth, and never recovered. But the citation drop is only part of the story.
We also reverse-engineered ChatGPT’s internal browsing tools, ran a honeypot experiment, reconstructed its system prompt, and released a new version of our ChatGPT Search Capture plugin.
What happened
On March 4, ChatGPT switched its default model from GPT-4o/5.2 to GPT-5.3 Instant. The result: the average number of unique domains cited per response dropped from 19 to 15, a decline of more than 20%.
Unique URLs per response followed the same trajectory, falling from 24 to 19. We tracked 400 daily prompts over 14 weeks, using monitoring data provided by Meteoria.
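As a quick sanity check, the percentage drops implied by those averages can be computed directly (a minimal sketch; the input numbers are the averages reported above):

```python
# Sketch: percentage drops implied by the reported averages.
def pct_drop(before: float, after: float) -> float:
    return (before - after) / before * 100

domain_drop = pct_drop(19, 15)  # unique domains cited per response
url_drop = pct_drop(24, 19)     # unique URLs per response
print(f"domains: {domain_drop:.1f}%  urls: {url_drop:.1f}%")
```

Both figures land just above 20%, consistent with the decline described above.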
Why we care
ChatGPT has 900 million weekly active users. The citation surface in each response hasn’t changed, but fewer websites are sharing it. Same pie, fewer slices.
This likely reflects a structural shift toward higher-authority sources, but it also means fewer winners overall. Sites that don’t make the cut are losing visibility that was previously within reach.
We named this phenomenon the “Bigfoot Effect,” after the “Bigfoot update” (identified by Dr. Peter J. Meyers of Moz in 2012), when Google would sometimes let a single domain occupy the entire first page of results.
ChatGPT now retrieves fewer domains per response, but the URL-to-domain ratio has remained stable at 1.26. Crawl depth per domain hasn’t changed. What has changed is how many distinct websites get a seat at the table.
GPT-5.4 Thinking amplifies the concentration further. The model uses “site:” operators to restrict searches to trusted domains and often distributes more than 10 “fan-out queries” per response, each targeting a specific source.
Independent log analysis by Jérôme Salomon (Oncrawl) confirms the trend. ChatGPT-User bot crawl volume has settled at a lower level since the switch to 5.3. Some pages simply aren’t being crawled anymore.
The cause goes beyond model updates: more than 90% of ChatGPT’s weekly users are on the free plan, and the default experience triggers fewer web searches, uses fewer queries, and produces fewer citations.
How ChatGPT Search actually works
Our study also includes a full reverse engineering of ChatGPT’s internal search system, called web.run. Before 5.3, the model sent compact text commands separated by pipes (fast|query|recency). After 5.3, it sends structured JSON objects with typed parameters.
This isn’t just a format change. It reflects a different architecture in how the model formulates and distributes its web operations.
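The two command shapes can be contrasted side by side. This is a minimal sketch: the pipe format and the search_query/response_length fields come from the study's examples, but the exact recency encoding shown here is an illustrative assumption.

```python
import json

# Old format (pre-5.3): compact pipe-separated text, mode|query|recency.
legacy = "fast|site:example.com seo|7d"
mode, query, recency = legacy.split("|")

# New format (post-5.3): structured JSON with typed parameters,
# matching the search_query shape shown later in this article.
modern = json.loads(
    '{"search_query": [{"q": "site:example.com seo"}], '
    '"response_length": "short"}'
)
print(mode, "->", modern["search_query"][0]["q"])
```

The JSON form is what makes typed parameters and batched multi-query fan-outs possible in a single tool call.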
The web tool now supports 12 operations, up from 4 (plus a separate widget system called genui). These include:
- search_query
- open
- find
- click
- screenshot
- product_query
- Specialized widgets for sports, finance, weather, and more.
GPT-5.4 can chain 5 to more than 10 rounds of search per response, refining queries based on previous results. GPT-5.3 Instant typically runs 2 or 3.
Google’s fingerprints are still visible: Google tracking markers (srsltid) appear in product URLs, and SearchAPI ID-to-token matches reveal the backend’s reliance on third-party search providers, with Google behind the scenes.
A new type of fan-out for product queries
We uncovered a previously undocumented fan-out type: browse_rewritten_queries. It appears exclusively on product queries, on 5.4 Instant, and is visible in conversation code.
When a user asks something like [best 3D printer to buy in 2026], ChatGPT first runs a single rewrite fan-out to build the full list of candidate products. Then it launches a separate shopping fan-out for each individual product, fetching specs, reviews, and pricing one by one.
Before 5.3, product searches were bundled into a single call. Each product now gets its own dedicated retrieval command.
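The two-stage flow can be sketched as follows. The command names (browse_rewritten_queries, product_query) come from the study; the payload shapes and stand-in candidate products are illustrative assumptions.

```python
# Sketch of the two-stage product fan-out described above.
def product_fanout(user_query: str, candidates: list[str]) -> list[dict]:
    # Stage 1: one rewrite fan-out builds the candidate product list.
    rewrite = {"browse_rewritten_queries": [user_query]}
    # Stage 2: a dedicated retrieval command per individual product.
    per_product = [{"product_query": [{"q": c}]} for c in candidates]
    return [rewrite] + per_product

commands = product_fanout("best 3D printer to buy in 2026",
                          ["printer A", "printer B"])
print(len(commands))  # 1 rewrite + one product_query per candidate
```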
ChatGPT-User is the retrieval agent
Our honeypot experiment confirmed an important detail. When ChatGPT browses the web following a search during a conversation, the ChatGPT-User crawler — not OAI-SearchBot — fetches the page content.
OpenAI describes OAI-SearchBot as the agent that builds ChatGPT’s search index, but in practice, the model relies on third-party scraping APIs for search results, then sends ChatGPT-User to retrieve the actual content from selected URLs.
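This distinction matters for log analysis. A minimal sketch, using the two user-agent tokens named above (the sample log strings are illustrative, not verbatim OpenAI user-agent headers):

```python
# Sketch: separating OpenAI crawler hits in an access log by user agent.
def classify_ua(ua: str) -> str:
    if "OAI-SearchBot" in ua:
        return "indexing"    # builds ChatGPT's search index
    if "ChatGPT-User" in ua:
        return "retrieval"   # fetches page content during a conversation
    return "other"

hits = [
    "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
]
print([classify_ua(h) for h in hits])
```

If your pages are cited in conversations, it is the "retrieval" bucket that should show activity.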
The namespace blind spot
This may be our most surprising finding.
The trail started with classic reverse engineering. We decompiled the ChatGPT mobile app, dissected the web client source code, and sniffed network packets on both platforms. That gave us the names of internal tools and some calling conventions.
Armed with these specifics, we were able to ask ChatGPT the right questions and found that the model answered without any restrictions.
OpenAI has real safeguards around its system prompts. But the internal tool configuration layer has none.
ChatGPT’s namespaces — the groups of internal tools the model can call during a conversation — are freely describable. As long as you avoid the words “system prompt,” the model will disclose tool schemas, operation lists, output channels, and namespace structures with perfect consistency.
We published ready-to-use prompts that anyone can paste into ChatGPT to audit its internal environment. To verify that the model wasn’t hallucinating these descriptions, we ran a participatory study with dozens of users across separate sessions. Every participant got exactly the same tool names, parameter schemas, and operation lists. The model consistently and reliably describes its own tooling.
The study also includes a reconstructed system prompt extracted progressively, along with several notable findings:
- Reddit is the only domain exempted from copyright word limits.
- There is a granular list of banned products.
- A “verbosity score” operates on a 1–10 scale.
- A full advertising policy paragraph governs ad display by subscription tier.
Practical use: running your own crawlability audit
The web.run syntax we documented isn’t just a technical curiosity. It works, and it opens a direct path for testing how ChatGPT interacts with your content.
Here’s a concrete example. You can force ChatGPT to search your domain and read specific pages by pasting JSON commands directly into a conversation. First, trigger a targeted search on your site, then force it to fetch the first two results, then ask it to return the title, main topic, and key points from each page.
"Search for this query, then open the first two results and summarize what you find on each page.
Step 1: Search:
{ "search_query": [ { "q": "site:abondance.com seo" } ], "response_length": "short" }
Step 2: Open the first two results:
{ "open": [ { "ref_id": "turn0search0" }, { "ref_id": "turn0search1" } ] }
Step 3: Give me a structured recap of what you found on each URL. For each page: the title, the main topic, and 3–5 key points."
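If you want to repeat this audit across several domains or keywords, the three-step prompt can be generated programmatically. A sketch, following the command shapes in the example above and assuming the turn0searchN ref_id convention it uses:

```python
import json

# Sketch: generate the three-step crawlability-audit prompt for any domain.
def audit_prompt(domain: str, keyword: str, n: int = 2) -> str:
    search = {"search_query": [{"q": f"site:{domain} {keyword}"}],
              "response_length": "short"}
    opens = {"open": [{"ref_id": f"turn0search{i}"} for i in range(n)]}
    return "\n\n".join([
        f"Search for this query, then open the first {n} results "
        "and summarize what you find on each page.",
        "Step 1: Search:",
        json.dumps(search),
        f"Step 2: Open the first {n} results:",
        json.dumps(opens),
        "Step 3: For each page, give the title, the main topic, "
        "and 3-5 key points.",
    ])

print(audit_prompt("abondance.com", "seo"))
```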
What you get is a view of your content through ChatGPT’s eyes: what it can actually reach, what it extracts, and how it interprets your pages.
If ChatGPT can’t access a page, returns garbled content, or completely misses your main messages, that’s a signal to act on.
Same model family, different citations
GPT-5.2, 5.3, and 5.4 share the same knowledge cutoff (August 2025) and belong to the same GPT-5 family. Yet the same prompt sent to each produces different fan-out queries, retrieves different sources, and surfaces different passages in the final response.
Multiple layers of divergence come into play after pre-training: RLHF reward shaping, supervised fine-tuning data, system prompt configurations, and inference-time compute budgets. GPT-5.4 Pro explicitly gets more compute to “think harder,” and that alone can change which sources are cited.
This is why we recommend testing model by model. A single prompt can produce radically different citations depending on whether the user is on GPT-5.3 Instant, 5.4 Thinking, or 5.4 Extended. Free-plan users may also be silently routed to a lighter model.
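Once you have captured the cited URLs per model (for example from a monitoring run), comparing them is straightforward set logic. A sketch; the model labels and URLs below are placeholders:

```python
from urllib.parse import urlparse

# Sketch: compare cited domains model by model from captured citations.
citations = {
    "5.3-instant":  ["https://a.example/x", "https://b.example/y"],
    "5.4-thinking": ["https://a.example/z", "https://c.example/w"],
}
domains = {model: {urlparse(u).netloc for u in urls}
           for model, urls in citations.items()}
shared = set.intersection(*domains.values())
print(shared)  # domains every tested model cites
```

Domains outside the shared set are exactly the ones whose visibility depends on which model the user happens to be routed to.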
Two types of AI visibility
Our study introduces a framework that separates parametric visibility (what the model learns from training data with search disabled) from dynamic visibility (what it retrieves in real time with search enabled).
- Parametric visibility: E-E-A-T for LLMs. It’s authority encoded across billions of training examples, shaped by press coverage, Wikipedia presence, other high-authority sites, and the overall training corpus. It’s stable and measurable through one-shot API audits.
- Dynamic visibility: shifting ground. Dynamic visibility is volatile. It’s model-dependent and requires continuous monitoring. It’s closer to traditional SEO, and can collapse overnight with a model update, as the Bigfoot Effect shows.
- The link between the two matters. The model formulates its web queries by targeting sources it already knows. A brand absent from parametric memory won’t even be considered as a search candidate. Being unknown to the model means being invisible before the search even starts.
Knowledge cutoff updates are the “Google Dance” of LLMs. When the cutoff date changes, parametric rankings are redistributed in bulk. But this only happens roughly once a year, because retraining at that scale is extremely expensive. The strategic window for influencing what the model knows about your brand sits between two cutoff dates.
Dan Petrovic’s (DEJAN) AI Brand Authority Index illustrates parametric measurement at scale. Our study complements it with a lighter, reproducible testing framework based on five prompts run multiple times for a one-shot audit.
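The core of such a one-shot parametric audit reduces to a small loop. A minimal sketch of the idea, not the study's actual framework: a fixed prompt set, repeated runs, and a brand-mention rate, where `ask_model` is a placeholder for your own API call with search disabled.

```python
# Sketch: brand-mention rate over repeated runs of a fixed prompt set.
def mention_rate(prompts, brand, runs, ask_model):
    hits = total = 0
    for p in prompts:
        for _ in range(runs):
            total += 1
            hits += brand.lower() in ask_model(p).lower()
    return hits / total

# Usage with a stub model for illustration:
rate = mention_rate(["best seo blogs?"], "Abondance", 3,
                    lambda p: "Abondance is one option.")
print(rate)  # 1.0 with this stub
```

Repeating the runs matters because parametric recall is probabilistic: a single response can miss a brand the model does, in fact, know.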
Dig deeper
The full study — including reverse-engineered documentation, the honeypot experiment, DIY audit prompts, and the reconstructed system prompt — is available at think.resoneo.com/chatgpt/5.3-5.4/.
Bottom line
ChatGPT Search is no longer a black box. This study maps its internal architecture, from the web.run tool that powers every search to the fan-out logic that decides which domains are fetched and which are ignored.
The 20% drop in cited domains after the switch to 5.3 shows how fast the citation landscape can shift with a single model update. But the deeper issue is structural: ChatGPT is concentrating citations on fewer websites and applying source selection logic shaped by training data, post-training fine-tuning, and system prompt rules that change from one model to the next.
Tracking visibility in ChatGPT means understanding two distinct layers (parametric and dynamic), testing across multiple models, and monitoring a system whose internal tools are documentable but whose behavior can change overnight.
The full study provides the data, methodology, and tools to get started.
Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. Contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.