How to measure AI visibility now that precision is gone

The funnel query pathway (FQP), the cohort-with-intent tree you populate from the conversion node upward, is the measurement framework for AI visibility. Measuring the FQP every quarter produces a defensible strategic read you can actually act on.

The shift the methodology operationalizes is what I call the micro-macro shift. You can’t measure AI-era visibility with the precision micro (ranking) instruments search trained us to expect because assistive engines and agents are too opaque for micro-level measurement. Macro is the only available discipline.

Why the precision we used to take for granted no longer applies

The same microeconomics-versus-macroeconomics distinction I drew earlier applies here: corner shop versus Bank of England, micro instruments versus macro instinct, with neither set of tools working in the other’s environment.

AI-era visibility lives in the same kind of macro environment that forced economics to develop a different measurement discipline, and it forces our industry to do the same.

Our industry operated at a micro scale with ranking and tracking, but the instruments we use for search don’t apply in AI. Microeconomics versus macroeconomics is the canonical case.

The structural property is brand-user-algorithm (BUA) opacity. The consequence matters here: four layers of opacity operate on every AI-era brand recommendation, and the brand has no visible signal at any of them.

The brand is opaque to the engine inside the walled garden. The user is opaque to themselves about how the engine reasoned on their behalf.

The engine is opaque to itself because the interpretability problem in large language models remains unsolved.

The brand is opaque to its own claim-level abstention events when the engine encounters contradictions in the corroboration backbone and silently declines to surface a specific claim.

The conversion rate softens, and the brand can’t see which contradiction caused the softening.

BUA opacity is why micro-instruments fail on assistive and agential surfaces. You can’t change that opacity.

It’s the environment you’re working in, and my methodology projects through it at the macro level, delivering trend rather than precision and accepting that the right answer is the one that holds up over time rather than the one that’s exact in the moment.

Where micro measurement still works — and where macro takes over

Micro and macro coexist. Three modes operate in parallel in 2026.

Search (essentially micro) hasn’t gone anywhere. It’s growing.
Assistive (essentially macro) has emerged alongside it.
Agent has emerged alongside both (the delightful mix of micro and macro).

Each mode has its own measurement environment, and the approach that makes sense for your business depends on the data that environment can supply.

*From my Google Marketing Live 2026 SEA Lab keynote: the three modes — search, assistive, agent — coexisting in 2026, each fulfilling a different need.*

Search keeps the user in control

The user types a query, the engine returns 10 options, and the user picks one. The brand can see the query, observe the position, measure the click, track the session, and attribute the conversion.

Micro instruments work because the environment supports them, and brands operating with search-era buyers on search-era surfaces should keep running micro strategies for those buyers. So the way you measure search doesn’t change, unless you want to add a macro methodology, which I personally think is a good idea.

Assistive narrows the choice at the user’s request

The user asks ChatGPT, Perplexity, Claude, Gemini, or Copilot for a recommendation, and the engine retrieves, synthesizes, and commits to one or two options on the user’s behalf.

The brand doesn’t see the series of exchanges, the retrieval, the synthesis, or, most importantly, the alternatives the engine considered before committing. You can see the conversion, but you can’t attribute it explicitly.

The entire journey runs inside walled gardens where you can’t measure with micro instruments, which means macro is the only available discipline. Assistive is the most elusive of the three.

Agent removes the decision from the user entirely

The user delegates, the agent executes, and the brand receives the order. The negotiation and transaction are observable, attributable, and measurable: the agent queried, negotiated, and (hopefully) bought your product, and you can micro-measure that.

What you can’t see is why the agent chose your product over its competitors because the decision logic the agent applied happened inside the agent, drawing on retrievals, comparisons, and reasoning the brand has no visibility into.

The pathway to the conversion is macro, but the conversion itself is micro.

The buyer chooses the surface

You might think search, assistive, and agential are a simple split where you can apply a dedicated measurement system to each. Not so.

Buyers move between search, assistive, and agent surfaces depending on what they’re buying, why, and how complex the decision is — often within the same journey. The brand doesn’t choose which surface its buyer will use. The buyer does, case by case, and the measurement (and strategic) methodology has to handle every surface mix the buyer chooses.

That’s why macro is the only viable solution.

How you measure defines your methodology

The clearest way to show and tell is to translate each search-era measurement into its AI-era equivalent. Here’s my take, though every practitioner running this work seriously will have their own opinion on every row, which is the point: how you define each row becomes the foundation of your methodology.

The funnel query pathway defines which queries I’m going to track, and the table below applies the same logic to every other measurement decision. The differences between practitioners on these rows will become increasingly visible in our measurement outputs over the coming months and years, and that visibility is the methodological signal worth paying attention to.

The macro methodology I’m publishing here is in its infancy. I started building it seriously this year, and the table below reflects my current position after a few months of thought, analysis, and live data collected since 2015.

I’m doing my best to finalize this list before the end of 2026 and freeze the methodology from January 2027 because of a constraint that matters: once you change a parameter, you lose direct comparability with everything you measured before the change. Quarter-eight compounding is only meaningful if the methodology remains stable across all eight quarters.

| | Search | Assistive | Agential |
| --- | --- | --- | --- |
| Engine visibility | CTR-weighted share of the keyword cohort, normalized over time | Share of citations and mentions across the brand-trigger phrase cohort, weighted by prominence in the synthesized answer | Share of agent invocation events (catalog queries, mandate submissions, transactions) against the addressable agent surface |
| Buyer cohort definition | The FQP queries in their search-context surface form, each in active or aspirational state | The FQP queries in their conversational surface form, each in active or aspirational state | The FQP queries in their agent-readable form, each in active or aspirational state |
| Authority signal share | Share of corroboration authority across the category, normalized over time | Share of independent corroboration in the brand-trigger phrase context | Share of operational-evidence completeness against what the agent needs to verify before committing (pricing, terms, availability, fit) |
| How you change the output | Publish, structure, distribute against the cohort, and measure the shift quarter over quarter | | Engineer the operational surface for agent legibility through MCP, structured data, and machine-actionable interfaces, and measure the shift quarter over quarter |
| Revenue and profit attribution | Share of revenue and margin from the search-mode cohort | Share of revenue and margin from the assistive-mode cohort, identified through referrer signals and user-agent strings | Share of revenue and margin from the agential-mode cohort, captured through agent-mandate logs and MCP telemetry |

Take the measurement, express it as a share of the cohort, normalize it over time, and report the trend rather than the snapshot. That’s the move in every cell of the table, and it’s what makes the three columns directly comparable.
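As a rough illustration of that move, here is a minimal Python sketch. It assumes "normalize over time" means indexing each quarter against the first quarter (baseline = 100), which is one reasonable reading rather than a definition taken from the methodology. The function names and the quarterly counts are hypothetical.

```python
from typing import Dict, List, Tuple


def cohort_share(brand_count: float, cohort_total: float) -> float:
    """Express a raw measurement as a share of the whole cohort (0.0 to 1.0)."""
    if cohort_total <= 0:
        return 0.0
    return brand_count / cohort_total


def normalize_to_baseline(shares: List[float]) -> List[float]:
    """Index every quarter against the first quarter (assumed baseline = 100)."""
    baseline = shares[0]
    if baseline == 0:
        raise ValueError("baseline share must be non-zero")
    return [round(100 * s / baseline, 1) for s in shares]


# Hypothetical data: (brand measurement, cohort total) per quarter, per mode.
quarters: Dict[str, List[Tuple[int, int]]] = {
    "search": [(120, 1000), (130, 1000), (150, 1000)],
    "assistive": [(8, 200), (10, 200), (14, 200)],
}

# Report the trend per mode, not the snapshot.
trend = {
    mode: normalize_to_baseline([cohort_share(b, t) for b, t in series])
    for mode, series in quarters.items()
}

for mode, index in trend.items():
    print(mode, index)
```

Because every mode is reduced to the same unit (an index of its own cohort share), the raw counts can differ by orders of magnitude and the rows still read side by side.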

Keep running the micro instruments you already know from search-era practice: ranking position 1-10 on a specific keyword, CTR on a specific URL, and A/B test outcomes on a specific page element.

Use them for tactics, but keep them out of the strategic dashboard because they aren’t comparable to anything in the assistive or agential columns. If you mix them, you’ll lose the strategic value.

The five rows match across the three modes: read across any row and see your brand’s relative position across all three engines in directly comparable units.

Compare your search-mode share against your assistive-mode share and your agential-mode share on the visibility row, the authority row, and the revenue row, and you have a continuous read on which mode is producing the best return at this moment and how that weighting is shifting quarter over quarter thanks to your work. That gives you a macro-level view of your strategic priorities across all three engines.

The five rows also hold for paid measurement. Paid and organic are converging on the same engine and the same macro methodology.

How measurement works across the funnel query pathway

The funnel query pathway isn’t one tree. It’s an orchard. Each cohort-with-intent intersection you cultivate is a tree, and the orchard grows as you plant more trees. Each tree has three parts.

The trunk is the conversion node: a branded BOFU query chosen to represent the buying moment for that cohort-with-intent intersection.
The branches are the MOFU evaluation queries that the buyer asks when researching options.
The twigs are the TOFU awareness queries that the buyer asked before narrowing to specific options or brands.

The orchard grows from the ground of your brand and business operations, and the apples fall on that ground when the trees bear fruit. The ground makes the orchard productive over time, and the brand that lets its ground go fallow watches the trees die, regardless of how well its branches are optimized.

You run measurement at every layer of every tree, but for different reasons, because the buyer’s intent shifts as you move up from trunk to branches to twigs, and the question you’re asking shifts with it.

Three funnel layers each have their own diagnostic:

Bottom of funnel (BOFU), where the buyer decides.
Middle of funnel (MOFU), where the buyer evaluates options.
Top of funnel (TOFU), where the buyer is still asking topical questions.

Bottom of funnel, brand-only: The trunk as a brand-confirming campaign

The trunk of every tree is the buying-moment query with your brand name in it. “Men’s red shirt from Uniqlo” is the trunk of the XL men buying a red shirt tree on the FQP I built for Uniqlo. Whatever the equivalent looks like for your brand sits in the equivalent position on every tree in your orchard.

One representative trunk query per tree is what Kalicube tracks period over period. The FAQ page on the brand’s site can carry as many variants of the BOFU query as the brand wants (and should), but the methodology tracks one trunk query per tree as the structural read on whether the tree is producing fruit. That single query is the representative sample for the whole trunk.

We measure three KPIs:

Brand appearance: When the engine answers the conversion query, does it surface your brand? You expect 100% appearance because the query carries your brand name, and the engine has no reason to omit you unless something has broken upstream. Any miss at this position is an audit-grade signal, and in commercial language, it’s the doubt tax or invisibility tax hitting at the bottom of your own funnel.
Sentiment of the appearance: When the engine surfaces your brand, the framing carries tone: positive, neutral, or negative, with a fourth hedged state in brackets. Hedged framing tells you the engine has surfaced you but isn’t confident enough to commit, which is the cascading confidence loss Rand Fishkin first documented, made visible at the recommendation surface.
Accuracy against the brand-defined AI résumé: The engine’s synthesis either matches your defined narrative or drifts from it. The drift is the framing gap made measurable. Tracking it quarter over quarter tells you whether the work on the open web is moving the engine’s understanding toward your defined position or away from it. How I score the drift is straightforward in principle: take the brand-defined version, compare it to what the engine produces, and measure the gap. Practitioners who know me will recognize the move.
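To make the appearance and drift KPIs concrete, here is a minimal sketch. The scoring method is an assumption: it uses token-set Jaccard distance as a stand-in for "measure the gap," which is not necessarily how the drift is scored in practice. The brand name "Acme" and the sample answers are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens, used as a crude vocabulary fingerprint."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def appearance_rate(answers: List[str], brand: str) -> float:
    """Share of engine answers that mention the brand at all."""
    if not answers:
        return 0.0
    hits = sum(1 for a in answers if brand.lower() in a.lower())
    return hits / len(answers)


def drift(resume: str, answer: str) -> float:
    """1 - Jaccard overlap: 0.0 means same vocabulary, 1.0 means nothing shared."""
    a, b = tokens(resume), tokens(answer)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


# Hypothetical brand-defined narrative and two sampled engine answers.
resume = "Acme makes durable red shirts for men"
answers = [
    "Acme makes durable red shirts for men",
    "Acme sells cheap hats",
]

print("appearance:", appearance_rate(answers, "Acme"))
print("drift per answer:", [round(drift(resume, a), 2) for a in answers])
```

A production version would more plausibly compare meaning rather than vocabulary, for example with sentence embeddings. The point of the sketch is only that the drift becomes a number you can track quarter over quarter.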

Bottom of funnel, competitor, runs as a separate campaign at the trunk

Most practitioners count brand-versus-competitor as middle of funnel because comparison feels like research.

I count it as bottom of funnel, but run it as a separate campaign with a separate bucket because the buying moment is happening. The buyer is naming both brands and asking the engine to decide. I separate these queries because the measurement affects the brand-only reads when they’re mixed.

Three measurements run here:

Recommendation bias: Which brand the engine specifically picks.
Sentiment bias: Tone toward your brand against the competitor’s.
Accuracy against both 500-word brand-defined AI résumés: Your brand and the competitor’s, written from their perspective as if you were them.

Middle of funnel: The branches

Move one level up the tree and you land on the branches. The cohort is still your ideal customer profile (ICP), the intent is still the buying motion, but the brand isn’t mentioned in the query yet because the buyer is still researching. “Best red shirt for men” is a branch on Uniqlo’s XL men buying a red shirt tree.

We measure three KPIs:

Brand appearance: When the engine answers a research query, which brands surface in the recommendations, if any? Track yours and each competitor. The brands the engine reaches for at this layer are the brands it considers candidate answers to the research question, and the brands that don’t surface are the brands the engine has decided aren’t leading candidates. That decision was made against the corroboration available to the engine on the open web before the buyer ever asked.
Sentiment bias, normalized against appearance volume: Sentiment per appearance is the meaningful unit, not sentiment total. A brand that surfaces twice with neutral sentiment isn’t necessarily losing to a brand that surfaces 10 times with mixed sentiment because the comparison isn’t about volume of mention. It’s about the quality of mention per surfacing event. Raw totals get distorted by frequency in ways that misread the signal, and the normalization is what makes period-over-period comparison hold.
Accuracy drift against both narratives: A research-stage synthesis that matches your defined narrative is set up to carry the buyer toward the conversion at the bottom of the funnel with the right framing. One that is unclear, inaccurate, or incomplete is one the engine will repeat across its answers.
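The normalization in the second KPI is simple arithmetic, sketched here under stated assumptions: the numeric weights attached to each sentiment label (including the hedged state) are illustrative choices, not values from the methodology, and the two brands are hypothetical.

```python
from typing import List

# Assumed weights for each sentiment label; tune these to your own scale.
SCORE = {"positive": 1.0, "neutral": 0.0, "hedged": -0.5, "negative": -1.0}


def raw_total(labels: List[str]) -> float:
    """Sum of sentiment scores, which grows with how often the brand surfaces."""
    return sum(SCORE[label] for label in labels)


def sentiment_per_appearance(labels: List[str]) -> float:
    """Average sentiment per surfacing event, independent of frequency."""
    if not labels:
        return 0.0
    return raw_total(labels) / len(labels)


# Brand A surfaces twice, both positive. Brand B surfaces 10 times, mixed.
brand_a = ["positive", "positive"]
brand_b = ["positive"] * 6 + ["negative"] * 4

print("raw totals:", raw_total(brand_a), raw_total(brand_b))
print("per appearance:", sentiment_per_appearance(brand_a),
      sentiment_per_appearance(brand_b))
```

In this example the raw totals are identical, yet the per-appearance reads differ sharply, which is the distortion the normalization is meant to remove.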

The ghost tax is the middle-of-funnel tax: the competitor is recommended because you are absent, because the engine is biased toward them, or because their framing was better than yours. Those last two are vital: just counting appearances doesn’t give a good measure of the probability that your ICP will reliably end up at your door.

Top of funnel: The twigs

At the top of every tree sit the twigs: topical questions the buyer asked before narrowing down to research the purchase or conversion. “Can men wear red shirts to work?” is a good example of a twig.

The diagnostic question at the twigs differs from that at the trunk and branches because the buyer isn’t asking about brands or even choices. The engine is reasoning at the topical layer, drawing on whatever content has earned recruitment for the topical question, and brand surfacing is rare and therefore not the primary indicator of success (you’d be measuring nothing most of the time).

Three measurements run on each twig.

Topical answer adoption, scored through corpus similarity: The engine’s answer compared against your content corpus and against each tracked competitor’s, with the brand whose corpus scores highest being the brand the engine has learned from. It’s the most novel measurement in the methodology and the one most likely to draw critical replies. TOFU attribution in AI search is solvable by reading the engine’s output back against the candidate topical coverage.
Brand appearance: Brands that surface at the topical layer sit in a stronger competitive position than the ones that don’t. A brand that consistently surfaces at the awareness layer for a category is a brand the engine treats as topically authoritative with a level of ownership for that category, and topical authority is what underwrites recruitment further down the tree.
Competitor creep at the twigs: Which brands are surfacing at the twigs when yours isn’t, and what does the pattern tell you about whose content the engine has identified as topically authoritative?
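One way to picture the corpus-similarity scoring is a bag-of-words cosine comparison, sketched here with the standard library only. This is an assumption about the technique: the article does not say which similarity measure is used, and a real implementation would more likely use embeddings over a much larger corpus. The brand keys and texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict


def vectorize(text: str) -> Counter:
    """Term-frequency vector over lowercased word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def closest_corpus(answer: str, corpora: Dict[str, str]) -> str:
    """Return the brand whose corpus scores highest against the engine answer."""
    answer_vec = vectorize(answer)
    scores = {brand: cosine(answer_vec, vectorize(text))
              for brand, text in corpora.items()}
    return max(scores, key=scores.get)


# Hypothetical engine answer to a twig query, and two tracked corpora.
answer = "Red shirts are fine for work in most offices"
corpora = {
    "your_brand": "Red shirts work in most offices with a relaxed dress code",
    "competitor": "A buying guide to winter coats and boots",
}

print(closest_corpus(answer, corpora))
```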

The top and middle of the funnel have grown, not shrunk

AI has made research faster, and faster research means people do more of it. TOFU and MOFU volumes have grown, even as the share of the mix has rearranged underneath.

The three-layer model is now “visibility, influence, transaction.” The AI engines are the biggest influencers in the world, the website is where the transaction closes, and brands measuring AI visibility as a replacement for website traffic are measuring the wrong substitution.

The substitution is in the influence layer, and the transaction layer is doing better than it looks once you understand what’s influencing the new traffic and where it’s coming from.

The analytics layer closes the loop to revenue

The FQP measurement tells you where the engines are recommending you. Analytics tells you whether those recommendations convert. Closing the loop is the operational work, and it’s where the methodology earns its keep at the board level.

You build the AI-traffic cohort from referral signals and user-agent strings: Gemini, ChatGPT, Perplexity, AI Mode, and Copilot.

UTM tagging won’t help for inbound traffic from the assistive engines themselves because they don’t pass UTM parameters. So tag every source you do control, shrink the “Direct” bucket as far as it will go, and then identify the residual AI traffic through referrer signals, user-agent strings, and behavior patterns once the session lands.

The cohort you build is a sample you extrapolate from, small today and growing.
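A minimal sketch of that cohort-building step, using only the Python standard library. The referrer hostnames and user-agent tokens in the lookup tables are illustrative assumptions and are deliberately incomplete. Verify each one against the engine’s own documentation and your own logs before relying on it.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed referrer hosts per engine (illustrative, not exhaustive).
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}

# Assumed user-agent tokens per engine (illustrative, not exhaustive).
AI_USER_AGENT_TOKENS = {
    "ChatGPT-User": "ChatGPT",
    "Perplexity-User": "Perplexity",
}


def classify_session(referrer: str, user_agent: str) -> Optional[str]:
    """Return the engine name for an AI-referred session, or None if unknown."""
    host = (urlparse(referrer).hostname or "").lower()
    for known, engine in AI_REFERRER_HOSTS.items():
        # Match the host itself or any subdomain of it.
        if host == known or host.endswith("." + known):
            return engine
    for token, engine in AI_USER_AGENT_TOKENS.items():
        if token.lower() in user_agent.lower():
            return engine
    return None


print(classify_session("https://www.perplexity.ai/search?q=x", "Mozilla/5.0"))
print(classify_session("", "Mozilla/5.0"))
```

Sessions that return `None` stay in the residual bucket, where behavior patterns are the only remaining signal.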

*From my Google Marketing Live 2026 SEA Labs keynote: Similarweb data on AI-referred session quality and conversion rates.*

Take the cohort’s conversion rate, average order value, time on site, and repeat purchase behavior. Apply it to the total recruitment volume the FQP measurement says you should be earning. That’s your revenue read.
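The extrapolation can be written out as a short calculation. This sketch assumes "total recruitment volume" can be expressed as a projected session count, which is an interpretation rather than something the methodology spells out. All figures are hypothetical.

```python
def revenue_read(
    cohort_sessions: int,
    cohort_orders: int,
    cohort_revenue: float,
    projected_sessions: float,
) -> float:
    """Extrapolate revenue from the identified AI cohort to a projected volume."""
    if cohort_sessions <= 0 or cohort_orders <= 0:
        raise ValueError("the sample cohort needs sessions and orders")
    conversion_rate = cohort_orders / cohort_sessions
    average_order_value = cohort_revenue / cohort_orders
    return projected_sessions * conversion_rate * average_order_value


# Hypothetical sample: 400 identified AI sessions, 20 orders, 1,600 in revenue,
# extrapolated to 10,000 sessions of projected recruitment volume.
estimate = revenue_read(400, 20, 1600.0, 10_000)
print(round(estimate, 2))
```

The smaller the sample cohort, the wider the error band on the estimate, so treat the output as a trend input rather than a forecast.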

AI-influenced visitors arrive with a perspective already formed, because they had the brand summarized for them before they clicked, and they should convert at a higher rate than organic search visitors. Track the AI-influenced cohort separately from the search cohort it’s mostly replacing.

At the analytics layer, you bring profit margin back into the picture. The engine doesn’t know your margin, so it optimizes for user satisfaction.

You know your margin, so you weight your “orchard” investment toward the trees (the cohort x intent intersections) where conversion volume x margin justifies the cultivation. That’s the organic equivalent of the cohort x intent x conversion rate x margin math that ads have run for 15 years.
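The weighting itself is a ranking by expected profit per tree. The sketch here assumes profit is conversions x revenue per conversion x margin. The tree names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tree:
    """One cohort x intent intersection in the orchard (hypothetical fields)."""
    name: str
    conversions: int
    revenue_per_conversion: float
    margin: float  # fraction between 0 and 1

    @property
    def profit(self) -> float:
        return self.conversions * self.revenue_per_conversion * self.margin


trees: List[Tree] = [
    Tree("XL men, red shirt", conversions=100, revenue_per_conversion=30.0, margin=0.20),
    Tree("Women, winter coat", conversions=40, revenue_per_conversion=120.0, margin=0.35),
    Tree("Kids, socks", conversions=300, revenue_per_conversion=5.0, margin=0.10),
]

# Highest expected profit first: this is where cultivation pays back soonest.
ranked = sorted(trees, key=lambda t: t.profit, reverse=True)
for tree in ranked:
    print(f"{tree.name}: {tree.profit:.0f}")
```

Note that the tree with the most conversions ranks last here, which is exactly why margin has to be brought back in at this layer.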

Always remember that AI engine traffic will generally be more engaged, spend longer on your site, and convert better than search traffic. If it isn’t, that’s a “you” problem, not an engine problem.

Agential commerce is a measurement gain

Agents might look like the worst measurement environment yet: the user delegates, the agent decides, and the brand sees only the conversion. Everything between the question and the purchase is invisible.

The instinct is to grieve the human signals we’re losing: mouse movements, scroll depth, hesitation patterns, micro-pauses on the comparison page, and the back-and-forth between tabs that used to tell us so much about consideration. Those signals are gone in agent mode. What replaces them is a measurement surface humans never gave us in the first place.

Every interaction the agent has with your infrastructure is a programmatic event. It queries your product catalog, retrieves details, comes back for clarification, initiates a price negotiation, submits a mandate, and confirms the purchase.

That’s a conversion funnel you can track step by step, including the back-and-forth negotiation. As a programmatic user, the agent fires events through your MCP server, UCP endpoint, decoupled checkout, and mandate handling.
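If those events are logged, the step-by-step funnel is a straightforward count. This sketch assumes a flat log of (session ID, step) pairs. The step names are hypothetical labels for the stages described here, not identifiers from MCP, UCP, or any other protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical step names, in funnel order.
FUNNEL_STEPS = [
    "catalog_query",
    "detail_retrieval",
    "negotiation",
    "mandate_submitted",
    "purchase_confirmed",
]


def funnel_counts(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct agent sessions that reached each funnel step."""
    reached: Dict[str, Set[str]] = defaultdict(set)
    for session_id, step in events:
        if step in FUNNEL_STEPS:
            reached[step].add(session_id)
    return {step: len(reached[step]) for step in FUNNEL_STEPS}


# Hypothetical event log: s1 buys, s2 drops after details, s3 only browses.
events = [
    ("s1", "catalog_query"), ("s1", "detail_retrieval"), ("s1", "negotiation"),
    ("s1", "mandate_submitted"), ("s1", "purchase_confirmed"),
    ("s2", "catalog_query"), ("s2", "detail_retrieval"),
    ("s3", "catalog_query"),
    ("s1", "catalog_query"),  # a repeat query counts once per session
]

counts = funnel_counts(events)
for step in FUNNEL_STEPS:
    print(step, counts[step])
```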

Every protocol layer you build for agential commerce is also a measurement layer, and the brands that build the infrastructure to transact with agents get the bonus of measuring the agent’s full reasoning chain in a way no one has ever been able to measure human reasoning.

For me, this is the most important measurement framework for the industry in the next phase. Search, assistive, and agential each land at the won gate, with three click types resolving the journey.

Search produces the imperfect click (the user picks from a list).
Assistive produces the perfect click (the AI gives one answer and the user confirms).
Agential produces the agentic click (the agent acts without the user seeing the candidates).

Each of the three modes offers its own measurement points, and the points aren’t equivalent.

Search is observable at the micro scale across the full journey.
Assistive is largely opaque at the micro scale and only surfaces sparse tactical signals: citation tracking, referrer patterns, user-agent strings, and behavioral cohort identification post-event.
Agential is observable at the programmatic scale, but only if the brand has built the protocol layer (MCP, UCP, decoupled checkout, and mandate handling) to capture the events.

The discipline is the same across all three modes. Harvest every tactical measurement point you can from every surface. Use those signals for tactical decisions because that’s what tactical micro signals are for.

Resist the temptation to make strategic decisions from any single mode’s tactical instruments because the picture each one produces is fragmented, partial, and structurally incomplete.

Strategic decisions remain bolted to the macro read on the funnel query pathway, aggregating across all three modes at the FQP level. The tactical instruments serve the strategy. They don’t replace it.

Macro measurement works on a slower timeline

For decades, we measured search the way the corner shop measures inventory: count what’s on the shelf this week, count it again next week, compare, and act. The instruments delivered the precision the environment supported, and you and your boardroom got trained through years of weekly dashboards to expect that exact shape of answer: a number this week against the same number last week, tracking work you can point at.

You’re not in the corner shop anymore. You’re operating inside an economy in its own right: seven assistive engines, the agents behind them, the apps each engine ships inside, the operating systems that surface them, the hardware in every pocket and on every face, every personalized context inside every walled garden, and the open web shifting under all of it, all running at once, all reshaping who gets recommended at the moment of decision.

Asking me for a precise monthly read on whether your brand is winning in that environment is asking the Bank of England for a precise monthly read on the loaf of bread you bought yesterday.

The Bank gives you an inflation figure, say 3%, every month and on schedule, and the number is real, comparable across months, and defensible across years. But you can’t take 3% and apply it to your loaf because your loaf might have gone up 8%, and the loaf in the next shop might have gone up 1%. The 3% is the aggregate read on the system, not a measurement of any single transaction inside it.

That’s the discipline you’re moving to. I can give you a quarterly read on whether your brand is being recommended across the economy of engines, and the read will be comparable to last quarter and the quarter before, and projected against next quarter and the one after. The trend over time is what your strategy rests on.

What I can’t give you is a clean number for whether you won the Perplexity recommendation against your top competitor last Tuesday. That’s the loaf. The macro discipline gives you the inflation read. The loaf-level question doesn’t have a defensible answer in this environment, and the methodologies that pretend it does are selling you a false-precision number dressed up as the real thing.

Strategic clarity comes from quarterly trend data

This is the move you have to make, and the move you have to walk your boardroom through alongside you. You’re not measuring fewer things than you used to. You’re measuring something far bigger, and the instruments that fit the wider environment work on a slower timeline.

If you run the methodology month by month, the drift will swamp the signal. You’ll read noise and act on noise, and you’ll do that every month. If you run it quarter by quarter, the second quarter gives you one delta against one baseline, which still isn’t a trend. It’s two points and a line.

By the fourth quarter, you have three deltas, the noise comes down, and the trend reads through. By the eighth, the methodology has compounded into a read that your strategic decisions can actually rest on, with a real pathway of comparison going backward across two full years.
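The arithmetic behind "three deltas by the fourth quarter" can be sketched in a few lines. The least-squares slope is my assumption for what "the trend reads through" could mean in practice, and the quarterly shares are hypothetical.

```python
from typing import List


def deltas(series: List[float]) -> List[float]:
    """Quarter-over-quarter changes: n quarters give n - 1 deltas."""
    return [round(b - a, 4) for a, b in zip(series, series[1:])]


def trend_slope(series: List[float]) -> float:
    """Ordinary least-squares slope of the series against the quarter index."""
    n = len(series)
    if n < 2:
        raise ValueError("a trend needs at least two quarters")
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical cohort shares for four quarters: noisy, but rising.
shares = [0.10, 0.12, 0.11, 0.14]

print("deltas:", deltas(shares))
print("slope per quarter:", round(trend_slope(shares), 4))
```

One of the three deltas in this example is negative, yet the fitted slope is positive. Reading any single delta in isolation would misstate the direction, which is the case for waiting until the series is long enough.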

Quarter eight is also where most measurement programs die, because boardroom impatience peaks at exactly the point when the methodology produces its first defensible answer. Hold the line, and you compound the maturity.

Cave at month six, demand the weekly dashboard back, and you’ll spend the next several years hunting for precision the environment can’t deliver, while competitors who held the line walk past you with strategic clarity you used to have and gave up.

Make the case to your boardroom plainly:

We’re operating inside an economy, and your brand’s standing inside it determines whether AI puts you in front of the right buyer at the right moment.
The measurement discipline that fits this environment is the macro discipline economists developed for exactly the same kind of problem 100 years ago.

Move to macro measurement, accept the timescale, and the methodology compounds into the strategic clarity the micro instruments stopped delivering the moment your buyer’s journey moved off your measurable surfaces and onto the engine’s.

The macro environment won’t give you a single, clear dashboard number. What it gives you, if you run this methodology with patience, is a quarter-by-quarter, mode-by-mode, engine-by-engine read of whether AI is recommending your brand at every stage of every buying journey the orchard is built around.

That’s the answer you can build a strategy around to gain a long-term competitive advantage.

This is the 15th piece in my AI authority series.

Part 1, “Rand Fishkin proved AI recommendations are inconsistent, here’s why and how to fix it,” introduced cascading confidence.
Part 2, “AAO: Why assistive agent optimization is the next evolution of SEO,” named the discipline.
Part 3, “The AI engine pipeline: 10 gates that decide whether you win the recommendation,” mapped the full pipeline.
Part 4, “The five infrastructure gates behind crawl, render, and index,” walked through the infrastructure phase.
Part 5, “5 competitive gates hidden inside ‘rank and display’,” covered the competitive phase.
Part 6, “The entity home: The page that shapes how search, AI, and users see your brand,” mapped the raw material.
Part 7, “The push layer returns: Why ‘publish and wait’ is half a strategy,” extended the entry model.
Part 8, “How AI decides what your content means and why it gets you wrong,” covered annotation, the last gate where you’re alone with the machine.
Part 9, “Why topical authority isn’t enough for AI search,” opened the competitive phase proper with topical ownership.
Part 10, “The funnel flip: Why AI forces a bottom-up acquisition strategy,” named the process.
Part 11, “The framing gap: Why AI can’t position your brand,” exposed the gap between evidence and recommendation.
Part 12, “The 10-gate AI search pipeline: Find where your content fails,” showed you how to find (and repair) your F grades in the AI engine pipeline.
Part 13, “The delegation boundary: How AI decides which brands win,” mapped how delegation moves between user and engine across Search, Assistive, and Agent modes.
Part 14, “The funnel query pathway: A framework for measuring AI visibility,” built the instrument this article runs measurement on.
Up next: Why SEO’s job now extends into post-sale operations.

Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. Contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.