The epistemology of Plumb.
Publishers cannot get inside the LLM. They can only measure what the LLM does to their edge. Every metric in this product carries a confidence badge so you know exactly which lens you’re looking through.
Built for publishers where articles carry real economics.
Plumb’s commissioning workflow generates traceable ROI only when one well-placed article is worth meaningful money in the publisher’s business model. The methodology is built around two publisher archetypes: affiliate publishers (finance, insurance, commerce), where a single well-cited article drives commission revenue, and B2B vertical-trade publishers (SaaS, legal, healthcare), where coverage drives subscriptions and newsletter signups.
For general programmatic and regional news publishers, the per-article unit economics do not support the commissioning workflow. Plumb is a vitamin for those publishers — useful for agentic traffic detection and CPM protection — not a revenue-restoration tool. We are explicit about this distinction rather than pitching universally.
You cannot look inside an LLM.
The conventional analytics stack assumes a funnel you can trace end-to-end: request arrives, referrer is known, user intent is legible from the URL. AI answer surfaces break every link in that chain. Retrieval is opaque. Prompts are private. Attribution leaks into “Direct.” Citations come and go hour-to-hour.
Plumb does not pretend otherwise. It measures what a publisher can measure from inside its own stack, samples what it can sample from the outside, and labels the difference. When you see a number, you can see, at a glance, how it was produced.
What we measure.
100% coverage. These signals come from your own infrastructure and do not depend on any probe of a third-party model.
- Cloudflare bot crawl logs
Every GET request tagged by Cloudflare's bot-verified list — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot, and the long tail. Plumb joins user-agent strings to a classification table so every crawl is attributable to a named platform. Coverage is 100% of traffic touching the edge.
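In sketch form, the join is a substring match from user agent to named platform. The table below is illustrative; the production mapping is driven by Cloudflare's verified-bot list, not a hand-rolled dictionary.

```python
# Illustrative substring -> platform table; the real join uses
# Cloudflare's verified-bot list, which is much longer.
BOT_PLATFORMS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Google-Extended": "Google",
    "Applebot": "Apple",
}

def classify_crawler(user_agent: str) -> str | None:
    """Attribute a crawl to a named platform; None means not a known AI bot."""
    for token, platform in BOT_PLATFORMS.items():
        if token in user_agent:
            return platform
    return None
```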
- GA4 referral sessions
Sessions with source = chatgpt.com / claude.ai / perplexity.ai / gemini.google.com / copilot.microsoft.com / other AI surfaces. The referral classifier is maintained against a living list because user agents and referrer headers from AI surfaces change more often than those of search engines.
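The classifier itself is small; the work is in keeping the list current. A sketch, with example hostnames and a deliberately conservative fallback (unknown hosts go to a review queue, never to a guess):

```python
from urllib.parse import urlparse

# Versioned, living list; the entries here are examples only.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "claude.ai": "Claude",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}

def classify_referral(referrer_url: str) -> str | None:
    """Map a referrer URL to an AI surface, matching subdomains too."""
    host = urlparse(referrer_url).hostname or ""
    for domain, surface in AI_REFERRERS.items():
        if host == domain or host.endswith("." + domain):
            return surface
    return None  # unknown host: review queue, not a guess
```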
- GSC query + impression data with AIO flagging
Google Search Console impressions, clicks, and rank, with an additional flag for queries where an AI Overview (AIO) was rendered. This is how we detect the search-cannibalization signal — impressions holding up while clicks collapse — without needing to scrape SERPs.
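The test itself reduces to a ratio comparison. A pandas sketch over a hypothetical GSC export; the column names and thresholds are illustrative, not the product's tuned values:

```python
import pandas as pd

def aio_cannibalization(df: pd.DataFrame, min_impressions: int = 100) -> pd.DataFrame:
    """Flag AIO-exposed queries where impressions held but clicks collapsed.

    Expects columns: query, period ('before'/'after'), impressions,
    clicks, aio_shown (bool).
    """
    pivot = (df[df["aio_shown"]]
             .pivot_table(index="query", columns="period",
                          values=["impressions", "clicks"], aggfunc="sum"))
    impr_ratio = pivot["impressions"]["after"] / pivot["impressions"]["before"]
    click_ratio = pivot["clicks"]["after"] / pivot["clicks"]["before"]
    flagged = (impr_ratio > 0.9) & (click_ratio < 0.6)
    flagged &= pivot["impressions"]["before"] >= min_impressions
    return pivot[flagged]
```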
- robots.txt + llms.txt diffs
Versioned snapshots of your site-level permission files and week-over-week diffs. A policy change is a news event; we log it and correlate it downstream with changes in crawler behavior to sanity-check whether the bots are listening.
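The mechanics are deliberately boring: hash to detect a change, diff to describe it. A minimal sketch:

```python
import difflib
import hashlib

def permission_file_diff(prev: str, curr: str, path: str = "robots.txt") -> list[str] | None:
    """Return a unified diff if the permission file changed this week, else None."""
    if hashlib.sha256(prev.encode()).digest() == hashlib.sha256(curr.encode()).digest():
        return None  # no policy event
    return list(difflib.unified_diff(
        prev.splitlines(), curr.splitlines(),
        fromfile=f"{path}@last_week", tofile=f"{path}@this_week", lineterm=""))
```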
What we sample.
High variance. These signals come from querying third-party models on a weekly schedule and recording what comes back.
- Weekly citation probe panel
500 queries across topic clusters relevant to affiliate (finance, insurance, commerce) and B2B vertical-trade (SaaS, legal, healthcare) publishers, run against Perplexity, ChatGPT, Gemini, and Claude every Monday at 06:00 UTC. We log which domains are cited, in what order, and with what excerpt. This produces competitive citation share data.
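The loop is simple; the discipline is in the logging schema. In the sketch below, query_model is an injected stand-in for whatever client each platform exposes, assumed to return an ordered list of (domain, excerpt) citations:

```python
import datetime as dt

MODELS = ["perplexity", "chatgpt", "gemini", "claude"]

def run_panel(queries: list[str], query_model) -> list[dict]:
    """One weekly panel run; every citation becomes one durable row."""
    run_at = dt.datetime.now(dt.timezone.utc).isoformat()
    rows = []
    for model in MODELS:
        for q in queries:
            # query_model(model, prompt) -> [(domain, excerpt), ...] (hypothetical)
            for rank, (domain, excerpt) in enumerate(query_model(model, q), start=1):
                rows.append({"run_at": run_at, "model": model, "query": q,
                             "rank": rank, "domain": domain, "excerpt": excerpt})
    return rows
```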
- Known non-determinism
Same prompt, same model, temperature zero, same week — citation sets differ hour-to-hour. Week-to-week swings under 15 percentage points are inside the noise floor. We display trend sparklines, not single-point claims. For affiliate publishers tracking a finance cluster, this means a 5pp citation share swing in one week is noise; a 20pp swing over four weeks is signal.
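The noise-floor rule stated above, as a guard function; thresholds mirror the prose, and the series is citation share in percentage points with the most recent week last:

```python
def classify_swing(share_by_week: list[float]) -> str:
    """Label a citation-share move as noise, signal, or watch-and-wait."""
    month_delta = (abs(share_by_week[-1] - share_by_week[-5])
                   if len(share_by_week) >= 5 else 0.0)
    if month_delta >= 15.0:
        return "signal"   # sustained multi-week move
    if abs(share_by_week[-1] - share_by_week[-2]) < 15.0:
        return "noise"    # inside the weekly noise floor
    return "watch"        # single-week spike; wait for confirmation
```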
- What we use it for
Competitive citation share on topic clusters. Head-to-head matchups: NerdWallet vs. Bankrate vs. Forbes Advisor on a query cluster, or STAT News vs. BioPharma Dive vs. Endpoints on a healthcare vertical cluster. Emerging-query detection — finding clusters where citation voices are thin and the field hasn't filled in. Never for absolute AI visibility claims — only for directional competitive signal on the queries that drive affiliate conversions or B2B newsletter signups.
What we infer.
Model-derived. These signals come from Bayesian bounds, heuristic scoring, or diagnostic classifiers — treat as estimates under assumptions, not observations.
- Scenario Explorer — hidden AI share of Direct
Bayesian posterior over a latent variable: the fraction of 'Direct' sessions that began with an AI conversation. Prior is survey-calibrated (three publisher surveys, Q4 2025). Evidence is the conversion-rate lift of Direct vs. Organic. Output is a credible interval, not a point estimate. Sensitivity is visible in the sweep chart — the number moves with the prior, as it should.
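A grid approximation shows the shape of the computation. Everything numeric here is an assumption standing in for the calibrated values: the Beta(2, 8) prior for the survey calibration, cr_ai for the conversion rate of AI-originated sessions, and a mixture model in which non-AI Direct converts like Organic.

```python
import numpy as np
from scipy import stats

def hidden_ai_share_posterior(direct_sessions: int, direct_conversions: int,
                              cr_organic: float, cr_ai: float = 0.08,
                              prior_a: float = 2.0, prior_b: float = 8.0):
    """Posterior over f, the fraction of 'Direct' that began as an AI chat.

    Mixture assumption: observed Direct CR = f*cr_ai + (1-f)*cr_organic.
    """
    f = np.linspace(0.0, 1.0, 1001)
    prior = stats.beta.pdf(f, prior_a, prior_b)
    cr = f * cr_ai + (1 - f) * cr_organic
    likelihood = stats.binom.pmf(direct_conversions, direct_sessions, cr)
    post = prior * likelihood
    post /= post.sum()                      # discrete normalization over the grid
    cdf = np.cumsum(post)
    lo, hi = f[np.searchsorted(cdf, 0.05)], f[np.searchsorted(cdf, 0.95)]
    return f, post, (lo, hi)                # density plus a 90% credible interval
```

Sweeping prior_a and prior_b is exactly what the sensitivity chart does: the interval moves with the prior, visibly.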
- AI-Readiness heuristic scoring
Composite score over seven factors (webutation, search position, freshness, structure, metadata, expertise, performance). Weights are tuned against a historical panel of intervention outcomes. Useful as a work-order generator; not a ground-truth measure of crawlability.
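The scoring itself is a weighted sum; the value is in the tuning. The weights below are illustrative placeholders, not the panel-tuned values:

```python
# Illustrative weights over the seven factors; the product tunes these
# against a historical panel of intervention outcomes.
WEIGHTS = {"webutation": 0.20, "search_position": 0.20, "freshness": 0.15,
           "structure": 0.15, "metadata": 0.10, "expertise": 0.10,
           "performance": 0.10}

def readiness_score(factors: dict[str, float]) -> float:
    """Composite AI-Readiness score; each factor pre-normalized to [0, 1]."""
    assert set(factors) == set(WEIGHTS), "all seven factors required"
    return sum(WEIGHTS[k] * factors[k] for k in WEIGHTS)
```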
- Ignored-story diagnostics
Classifier that labels why a breakout-candidate story failed to resonate in AI surfaces: schema, timing, paywall, topic, stale, duplicate. Labels are probabilistic guesses from a small decision tree. Use them as hypotheses, not conclusions.
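In sketch form, assuming a fitted scikit-learn tree (training data and feature engineering not shown):

```python
from sklearn.tree import DecisionTreeClassifier

# The six diagnostic labels; y_train (not shown) draws from this set.
LABELS = ["schema", "timing", "paywall", "topic", "stale", "duplicate"]

def diagnose(clf: DecisionTreeClassifier, story_features: list[float]) -> list[tuple[str, float]]:
    """Rank failure hypotheses by class probability; hypotheses, not verdicts."""
    probs = clf.predict_proba([story_features])[0]
    ranked = sorted(zip(clf.classes_, probs), key=lambda pair: -pair[1])
    return [(str(label), round(float(p), 2)) for label, p in ranked]
```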
What we refuse to fabricate.
There are questions the product is structurally unable to answer. Every vendor claiming otherwise is extrapolating.
- LLM internal state
What the model 'thinks' about your content. What weights it assigns. How retrieval scored your domain. These are closed-source artifacts of private infrastructure. No outside-in tool can observe them.
- What users ask
The queries hitting ChatGPT, Claude, Gemini, or Perplexity are not logged to publishers. We can sample a synthetic panel (see §03) but we cannot know the long tail of what real users are typing. Any vendor showing 'real user queries' is either sampling or lying.
- How content is used in training
Whether your article ended up in a training set, a RAG index, or an evaluation suite. Crawl logs tell you a bot visited; they say nothing about downstream use. Statements about training-set inclusion are speculation.
- Retrieval decisions
When a user asks about 'Fed rate cuts,' why did the model cite you rather than Reuters? We can correlate crawl concentration with citation outcomes, but we cannot observe the scoring function. The link is suggestive, not explanatory.
- Answer-surface rendering
Outside-in scraping — using Playwright to render ChatGPT or Perplexity and extract citations — does not work at scale. Playwright is fingerprinted and rate-limited. Terms-of-service exposure is real. Signal decays faster than you can sample. Every serious attempt has failed; pretending otherwise creates false numbers.
How we aggregate across publishers.
When Plumb is deployed across multiple publishers, the instance can compute a cross-tenant aggregate — a first-party panel benchmark — alongside the published research literature.
- Minimum N = 3 tenants
No aggregate is published below three participating publishers. At N=0-2 the dashboard shows the static research-literature figure (Adobe, Moz, GEO Research, Position Digital, Enrichlabs) and labels it as such. At N≥3 the panel median replaces the primary reference and the research figure becomes the anchor in the tooltip.
- Distribution statistics only
Responses carry median, p25, and p75 — no min, no max, no per-tenant rows. Each caller's own tenant contributes to the aggregate but is indistinguishable from the others. Quantiles are the only exposure; the underlying data never leaves the panel.
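Both rules, the N >= 3 gate and quantiles-only exposure, fit in one function. A sketch; the research-figure fallback is a placeholder for the named static sources:

```python
import numpy as np

RESEARCH_FALLBACK = {"source": "research literature", "n": 0}  # static figure, labeled as such

def panel_aggregate(tenant_values: list[float]) -> dict:
    """Cross-tenant benchmark: quantiles only, never published below N=3."""
    if len(tenant_values) < 3:
        return RESEARCH_FALLBACK
    p25, p50, p75 = np.percentile(tenant_values, [25, 50, 75])
    return {"median": p50, "p25": p25, "p75": p75,
            "n": len(tenant_values), "source": "first-party panel"}
```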
- Refresh cadence
Panel aggregates are computed on demand with a five-minute in-process cache. Switching a panel participant in or out takes effect on the next cache expiry. There is no historical panel archive — each read is a live computation.
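A minimal sketch of the cadence: an in-process TTL cache around a live computation, with no archive on either side.

```python
import time

_CACHE: dict[str, tuple[float, dict]] = {}
TTL_SECONDS = 300  # five minutes, per the stated cadence

def cached_aggregate(metric: str, compute) -> dict:
    """On-demand panel read with a five-minute in-process cache."""
    now = time.monotonic()
    hit = _CACHE.get(metric)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]
    value = compute()          # live computation over current panel membership
    _CACHE[metric] = (now, value)
    return value
```

Membership changes take effect on the next expiry because nothing older than the cache exists to consult.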
- What the panel is not
It is not a benchmark purchased from a vendor, not a survey, not a scraped corpus. It is the internal measurement of publishers who have instrumented Plumb. Its reach is the reach of the product. Anywhere we cite a figure from outside this panel, the source is named in the tooltip.
Why Plumb, not your marketplace's dashboard.
The AI-content tooling landscape has sorted itself into three layers, and every other tool in the stack belongs to one of the first two. Plumb is the third: the layer the publisher owns.
A publisher using both TollBit and Cloudflare Pay-Per-Crawl already has two vendor dashboards, and each vendor has a financial interest in framing the picture favorably. Plumb is what verifies either of them.
Honest instrumentation, or nothing.
Plumb refuses to fabricate visibility it cannot actually deliver. Every metric in this product carries a confidence badge (MEASURED for first-party signals, SAMPLED for weekly probes, INFERRED for model outputs) so the reader of a chart always knows which lens they are looking through.
The audience for this dashboard is a revenue leader, a product owner, a finance partner, a board. They need to defend their numbers. An over-claimed visibility metric will be shredded by the first follow-up question. A properly labeled one, even when the uncertainty is large, survives scrutiny.
Decisions are durable too. Every triage call on a bleeding query — accept, dismiss, snooze — is persisted, as is every commission draft generated from one. The chain from measurement to decision to editorial action is a record, not a screenshot.
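The persisted record is small by design. A sketch of its shape; field names are illustrative:

```python
from dataclasses import dataclass
import datetime as dt

@dataclass(frozen=True)
class TriageDecision:
    """One durable row in the decision log."""
    query: str
    action: str                              # "accept" | "dismiss" | "snooze"
    decided_by: str
    decided_at: dt.datetime
    commission_draft_id: str | None = None   # set when a draft was generated
```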
The panel (§05.5) extends the same principle to benchmarking. When three or more publishers run Plumb, the instance can surface a first-party cross-tenant median instead of citing another vendor’s research PDF. Aggregate only, quantiles only, revoked on exit.
Anything in this category that claims to “show you what ChatGPT thinks about your content” is lying. Plumb refuses to be that product.