Rates without denominators are marketing, not measurement. This page is the methodology behind every stat PilotPM publishes — what counts, what doesn't, what window it was measured in, and how we prove an improvement is real.
In a pilot, every number below gets counted live on your own tickets.
A conversation counts as automatically resolved only when the AI answered it and the customer either confirmed the answer or asked nothing further. If a teammate steps in at any point — even one clarifying line — the conversation leaves the automated column. That's the strict end of the industry's definitions, on purpose.
The numerator is conversations the AI actually answered where the customer confirmed or had nothing further to ask — not “the bot said something and the ticket timed out.”
Any human involvement disqualifies the conversation — a reassignment, an internal edit that ships, a follow-up from a rep. No partial credit, full stop.
Measured this way, our number reads lower than most published rates. We'd rather publish a smaller number that means something than a bigger one that doesn't.
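The rule above reduces to a few checks. Here is a minimal Python sketch. It assumes each conversation record already carries three hypothetical fields: whether the AI sent an answer, how the customer responded, and a list of any human events. `Conversation`, `Outcome`, and `is_automated_resolution` are illustrative names, not PilotPM's code, and how "asked nothing further" is detected is left as an upstream input.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    """How the customer responded after the AI's answer (hypothetical labels)."""
    CONFIRMED = "confirmed"              # customer confirmed the answer
    NOTHING_FURTHER = "nothing_further"  # customer asked nothing further
    FOLLOW_UP = "follow_up"              # customer came back with more questions


@dataclass
class Conversation:
    ai_answered: bool  # the AI actually sent an answer, not just an acknowledgement
    outcome: Outcome
    # Any human touch, e.g. "reassignment", "shipped_internal_edit", "rep_follow_up".
    human_events: list[str] = field(default_factory=list)


def is_automated_resolution(conv: Conversation) -> bool:
    """Strict rule: the AI answered, the customer was satisfied, and no human was involved."""
    if not conv.ai_answered:
        return False
    if conv.human_events:  # a single human event disqualifies; there is no partial credit
        return False
    return conv.outcome in (Outcome.CONFIRMED, Outcome.NOTHING_FURTHER)


# One clarifying line from a rep moves the conversation out of the automated column.
print(is_automated_resolution(Conversation(True, Outcome.CONFIRMED)))                     # True
print(is_automated_resolution(Conversation(True, Outcome.CONFIRMED, ["rep_follow_up"])))  # False
```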
90% of almost nothing is almost nothing. As an illustration: a system resolving 90% of the 3% of conversations it touches automates 2.7% of all inbound, while one resolving 55% of the 95% it handles automates about 52%, yet the first one quotes the bigger headline. So every resolution rate we show comes with the share of all inbound the AI was involved in, and neither number can hide behind the other.
Of the conversations the AI was involved in, the share it resolved end-to-end under the definition above. The flattering number, kept honest by its neighbor.
Of everything that arrived, the share the AI was involved in at all. The number that says whether the automation is load-bearing or a rounding error.
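Both numbers fall out of the same three counts. A minimal sketch, assuming a hypothetical `RateReport` built from raw counts for one window and an assumed 10,000 inbound conversations for the two systems in the illustration: the share of all inbound resolved end-to-end is simply the involvement rate times the resolution rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateReport:
    inbound: int   # everything that arrived in the window (the true denominator)
    involved: int  # conversations the AI was involved in at all
    resolved: int  # of those, resolved end-to-end under the strict definition

    def __post_init__(self) -> None:
        if not 0 <= self.resolved <= self.involved <= self.inbound:
            raise ValueError("expected 0 <= resolved <= involved <= inbound")

    @property
    def resolution_rate(self) -> float:
        """The flattering number: resolved / involved."""
        return self.resolved / self.involved if self.involved else 0.0

    @property
    def involvement_rate(self) -> float:
        """Whether the automation is load-bearing: involved / inbound."""
        return self.involved / self.inbound if self.inbound else 0.0

    @property
    def share_of_inbound_resolved(self) -> float:
        """resolved / inbound, which equals involvement_rate * resolution_rate."""
        return self.resolved / self.inbound if self.inbound else 0.0


narrow = RateReport(inbound=10_000, involved=300, resolved=270)     # 90% of 3%
broad = RateReport(inbound=10_000, involved=9_500, resolved=5_225)  # 55% of 95%
for name, r in (("narrow", narrow), ("broad", broad)):
    print(f"{name}: resolution {r.resolution_rate:.0%}, involvement {r.involvement_rate:.0%}, "
          f"all inbound {r.share_of_inbound_resolved:.2%}")
```

The narrow system quotes the bigger resolution rate (90% against 55%) yet resolves 2.7% of all inbound, against 52.25% for the broad one.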
Published “resolution” definitions differ more than the published rates do. Below are three approaches from vendors' own public documentation — described mechanically, no editorializing. The same support week produces very different rates depending on the yardstick.
One published definition counts a conversation as resolved when the customer confirms the answer, or simply sends no follow-up for 24 hours after the AI's reply — and a teammate replying in the conversation does not void the assumed resolution. The definition's denominator has also been revised over time, most recently in July 2026 — so a published rate can move without the product changing. Source, as of July 2026.
Another published approach counts a resolution when the requester confirms, or when the conversation sees no further unresolved activity for 72 hours and an LLM verification grades the answer as having addressed the question. Escalation to a human disqualifies the conversation. Source, as of July 2026.
The strictest published definition counts a conversation only when no human was involved at any point, and additionally grades each interaction as relevant, accurate, and safe using AI evaluation. This is the end of the spectrum our own definition sits on. Source, as of July 2026.
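To see how one week can produce three different rates, here is a literal, simplified transcription of the three descriptions above into Python predicates. It is a sketch under stated assumptions, not any vendor's implementation: the `Convo` fields and the four-conversation week are hypothetical, a teammate reply is treated as the escalation the second definition disqualifies, and the strictest definition is modeled only by the two conditions its description names.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Convo:
    confirmed: bool              # customer explicitly confirmed the answer
    quiet_hours_after_ai: float  # hours with no customer follow-up after the AI's reply
    human_replied: bool          # a teammate replied or the conversation was escalated
    llm_verified: bool           # an LLM check graded the answer as addressing the question
    graded_ok: bool              # AI evaluation graded it relevant, accurate, and safe


def silence_24h(c: Convo) -> bool:
    # Confirmation or 24 hours of silence; a teammate reply does not void it.
    return c.confirmed or c.quiet_hours_after_ai >= 24


def quiet_72h_verified(c: Convo) -> bool:
    # Escalation disqualifies; otherwise confirmation, or 72 quiet hours plus LLM verification.
    if c.human_replied:
        return False
    return c.confirmed or (c.quiet_hours_after_ai >= 72 and c.llm_verified)


def no_human_graded(c: Convo) -> bool:
    # No human at any point, and the interaction passes quality grading.
    return not c.human_replied and c.graded_ok


def rate(week: list[Convo], rule: Callable[[Convo], bool]) -> float:
    return sum(rule(c) for c in week) / len(week) if week else 0.0


# One hypothetical week of four conversations.
week = [
    Convo(True, 0, False, True, True),     # confirmed, no human involved
    Convo(False, 30, True, False, True),   # 30 quiet hours, but a teammate replied
    Convo(False, 80, False, True, False),  # 80 quiet hours, verified, failed quality grading
    Convo(False, 5, False, False, False),  # customer came back after 5 hours
]
for rule in (silence_24h, quiet_72h_verified, no_human_graded):
    print(f"{rule.__name__}: {rate(week, rule):.0%}")
```

The same four conversations score 75%, 50%, and 25% under the three yardsticks.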
Reply drafting has its own ladder of honesty, and each rung is a different number. All three below are measured on production workspaces over the last 30 days, and each carries its window and sample in the dashboard.
The share of outbound replies that began as an AI draft. It says the AI is in the loop — not that it's right. Production, last 30 days.
Of the drafts a human actually reviewed, the share that shipped approved. Edits allowed — this measures “good enough to send,” not perfection. Production, last 30 days.
The share sent exactly as the AI wrote it, without a single edit — on email, currently 16–20%. It climbs every week, because the engine mines every edit your team makes.
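The three rungs use three different denominators, which is why they never collapse into one number. A minimal sketch, assuming a hypothetical per-draft record (`Draft`) and a hypothetical `drafting_ladder` helper; the zero-edit rate's denominator is assumed here to be drafts that were sent, since the page does not spell it out, and the example counts are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Draft:
    reviewed: bool  # a human looked at the draft
    sent: bool      # it shipped as an outbound reply (edits allowed)
    edited: bool    # a human changed it before sending


def ratio(part: int, whole: int) -> float:
    return part / whole if whole else 0.0


def drafting_ladder(drafts: list[Draft], total_outbound: int) -> dict[str, float]:
    sent = [d for d in drafts if d.sent]
    reviewed = [d for d in drafts if d.reviewed]
    return {
        # Outbound replies that began as an AI draft: the AI is in the loop.
        "draft_share": ratio(len(sent), total_outbound),
        # Of reviewed drafts, the share that shipped approved: "good enough to send".
        "approval_rate": ratio(sum(d.sent for d in reviewed), len(reviewed)),
        # Of sent drafts (assumed denominator), the share that went out with zero edits.
        "zero_edit_rate": ratio(sum(not d.edited for d in sent), len(sent)),
    }


drafts = [Draft(True, True, False)] + [Draft(True, True, True)] * 3 + [Draft(True, False, False)]
print(drafting_ladder(drafts, total_outbound=10))
```

With these made-up counts the ladder reads 40%, 80%, and 25%: each rung is lower than the one before because each asks a stricter question.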
Across everything that reaches our production workspaces — every channel, measured July 2026 — about 22% is currently handled end-to-end with no human touching it. This is the number behind the ~$0.45 effective cost per resolution on our pricing page.
Most of that 22% today is machine noise — delivery bounces, receipts, automated notifications — that the system classifies and auto-archives, not customer questions the AI answered. The customer-question share is smaller and growing.
Auto-archiving noise is real work a human no longer does, so it belongs in an “effective automation” number. But it's labeled: your dashboard splits archived noise from answered customers, so one never dresses up as the other.
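A minimal sketch of that labeled split, assuming each inbound item has already been classified into one of three hypothetical labels. What it shows is structural: the effective-automation figure is always returned together with its two parts, so archived noise and answered customers stay separately visible. The counts in the example are invented, not PilotPM's figures.

```python
from collections import Counter

# Hypothetical labels; a real classifier's classes may differ.
ARCHIVED_NOISE = "archived_noise"  # bounces, receipts, automated notifications
AI_ANSWERED = "ai_answered"        # customer question resolved under the strict rule
HUMAN_HANDLED = "human_handled"    # anything a human touched


def effective_automation(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {"effective": 0.0, "archived_noise": 0.0, "answered_customers": 0.0}
    return {
        # The headline: everything handled with no human touching it.
        "effective": (counts[ARCHIVED_NOISE] + counts[AI_ANSWERED]) / total,
        # The two parts, always reported next to the headline.
        "archived_noise": counts[ARCHIVED_NOISE] / total,
        "answered_customers": counts[AI_ANSWERED] / total,
    }


labels = [ARCHIVED_NOISE] * 15 + [AI_ANSWERED] * 5 + [HUMAN_HANDLED] * 80
print(effective_automation(labels))
```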
A metric that includes the half-day in progress will happily invent a trend by lunchtime. Ours don't.
Every window ends at the last complete workday — this morning's quiet inbox is not a downturn, and a busy hour is not a spike.
Email, chat, and store reviews behave differently, so each channel carries its own trend line — no blended average that hides one channel degrading behind another improving.
Every improvement that ships pins a marker to the chart on the day it landed. When a line moves, the chart says why — you never reverse-engineer a trend from memory.
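The first two rules can be sketched in a few lines. This assumes Monday-to-Friday workdays with no holiday calendar, and per-channel (resolved, involved) counts for two windows; `last_complete_workday`, `reporting_window`, and `channel_changes` are hypothetical helpers, and the example numbers are invented to show a blended average staying flat while one channel degrades.

```python
from datetime import date, timedelta


def last_complete_workday(today: date) -> date:
    """Most recent Monday-to-Friday date strictly before today (today is still in progress)."""
    day = today - timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


def reporting_window(today: date, days: int = 30) -> tuple[date, date]:
    """Inclusive calendar window of `days` days ending at the last complete workday."""
    end = last_complete_workday(today)
    return end - timedelta(days=days - 1), end


def _rate(resolved: int, involved: int) -> float:
    return resolved / involved if involved else 0.0


def _pooled(window: dict[str, tuple[int, int]]) -> float:
    return _rate(sum(r for r, _ in window.values()), sum(i for _, i in window.values()))


def channel_changes(before: dict[str, tuple[int, int]],
                    after: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Change in resolution rate per channel, plus the pooled 'blended' change for contrast."""
    changes = {ch: _rate(*after[ch]) - _rate(*before[ch]) for ch in before if ch in after}
    changes["blended"] = _pooled(after) - _pooled(before)
    return changes


print(reporting_window(date(2026, 7, 20)))  # a Monday: the window ends on Friday 17 July
before = {"email": (50, 100), "chat": (50, 100)}
after = {"email": (60, 100), "chat": (40, 100)}
print(channel_changes(before, after))  # blended is flat while chat drops ten points
```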
Every 6 hours, the improvement engine mines the edits your team made to AI drafts, works out what it keeps getting wrong, and proposes the fix — a reply rule, a KB correction, sometimes a code change. Nothing ships on vibes.
Every human edit to an AI draft is kept as ground truth and diagnosed to a root cause — knowledge, rules, data, or product.
Every proposal re-runs against a golden set of real past conversations. The golden-set score improves, or the change doesn't ship.
Changes launch to a 50% canary, and per-channel degradation detection automatically rolls back anything that makes replies worse.
Every improvement lands behind human approval, and a change marker pins it to the chart on the day it shipped.
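The shipping gates in these steps can be sketched as three small checks. This is a minimal sketch under stated assumptions, not the improvement engine itself: the golden set is a list of (conversation, expected reply) pairs, the grader is a hypothetical pass/fail judge supplied by the caller (exact match in the example), canary assignment hashes a conversation ID with SHA-256, and the rollback tolerance of 0.02 is an arbitrary placeholder. Human approval and change markers are not modeled.

```python
import hashlib
from collections.abc import Callable

Reply = Callable[[str], str]  # conversation text -> drafted reply


def golden_score(system: Reply, golden: list[tuple[str, str]],
                 grade: Callable[[str, str], bool]) -> float:
    """Fraction of golden-set conversations where the system's reply passes the grader."""
    if not golden:
        return 0.0
    return sum(grade(system(convo), expected) for convo, expected in golden) / len(golden)


def passes_golden_gate(baseline: Reply, candidate: Reply, golden: list[tuple[str, str]],
                       grade: Callable[[str, str], bool]) -> bool:
    # The score has to improve: a tie does not ship.
    return golden_score(candidate, golden, grade) > golden_score(baseline, golden, grade)


def in_canary(conversation_id: str, fraction: float = 0.5) -> bool:
    """Deterministically assign roughly `fraction` of conversations to the canary."""
    bucket = int.from_bytes(hashlib.sha256(conversation_id.encode("utf-8")).digest()[:8], "big")
    return bucket < fraction * 2**64  # Python compares int and float exactly


def should_roll_back(control: dict[str, float], canary: dict[str, float],
                     tolerance: float = 0.02) -> bool:
    """Roll back if any channel's quality score in the canary drops by more than `tolerance`."""
    return any(canary[ch] < control[ch] - tolerance for ch in control if ch in canary)


def baseline(convo: str) -> str:
    return "tracking"


def candidate(convo: str) -> str:
    return "cancel" if "cancel" in convo.lower() else "tracking"


def exact_match(reply: str, expected: str) -> bool:
    return reply == expected


golden = [("Where is my order?", "tracking"), ("Please cancel my plan", "cancel")]
print(passes_golden_gate(baseline, candidate, golden, exact_match))  # True: 2/2 beats 1/2
print(should_roll_back({"email": 0.80, "chat": 0.70}, {"email": 0.85, "chat": 0.60}))  # True
```

Hashing the ID rather than drawing a random number keeps each conversation in the same arm for the whole canary, and checking each channel separately means an email gain cannot mask a chat regression.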
A conversation counts as automatically resolved only when the AI answered it and no human ever took over — the customer confirmed the answer or asked nothing further after the AI's reply. A human takeover at any point counts as a failure, full stop. Every resolution rate we show carries this definition, its time window, and its sample size.
A resolution rate without its denominator can hide almost anything: resolving 90% of the 3% of conversations an AI touches is less automation than resolving 55% of the 95% it handles. We always show what share of all inbound the AI was involved in alongside how much of that it resolved, so neither number can hide behind the other.
Our rate reads lower than many published rates mostly because of definitions. Some published definitions count 24 hours of customer silence as a resolution, or keep a conversation in the resolved column even when a teammate replied. Ours doesn't: any human involvement disqualifies the conversation. Measured on the strict end of the industry's yardsticks, the number is smaller — and it means something. Ask any vendor, including us, for the definition behind the number.
In a pilot we measure both numbers on your own tickets and show you exactly how they're counted — definitions first, then the rate.