Imagine asking your analytics AI "what was our revenue growth last quarter?" on a Monday morning before a board meeting. You get a number — 12.4%. You present it. Two weeks later, a colleague asks the same question and gets 11.8%. Both of you used the same tool. Both of you uploaded the same dataset. The number is different.

This is not a rare edge case. It is a predictable consequence of how AI analytics tools work — and it happens more often than most organisations realise, because most organisations never think to ask the same question twice.

What actually happens when you ask an AI analytics tool a question

Every AI analytics tool, whether it presents itself as a conversational interface, a copilot, or a natural language query layer, works through the same fundamental mechanism. You type a question. The AI interprets it and generates code to answer it, usually SQL and sometimes Python. That code runs against your data and returns a result.

The critical word in that description is generates. The AI does not look up a pre-validated answer. It does not consult a fixed methodology. It creates a fresh piece of code for your question — every single time you ask it.

The AI is not retrieving an answer. It is improvising one. Every time.

That improvisation involves dozens of micro-decisions. Which columns represent revenue? Which date field defines "last quarter"? Should growth be calculated year-over-year, quarter-over-quarter, or against a target? Should nulls be excluded or treated as zero? The AI makes these choices based on context — and context shifts subtly between sessions, between phrasings, between model updates.
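To make one of those micro-decisions concrete, here is a minimal sketch of how the choice between quarter-over-quarter and year-over-year growth changes the answer on identical data. The revenue figures and the `pct_growth` helper are invented for illustration and are not taken from any real tool or dataset.

```python
# Hypothetical quarterly revenue, invented purely for illustration.
revenue = {
    "2023-Q1": 1000.0,
    "2023-Q4": 1100.0,
    "2024-Q1": 1180.0,
}


def pct_growth(current: float, previous: float) -> float:
    """Percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100.0


# Interpretation A: compare against the immediately preceding quarter.
qoq = pct_growth(revenue["2024-Q1"], revenue["2023-Q4"])

# Interpretation B: compare against the same quarter one year earlier.
yoy = pct_growth(revenue["2024-Q1"], revenue["2023-Q1"])

print(f"quarter-over-quarter: {qoq:.1f}%")  # 7.3%
print(f"year-over-year:       {yoy:.1f}%")  # 18.0%
```

Both figures are legitimate answers to "what was our revenue growth last quarter?". Which one you receive depends on a choice the question never specified.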

Why the number changes

The result changes when any of those micro-decisions changes. A slightly different phrasing of the question can lead the AI to interpret "revenue" differently. A model update can shift how the AI weights ambiguous column names. A different session context can produce a different aggregation choice.

The decisions that vary between sessions

  • Which column is treated as the primary revenue metric
  • How date ranges are interpreted — calendar quarter vs fiscal quarter
  • Whether to exclude nulls, zeros, or outliers
  • Which aggregation method is applied — sum, average, weighted average
  • How growth is calculated — absolute, percentage, compound
  • Whether to include or exclude certain categories or segments
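As an illustration of the null-handling decision in the list above, the following sketch computes an average over the same hypothetical column twice: once excluding missing values, once treating them as zero. The values and function names are assumptions made up for this example.

```python
from typing import Optional, Sequence

# Hypothetical order values; None marks a missing entry.
order_values: Sequence[Optional[float]] = [120.0, None, 80.0, None, 100.0]


def mean_excluding_nulls(values: Sequence[Optional[float]]) -> float:
    """Average over the rows that actually have a value."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


def mean_nulls_as_zero(values: Sequence[Optional[float]]) -> float:
    """Average over all rows, counting missing values as 0."""
    filled = [0.0 if v is None else v for v in values]
    return sum(filled) / len(filled)


print(mean_excluding_nulls(order_values))  # 100.0
print(mean_nulls_as_zero(order_values))    # 60.0
```

Neither function is incorrect. They simply encode different answers to a question the user was never asked.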

None of these decisions is wrong. Each produces a defensible number. But they produce different defensible numbers — and when those numbers reach a board meeting or a regulatory submission, "defensible" is not good enough. The number must be the same. Every time.

Why most organisations do not catch this

The inconsistency is invisible to organisations that only ask each question once. If you ask "what is our default rate?" and get 7.3%, you report 7.3%. You never know that asking the same question tomorrow might return 7.1% or 7.6%.

The problem surfaces in three situations: when two people ask the same question independently, when a result is challenged and cannot be reproduced, or when a quarterly review produces a number that does not match last month's figure for the same period. At that point, the organisation has to decide which number to trust — and the honest answer is neither.

What a fixed answer actually requires

Reproducibility in analytics requires the same thing it requires in science: a fixed methodology. The formula must be specified before the question is asked. The computation must be validated before the result is trusted. And the same formula must produce the same result every time it runs on the same data.

This is not technically difficult. It requires a different architectural decision — one that most AI analytics tools have not made, because the flexibility of AI generation is also what makes their demos impressive. An AI that improvises a beautiful chart in thirty seconds is more compelling to watch than a system that resolves your question to a validated formula. The latter, however, is the one you can trust in a board meeting.
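One way to picture "resolving a question to a validated formula" is a small registry of fixed metric definitions. The sketch below is an assumption about how such a design might look, not a description of any particular product; `MetricDefinition`, `REGISTRY`, `run_metric`, and the field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class MetricDefinition:
    """A fixed, versioned formula that was agreed before any question is asked."""
    metric_id: str
    version: str
    description: str
    compute: Callable[[Mapping[str, float]], float]


def qoq_revenue_growth(data: Mapping[str, float]) -> float:
    """Percentage change in revenue versus the preceding quarter."""
    previous = data["previous_quarter_revenue"]
    current = data["current_quarter_revenue"]
    return (current - previous) / previous * 100.0


# Hypothetical registry: questions map to an entry here instead of to fresh code.
REGISTRY = {
    "revenue_growth_qoq": MetricDefinition(
        metric_id="revenue_growth_qoq",
        version="1.0",
        description="Quarter-over-quarter revenue growth, in percent",
        compute=qoq_revenue_growth,
    ),
}


def run_metric(metric_id: str, data: Mapping[str, float]) -> float:
    """Look up the fixed formula and run it; raises KeyError if none exists."""
    return REGISTRY[metric_id].compute(data)


data = {"previous_quarter_revenue": 1100.0, "current_quarter_revenue": 1180.0}
first = run_metric("revenue_growth_qoq", data)
second = run_metric("revenue_growth_qoq", data)
print(first == second)  # True
```

In a design like this, any language model involved would only choose which registered metric a question refers to. The arithmetic itself never varies between runs.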

A result that cannot be reproduced is not a result. It is a reading.

The standard analytics should be held to

Every analytical result used in a significant business decision should meet three criteria. It should be reproducible — the same question on the same data produces the same answer. It should be traceable — the formula, the columns, and the computation are documented and readable. And it should be certified — validated against a fixed methodology before it reaches the decision-maker.
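The three criteria can be sketched in a few lines, under stated assumptions: a fixed formula (reproducible), a result record that names the formula version and a fingerprint of the input data (traceable), and a check against a reference dataset with a known answer (certified). The loan rows, field names, and functions below are hypothetical and exist only for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List

Row = Dict[str, Any]


def fingerprint(rows: List[Row]) -> str:
    """SHA-256 of a canonical JSON encoding of the rows (row order matters)."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def default_rate(rows: List[Row]) -> float:
    """Fixed formula: defaulted loans divided by all loans, in percent."""
    defaulted = sum(1 for r in rows if r["defaulted"])
    return defaulted / len(rows) * 100.0


def certify(reference_rows: List[Row], expected: float, tol: float = 1e-9) -> bool:
    """Check the formula against a reference dataset with a known answer."""
    return abs(default_rate(reference_rows) - expected) <= tol


def answer(rows: List[Row]) -> Dict[str, Any]:
    """Return the value together with what is needed to trace it."""
    return {
        "metric": "default_rate",
        "formula_version": "v1",
        "data_sha256": fingerprint(rows),
        "value": default_rate(rows),
    }


# Hypothetical reference data: 1 default out of 4 loans, so 25.0 percent.
reference = [
    {"loan_id": 1, "defaulted": False},
    {"loan_id": 2, "defaulted": True},
    {"loan_id": 3, "defaulted": False},
    {"loan_id": 4, "defaulted": False},
]

print(certify(reference, expected=25.0))       # True
print(answer(reference) == answer(reference))  # True
```

If two people report different values, comparing `data_sha256` and `formula_version` shows immediately whether the data or the methodology differed.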

These are not extraordinary standards. They are the minimum for any number that drives a decision worth making. The question is whether your analytics tool meets them.