How a Leader Should Use AI Before a Consequential Decision

In short

Before a consequential decision, a leader should use AI to pressure-test the thinking, not to produce the answer. That means two moves in order. First, ground it: give the model your own verified numbers, constraints and history, because an ungrounded model returns generic advice. Second, flip it: instruct the model to argue against the recommendation, name the assumptions it rests on, and say what would have to be true for it to be wrong. Used this way, AI raises the quality of a decision. Used as an oracle that hands back an answer you then approve, it lowers it, because a model's first instinct is to agree with you. The discipline is the difference between a sharper decision and a faster mistake.

On this page
  1. The mistake is asking AI for the decision
  2. Why most AI never changes a decision
  3. The first move: ground it in your own truth
  4. The second move: make it argue against you
  5. Quality is not speed
  6. What it looks like in the room
  7. The discipline has a name
  8. Frequently asked questions
  9. References
One question, two chairs. The counterpart is the one you instruct to disagree.
01 · The mistake

The mistake is asking AI for the decision

Most executives, when they bring AI to a real decision, ask it the wrong question. They describe the situation and ask what they should do. The model answers fluently, confidently, and usually in agreement with the view the question already implied. The leader reads it as confirmation and moves on.

This is the trap, and it is a measurable one. When a user states an opinion, large language models raise their agreement with incorrect beliefs sharply, on average 63.7 per cent across models and as high as 95.1 per cent in the worst cases (Wang et al., 2025). The model is not built to tell you that you are wrong. It is built, through its training, to be agreeable. So the more clearly you signal the answer you want, the more reliably you will get it back, dressed as analysis.

The danger is not that AI gives you a wrong answer. It is that it gives you a confident version of the answer you already had, and removes the friction that might have caught the error. That friction is the point of a good decision process, and it is exactly what an eager model dissolves.

There is a second, slower cost. The more a leader leans on AI to do the thinking, the less thinking the leader does. A study of knowledge workers by Microsoft Research and Carnegie Mellon found that higher confidence in generative AI is associated with less critical thinking, while higher confidence in one's own judgement is associated with more (Lee et al., CHI 2025). A separate study of 666 people found a significant negative correlation between frequent AI use and critical-thinking scores, mediated by cognitive offloading, the habit of handing the mental work to the tool (Gerlich, 2025). Use AI as the decider often enough and the muscle you most need in the chair, judgement, quietly weakens.

02 · The value gap

Why most AI never changes a decision

This is not a small problem, and the numbers show it. The most widely cited study of the year, from MIT's Project NANDA, found that about 95 per cent of enterprise generative-AI efforts produced no measurable impact on profit and loss (NANDA, July 2025). BCG, working from a separate dataset of more than 1,250 firms, reached a finding of the same shape: only 5 per cent of companies are capturing AI value at scale, and 60 per cent are getting no material value at all (BCG, September 2025). McKinsey reports that 88 per cent of organisations now use AI somewhere, yet only 39 per cent see impact at the level of enterprise earnings (McKinsey, November 2025).

The interesting part of the NANDA finding is the cause. The divide, the authors write, is not explained by model quality or regulation. It is an organisational and leadership gap. The companies that fail are not using worse models. They are using AI to produce output without changing how decisions get made. McKinsey's data points the same way: its high performers are far more likely to have a defined human-in-the-loop process, 65 per cent against 23 per cent for the rest. The value is not in the output. It is in whether the decision changed, and most of the time it does not.

So the right test of AI at the executive level is not "did it give me something" but "did it change what I was about to do". That reframing is the whole game, and it leads to two disciplines.

03 · The first move

The first move: ground it in your own truth

An AI model knows the world in general. It does not know your business in particular. Ask it a strategy question cold and it will give you the average of everything it has read, which is to say a competent, generic answer that could apply to any company in your sector. That is not a model failure. It is a context failure, and it is yours to fix.

This is why grounding matters. Retrieval-augmented generation, the technique of anchoring a model in specific, verified documents rather than its general memory, was developed because a model's general memory is an unreliable source of specific facts (Lewis et al., 2020), and a consequential decision rests on exactly those facts. Gartner puts the organisational version of this bluntly: it predicts that through 2026, organisations will abandon 60 per cent of AI projects that are not supported by AI-ready data (Gartner, February 2025). Without your real numbers, your real constraints and your real history in front of it, the model is guessing, articulately.

We call the principle behind this the Mirror Principle: if the output is generic, the reasoning was generic. The fix is not a cleverer prompt. It is feeding the model the ground truth that only your organisation holds (the actual cost base, the real pipeline, the history of what has and has not worked here) and then asking it to reason from that rather than from the world's average. A decision built on the model's general knowledge is a decision built on everyone's information and no one's. A decision built on your ground truth is yours.

04 · The second move

The second move: make it argue against you

Grounding gives the model the right material. The flip gives it the right job. Instead of asking the model to confirm your recommendation, you instruct it to attack it.

This is not a rhetorical flourish. It is among the best-evidenced techniques in the field. Its decision-science foundation is well established: assuming your first estimate is wrong, making a second estimate from that premise and averaging the two produces more accurate judgements than the first estimate alone, an effect known as dialectical bootstrapping (Herzog and Hertwig, 2009). The same logic now holds for AI. A simple metacognitive instruction, "could you be wrong?", leads models to surface counter-arguments, contradictory evidence and overlooked alternatives that were absent from their first, agreeable answer (Hills, 2025). When researchers built an AI devil's advocate to challenge group decisions, the groups that worked with it reached the highest-quality decisions, even though their members reported the lowest comfort and the lowest sense of teamwork (ACM, 2024). Good challenge is supposed to feel uncomfortable. That discomfort is the work happening.

Michael Schrage of MIT puts the operating instruction simply: do not treat AI outputs as answers, treat them as hypotheses to test and stress-test, and ask the model for the strongest case against each one before you accept it (MIT Sloan Management Review, 2026).

In practice, the flip is a short sequence you run before the decision, not after:

  1. Name the assumptions. "List every assumption this recommendation depends on, and rate how confident we can be in each one."

  2. Argue the opposite. "Make the strongest possible case that this decision is wrong."

  3. Run the pre-mortem. "Assume we made this call and it failed badly within eighteen months. Write the story of why."

  4. Bring the hostile reader. "Answer as a sceptical board member, as an activist investor and as the competitor who benefits. What do they see that I am missing?"

  5. Find the gap. "What is the one piece of information that would most change this decision if I had it, and do I actually have it?"

None of these asks the model to decide. Every one of them makes the model do what the people around a senior leader too rarely do: push back hard, in time to matter.

05 · Quality, not speed

Quality is not speed

There is a reason to be careful, and it is the strongest evidence in favour of keeping the human in the chair. The best-known controlled study of AI on knowledge work found that on tasks inside the model's competence, professionals using AI produced work more than 40 per cent higher in quality, but on a task deliberately chosen to sit outside the model's competence, AI users were 19 percentage points less likely to reach the correct answer (Dell'Acqua et al., Organization Science 2025, from a 2023 experiment). The researchers called this the jagged frontier: AI is brilliant on one side of a line and quietly wrong on the other, and the line is not marked. Knowing which side of it a given decision sits on is judgement, and it is the leader's job, not the model's.

This is why speed is the wrong measure, and yet the pressure runs towards speed. BCG's 2026 survey of CEOs and boards found that 61 per cent of CEOs say their boards are pushing AI too fast, and more than half say AI hype is distorting boardroom judgement (BCG, May 2026). With 72 per cent of CEOs now naming themselves the main decision-maker on AI (BCG, January 2026), the temptation to let a fast, fluent model stand in for a slow, hard decision is exactly the failure mode to resist.

The discipline that protects against it is small. A randomised trial found that a three-minute habit (form your own judgement first, then verify the AI's output against independent sources) improved decision quality by nearly eight percentage points (Aydin, MIT Sloan Management Review, June 2026). Three minutes of thinking before the model speaks measurably changes the decision. That is the entire argument in one statistic.

06 · In the room

What it looks like in the room

A president at a large company faced a decision the room had already made. Performance was down, and the assumed response, the one everyone had quietly settled on, was to cut. All that remained was to decide how much.

Instead of asking the model how to cut, the team grounded it in the division's own data and asked it a different question: where is the performance actually being lost, and what would have to be true for cutting to be the right answer. Working from the real numbers rather than the room's assumption, the model surfaced a revenue gap nobody had named, a problem of capture, not of cost. The decision the team walked in ready to make was the wrong one. The decision they left with was a different one entirely.

Nothing in that story required a better model. It required grounding the model in the truth only the company held, and then using it to challenge the conclusion the room had already reached rather than to ratify it. That is the whole method, and it is available to any leader willing to ask the harder question.

07 · The discipline

The discipline has a name

Using AI to confirm what you think is the default, and it is a trap. Using AI to find where your thinking is fragile is a discipline, and it is learnable. We call it the Havruta Methodology, after havruta, the traditional practice of studying in pairs and one of the oldest forms of rigorous study: two minds, one question, an argument that ends in a sharper version of both sides. Applied to AI, it turns the model from a vending machine that dispenses answers into a thinking partner that earns them.

Before your next consequential decision, do not ask the machine what you should do. Ground it in your own truth, and ask it where you might be wrong. The quality of the decision will tell you the difference.

If this resonates, two companion essays go deeper: one on why most enterprise AI investments deliver zero return, and one, on the AI-readiness illusion, on the gap between owning the tools and changing the work.

08 · Frequently asked

Frequently asked questions

How should a CEO use AI for decision-making?

Not as the decision-maker. A CEO should use AI to improve the quality of the decision: ground the model in the company's own verified data, then instruct it to challenge the recommendation, surface its assumptions and argue the opposite case. The research points the same way: handing the thinking to AI is associated with weaker critical thinking (Lee et al., CHI 2025; Gerlich, 2025), while instructing it to challenge the reasoning surfaces what its first answer missed (Hills, 2025).

Can AI make better decisions than an executive?

On narrow, well-bounded tasks inside its competence, AI can lift quality substantially. On tasks outside that competence it is confidently wrong, and the boundary is not marked (Dell'Acqua et al., 2025). Consequential strategic decisions sit on both sides of that line at once, which is why the accountable human stays the decider and uses AI to test the decision, not to take it.

How do you stop AI just agreeing with you?

Instruct it to disagree. Models are trained to be agreeable and will raise their agreement with incorrect beliefs once you state an opinion (Wang et al., 2025). A direct instruction to argue the opposite, name the assumptions, or say "could you be wrong" reliably surfaces the counter-case the first answer hid (Hills, 2025).

Should AI make business decisions on its own?

For consequential decisions, no. The firms that capture value from AI are far more likely to keep a defined human-in-the-loop process (McKinsey, 2025), and most enterprise AI fails to move outcomes precisely because it produces output without changing how decisions are made (MIT NANDA, 2025).

What does it mean to ground AI in your own data?

It means giving the model your real numbers, constraints and history to reason from, rather than letting it answer from its general knowledge. A model's general memory is an unreliable source of specific facts, and AI projects without AI-ready data behind them are the ones most likely to be abandoned (Lewis et al., 2020; Gartner, 2025).

References

  1. Wang et al. "When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models." arXiv, 2025.
  2. Lee, H-P., Sarkar, A., Tankelevitch, L., et al. "The Impact of Generative AI on Critical Thinking." Microsoft Research & Carnegie Mellon University, CHI '25, April 2025.
  3. Gerlich, M. "AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking." Societies 15(1):6, January 2025.
  4. MIT Project NANDA. "The GenAI Divide: State of AI in Business 2025." July 2025.
  5. BCG. "The Widening AI Value Gap: Build for the Future 2025." September 2025.
  6. McKinsey. "The State of AI in 2025." November 2025.
  7. Lewis, P., et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS, 2020.
  8. Gartner. "Lack of AI-Ready Data Puts AI Projects at Risk." February 2025.
  9. Herzog, S. M., & Hertwig, R. "The Wisdom of Many in One Mind: Improving Individual Judgments With Dialectical Bootstrapping." Psychological Science, 2009.
  10. Hills, T. T. "Could You Be Wrong: Debiasing LLMs Using a Metacognitive Prompt." arXiv, July 2025.
  11. "Enhancing AI-Assisted Group Decision Making through LLM-Powered Devil's Advocate." ACM, 2024.
  12. Schrage, M. "The AI Atrophy Problem: How CIOs Fight It." MIT Sloan Management Review, 2026.
  13. Dell'Acqua, F., et al. "Navigating the Jagged Technological Frontier." Organization Science, 2025 (experiment 2023).
  14. BCG. "CEOs and Boards Are Aligned on AI in Theory, but Divided in Practice." May 2026.
  15. BCG. "AI Radar 2026: As AI Investments Surge, CEOs Take the Lead." January 2026.
  16. Aydin, Y. "A Three-Minute Protocol to Reduce AI Manipulation Risk." MIT Sloan Management Review, June 2026.

A sharper decision is one disciplined conversation away. That is the work we install.