Havruta: The Oldest Form of Rigour, Rebuilt for AI

A method for using AI in universities that strengthens thinking instead of replacing it.

Two people, one text, and the argument between them. The oldest way the university knew to make a mind defend itself.

Universities exist to protect thinking. It is the one thing they are for. So it is worth sitting with the strangeness of the past two years: handed the most powerful thinking aid ever built, much of higher education has responded by banning it, policing it with software that does not work, and so driving students to use it in secret, in precisely the way most likely to make them think less.

That is not a failure of students, and it is not a failure of the technology. It is a failure to ask the right question. The question was never whether students use AI. They do, and no policy will change that. The question is whether the machine makes them think before it answers. Everything turns on that one distinction, and it points to a method that is, oddly, the oldest one the university has.

On this page
  1. A method older than the essay
  2. The essay was always the vehicle
  3. Detection is not a strategy
  4. What the evidence actually says
  5. AI is not the enemy
  6. The fix: make the machine ask first
  7. And change what you assess
  8. What this is not
  9. Where to start
  10. The oldest form of rigour
  11. Frequently asked questions

01 · The lineage

A method older than the essay

Havruta takes its name from chavruta, the practice of paired study in which two people sit with a text and with each other and argue their way to understanding rather than receiving it. This is not a vague ideal. Studying real pairs in fine-grained detail, the educational scholar Orit Kent theorised havruta as three pairs of practices: listening and articulating, wondering and focusing, and supporting and challenging (Kent, 2010; Holzer and Kent, 2011). Strip away the tradition and you are left with a precise description of how two minds make one of them sharper.

That description is not foreign to the university. It is the university's own inheritance. Socratic dialogue is the same move. So is the medieval disputatio, the formal oral argument that was the backbone of university examination for centuries. So is the doctoral viva that still survives today. For most of its history the university tested whether you understood something by making you defend it out loud, in real time, against a questioner. The essay, the silent solitary document we are now so anxious about, is the recent arrival, and the anxiety is a sign that we leaned on it too heavily for too long.

Havruta does not import something foreign into the academy. It hands the academy back its own method, with the machine as the questioning partner.

02 · The vehicle

The essay was always the vehicle

We confused the vehicle for the destination. An essay was never valuable because of the finished document. It was valuable because of the journey to it: the reading, the wrestling, the structuring, the deciding what to cut. The artefact was simply proof that the thinking had happened.

Now the artefact is free. A student can produce a clean, referenced, well-organised essay in the time it takes to make a coffee, and it proves nothing, because the thinking it used to stand for may or may not have occurred. The document looks identical either way. The honest response to that is to change what we assess. The reflex instead was to reach for detection.

03 · Detection

Detection is not a strategy

Detection does not work, and we have known this almost since the detectors appeared. A systematic test of fourteen tools found them, in the authors' words, “neither accurate nor reliable”, with a built-in tendency to pass AI text off as human (Weber-Wulff et al., 2023). They are also biased: in one study, more than half of a sample of essays written by non-native English speakers were wrongly flagged as machine-written, while essays by native speakers passed almost cleanly (Liang et al., 2023). And this is not a temporary engineering gap that a better detector will close. Work on the theoretical limits of detection shows that as language models improve, reliable detection becomes provably harder, and a light paraphrase defeats the detectors with almost no loss of quality (Sadasivan et al., 2023).

The simplest illustration is the most damning. Feed a detector a passage from an encyclopaedia written in the 1920s, decades before any of this existed, and it can report back, with confidence, that a machine wrote it.

This is why serious institutions have switched the detectors off. Vanderbilt University disabled Turnitin's AI detector and said so without hedging: “we do not believe that AI detection software is an effective tool that should be used” (Vanderbilt University, 2023).

Detection is not a strategy. It is the absence of one.

04 · The evidence

What the evidence actually says

Two claims, and both have to be made carefully, because this is a field where the headlines run far ahead of the data and an academic reader will notice.

The first claim is that using AI to avoid thinking costs you the thinking. The most solid evidence is a peer-reviewed study of more than three hundred knowledge workers, which found that the more a person trusted the AI, the less of their own critical judgement they brought to the task (Lee et al., 2025). It rests on self-report, so treat it as a strong signal rather than a final verdict, and be wary of the more dramatic “AI rots your brain” headlines, which so far come from small, unreviewed work. The careful claim does not need them, because it sits on a century of settled learning science. We remember less when we expect a machine to hold the information for us (Sparrow et al., 2011). We learn through effort, not ease: retrieving an answer outlasts re-reading it (Roediger and Karpicke, 2006); explaining our own reasoning to ourselves deepens understanding (Chi et al., 1989); the difficulties that feel unpleasant in the moment are often the ones that build durable learning (Bjork and Bjork, 2011); and students made to wrestle with a problem before being shown the method understand it better for the struggle (Kapur, 2008). Struggle is not a flaw in learning. It is the engine. Take it out and the learning goes with it.

The second claim is the one that turns a complaint into a method, and it comes from two studies that point the same way when you read them together. In a large field experiment, school students were given an AI assistant for mathematics practice. The plain version, the one that simply gives answers, lifted their performance while they used it and then left them worse off on a later exam they sat without it (Bastani et al., 2025). They had leaned on a crutch and never built the muscle. A second version of the very same tool, redesigned to withhold the answer and make the student reason first, removed the harm. Separately, a randomised trial at Harvard gave students an AI tutor deliberately built to question rather than answer, and they learned more than twice as much, in less time, than in a strong, well-run active-learning class (Kestin et al., 2025).

Same technology. Opposite outcomes. The only variable is whether the machine makes the human think first.

Two caveats, because they make the argument honest rather than weaker. These are results about careful design, not about chatbots in general, and one excellent tutor in one course is a promising signal, not a settled science. The wider literature is sober about exactly how good tutoring gets: a careful review found well-built tutoring systems reach an effect size of about 0.76, close to the 0.79 of expert human tutors, and far short of the two standard deviations sometimes claimed for tutoring (VanLehn, 2011). The point is not that AI is magic. The point is that the same tool helps or harms depending on a single design decision, and across most of higher education we are making the wrong one.
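To see how far 0.76 sits from two standard deviations, it helps to convert effect sizes into percentiles. The sketch below is an illustration, not part of VanLehn's analysis: it assumes normally distributed scores with equal spread in both groups, and the function name is hypothetical. Under that assumption an effect size of 0.76 puts the average tutored student at roughly the 78th percentile of the untutored group, 0.79 at roughly the 79th, and 2.0 at roughly the 98th.

```python
import math


def percentile_of_average_treated_student(effect_size: float) -> float:
    """Share of the comparison group scoring below the average treated student.

    Assumption (not from the source): scores are normally distributed with
    the same spread in both groups, so the answer is the standard normal
    cumulative distribution evaluated at the effect size.
    """
    return 0.5 * (1.0 + math.erf(effect_size / math.sqrt(2.0)))


# Effect sizes quoted in the text (VanLehn, 2011) and the two-sigma claim.
for label, d in [
    ("tutoring systems", 0.76),
    ("expert human tutors", 0.79),
    ("two-sigma claim", 2.0),
]:
    share = percentile_of_average_treated_student(d)
    print(f"{label}: effect size {d} -> {share:.0%} of the comparison group")
```

The gap between the first two rows is about one percentile point; the gap to the third is about twenty. That is the sense in which good tutoring systems are close to human tutors and both are far from the headline figure.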

05 · The tool

AI is not the enemy

None of this is an argument against AI. Used well it is remarkable. In a controlled trial, professionals given a capable assistant cut the time on writing tasks by roughly forty per cent and produced higher-quality work (Noy and Zhang, 2023). In a large workplace study, AI support raised productivity most for the least experienced workers, narrowing the gap between novice and expert (Brynjolfsson et al., 2025). The tool is not the problem. The way we have taught a generation to reach for it is.

It is also worth being honest about the backdrop, because some of the alarm predates AI entirely. Measured creative-thinking scores in the United States have fallen since around 1990, even as IQ rose (Kim, 2011), and measured intelligence in some populations peaked and went into reverse decades ago (Bratsberg and Rogeberg, 2018). AI did not cause those trends, and anyone who tells you it did is reaching past the evidence. But AI is the most powerful engine for offloading thought ever invented, and it has arrived in classrooms that were already finding it hard to teach thinking. That is the situation we have to work in.

06 · The fix

The fix: make the machine ask first

Havruta is one rule, held with discipline. Configure the machine to do what a good study partner does and a vending machine never will: ask before it answers.

In practice, the tool does not produce the essay or the analysis on demand. First it turns the questions back on the student. Say the task is an essay on whether raising the minimum wage reduces employment. A Havruta-configured assistant does not write it. It asks: what is your thesis, in one sentence? What is the strongest piece of evidence for it? What is the best argument against you, and why does it fail? Only once the student has answered, in their own words, does the machine help: tightening the structure, pressing on a weak claim, offering a counter-example to test the case. The student does the thinking. The machine makes the thinking harder, and better.
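The ask-first rule can be sketched as a small gate that sits in front of whatever assistant an institution provides. This is a minimal illustration under stated assumptions, not a description of any particular product: the class and method names are hypothetical, the three questions are the ones from the minimum-wage example above, and a real deployment would also need to judge the quality of each answer rather than only checking that one was given.

```python
from dataclasses import dataclass, field

# The three opening questions from the minimum-wage example in the text.
OPENING_QUESTIONS = (
    "What is your thesis, in one sentence?",
    "What is the strongest piece of evidence for it?",
    "What is the best argument against you, and why does it fail?",
)


@dataclass
class AskFirstSession:
    """Hypothetical gate: no help is released until every question is answered."""

    questions: tuple = OPENING_QUESTIONS
    answers: list = field(default_factory=list)

    def next_question(self):
        """Return the next unanswered question, or None once all are answered."""
        if len(self.answers) < len(self.questions):
            return self.questions[len(self.answers)]
        return None

    def submit(self, answer: str) -> bool:
        """Record the student's answer; reject blank answers and extra ones."""
        if self.next_question() is None or not answer.strip():
            return False
        self.answers.append(answer.strip())
        return True

    def may_assist(self) -> bool:
        """The machine may help only after the student has answered everything."""
        return len(self.answers) == len(self.questions)
```

A calling application would show `next_question()` to the student, pass each reply to `submit()`, and forward the request to the assistant only once `may_assist()` returns true, sending the recorded answers along so that the help is built on the student's own reasoning.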

This is not a gentler AI. It is a more demanding one, and it is demanding in exactly the way the learning science says matters. It forces the student to retrieve, to generate, and to explain their reasoning before they can move on (Chi et al., 1989; Roediger and Karpicke, 2006). It puts the desirable difficulty back in, on purpose. And it works on the one material a machine cannot fabricate, because it is the one thing the machine does not have: the student's own reasoning. We call the moment the tool turns the questions back the Flip, the point at which the machine stops answering the human and starts questioning them.

07 · Assessment

And change what you assess

If understanding is built through dialogue, it can be evidenced through dialogue. This is the half of the answer the current debate keeps walking past.

The sector's own assessment scholars have argued for years that the way through is to redesign assessment, not to police it (Dawson, 2021), and the guidance bodies now say the same: adapt assessment, build students' ability to use AI well, and protect rigour by design (QAA, 2024; Russell Group, 2023). The instrument already exists, and the university has owned it for centuries. It is the oral examination, the viva, the structured conversation in which a student has to think in front of you. You do not need to prove whether a student used AI. You need to find out what they actually understand, and the fastest way to do that is to ask them, in person, and follow up. Interview the student about the work. Let AI help you analyse the depth of the reasoning in that conversation. Assess the thinking, not the artefact.

Here is the part nobody has assembled. AI that builds understanding by questioning, and assessment that evidences understanding by questioning, are the same idea pointed in two directions: the learning method and the marking method are one. Welding them into a single, coherent practice is what Havruta is for.

08 · Boundaries

What this is not

Because the space is crowded with adjacent ideas, it helps to be precise about the boundaries.

It is not “co-intelligence”, the popular notion that you should simply work alongside AI as a clever colleague. That is a posture, and a good one, but it is not a method. Havruta is a discipline with a hard rule: the machine questions first.

It is not an AI examiner that grills a student after they hand work in. That questions after the fact, to catch them out. Havruta questions before, as the price of producing anything at all.

It is not a generic Socratic chatbot that gently leads a student toward an answer the machine already holds. Havruta does the opposite. It extracts what the machine cannot hold: the student's own reasoning, which exists nowhere else.

And it is not detection by another name. It does not police a finished file. It changes what gets produced, and how it is judged.

09 · Where to start

Where to start

If you teach, you can begin this week, without a budget or a committee. Give one assignment a partner brief: a short set of instructions for the AI that forces the student to state their thesis, their evidence, and the strongest objection before the machine writes anything. Ask for the chat transcript alongside the essay, so you can see the thinking and not only the output. Replace one piece of written assessment with a ten-minute structured conversation, and notice how quickly you can tell understanding from performance. You will learn more about a student in that conversation than in a stack of immaculate, identical essays.

If you lead, resist the instinct to write a policy first. A policy tells people what not to do; it does not change what the tool does when a student opens it alone at midnight. Decide what you actually want AI to do for a learner at your institution, configure the tools you provide to ask first, and rebuild a slice of assessment around dialogue. Treat the detection budget as the sunk cost it is, and move the money to redesign. The institutions that come through the next decade strongest will not be the ones with the strictest ban or the cleverest detector. They will be the ones whose graduates can still think, because someone insisted the machine make them.

10 · The oldest form of rigour

The oldest form of rigour

There is nothing nostalgic in any of this. For most of its history the university examined people by making them think out loud, in dialogue, under pressure, and defend what they said. We replaced that with the silent, solitary, easily gamed document, and now the document can be generated in seconds, and we are surprised it tells us nothing.

The way forward is not to detect the machine. It is to put the dialogue back, with the machine on the other side of the table, asking the questions. That is the oldest form of rigour, rebuilt for AI.

11 · Frequently asked

Frequently asked questions

Is using AI to write an essay cheating?

It is the wrong question. The essay was always a vehicle for thinking, not the destination, and a student can now generate one in seconds. Banning or detecting AI does not restore the thinking; it just pushes students to use AI in secret, in the way most likely to weaken their minds. The better question is whether the AI made the student think before it answered. Configure it to ask first, and assess the reasoning rather than the document.

Does AI harm critical thinking?

Used as an answer-machine, it can. A peer-reviewed study of knowledge workers found that the more people trusted AI, the less critical judgement they applied (Lee et al., 2025), and decades of learning science show that effort, not ease, is what builds durable understanding. But the harm depends on how the tool is used: AI designed to question and withhold answers can improve learning rather than erode it. The design choice is ours.

Do AI detectors work?

No. Systematic testing found detection tools “neither accurate nor reliable” (Weber-Wulff et al., 2023), and in one study they wrongly flagged more than half of the essays written by non-native English speakers as machine-authored (Liang et al., 2023). Reliable detection is also provably getting harder as models improve (Sadasivan et al., 2023). Universities such as Vanderbilt have disabled their detectors for these reasons. Detection is a dead end; assessment redesign is the answer.

How should universities assess students when everyone uses AI?

By evidencing understanding through dialogue rather than policing a document. The instrument already exists: the oral examination, the viva, the structured conversation in which a student has to think in front of you. Interview the student about their work and assess the reasoning, with AI helping to analyse the depth of that conversation. This is the position of the sector's assessment scholars and guidance bodies (Dawson, 2021; QAA, 2024).

What is the Havruta approach to AI in education?

Havruta configures AI to ask before it answers: it makes the learner supply their reasoning, evidence, and intent before it produces anything, then sharpens that thinking rather than replacing it. It pairs this with dialogue-based assessment, so understanding is both built and evidenced through questioning. The name comes from chavruta, the centuries-old practice of paired study, and the principle runs through Socratic dialogue, the medieval disputation, and the doctoral viva.
