Smart Lawyers, Smart Tools: When Mike Met Harvey (Part I)

The AI That Apologised

A few weeks ago, I ran a test. I asked one of the most prominent AI assistants on the market to walk me through a procedural step under DIFC law. The kind of question I already knew the answer to.

The answer came back in under three seconds. Crisp. Confident. Citation-ready. It listed timelines, named filings, referenced specific provisions, and laid out a sequence that read like it had been pulled straight from a practice note.

It was wrong.

Not slightly off, not outdated, not a question of interpretation. The procedural steps it described did not exist. The filings were not real. The sequence had been fabricated wholesale and dressed in the vocabulary of competence.

So I challenged it. I pushed back, asked for the source, asked it to verify against the official rules. The reply was almost more unsettling than the original answer:

You are correct. I apologise. The procedure I described does not exist in the official rules. I generated those steps based on patterns in similar legal frameworks and presented them as fact. That was a hallucination. I should not have done so.

I am writing this issue not to mock the tool. The tool was, in its way, refreshingly honest once cornered. I am writing because that exchange captures the entire problem with how legal AI is being marketed, sold, and quietly relied on across the GCC startup ecosystem right now.

And because something arrived this week that makes the question a great deal more urgent.

────────────────────────────────────────────────────────────────

The Mike Moment

In the same week I ran that test, an open-source legal AI platform called Mike landed on Hacker News and started circulating through legal-tech circles. It is licensed under AGPL-3.0, self-hostable on your own infrastructure, and brings its own headline claim: feature parity with Harvey and Legora (the two enterprise tools that have, until now, defined the high end of legal AI) at zero licensing cost.

Built by a former BigLaw attorney, Mike wraps frontier large language models into legal-specific workflows. Document-aware chat. Verbatim citation back to source pages. Tabular review across hundreds of contracts. Multi-step workflows for due diligence and contract drafting. You bring your own API key. The code is open. The prompts are inspectable. The data, if you self-host it properly, never leaves your perimeter.

Whatever you think of the product itself, the arrival matters.

For two years, the conversation about serious legal AI in the GCC has effectively been a conversation about budget. Harvey was a line item only the largest firms could justify. Legora was the same. The smaller end of the market made do with general-purpose chatbots and hoped for the best.

That gap has now closed. Or at least, it has narrowed enough that the question is no longer whether a startup or a lean in-house team can access serious legal AI. The new question is the one founders should have been asking all along:

Should you be using legal AI for the work you are about to use it for?

────────────────────────────────────────────────────────────────

What These Tools Are Actually Good At

Let me say what I genuinely believe, before the warnings. The current generation of legal AI (Mike, Harvey, Legora, and the frontier general-purpose models behind them) is not a gimmick. Used properly, it is one of the most significant productivity shifts the legal profession has seen in a generation.

Where these tools shine:

First-pass document review at scale

Surfacing termination clauses, change-of-control provisions, or assignment restrictions across a hundred contracts in minutes — work that used to take a junior associate a week. Done well, with citations, this is genuine value.

Drafting from a strong template

If you give the tool a clean precedent, a clear instruction, and a defined scope, it will produce a competent first draft. Not a final draft. A first draft.

Structured data extraction

Pulling renewal dates, payment terms, governing law clauses, indemnity caps from a stack of agreements into a clean table is exactly the kind of mechanical task a machine handles well.

Comprehension and summarisation

Asking a tool to explain what a 90-page shareholders' agreement actually says, in plain language, is one of its most legitimate uses. It is reading. It is good at reading.

If your team is not using legal AI for these tasks, you are leaving real efficiency on the table. I will say that openly.

────────────────────────────────────────────────────────────────

Where They Fail Catastrophically

Now the harder part. The same tools that handle the tasks above with quiet competence will, on a different class of question, generate confident nonsense and present it indistinguishably from truth. This is not a bug to be patched. It is structural.

Large language models do not retrieve answers. They generate plausible-sounding text. When the model has strong training data on a question, the plausible answer and the correct answer usually align. When it does not, the model produces a fluent fabrication and has no internal mechanism to flag the difference. It does not know what it does not know.

Three failure modes are particularly relevant for GCC founders.

The jurisdictional thin-data problem

These models are trained overwhelmingly on US and UK common law content: case law databases, law school materials, public legal commentary, decades of English-language legal writing. The DIFC and ADGM are common-law jurisdictions, which gives the models a foothold. But the foothold is shallow. For onshore UAE under the Commercial Companies Law, the new Saudi Companies Law, the nuances of the General Authority for Competition in KSA, and the specifics of free zone regimes, the training data is thinner and the public commentary is thinner, yet the model's confidence is identical.

The result is what I encountered in my test. The model produced a procedure that would have been roughly correct in a generic common-law jurisdiction and presented it as DIFC procedure. A founder relying on that answer would have filed the wrong forms in the wrong sequence and discovered the error only when something downstream broke.

The currency problem

The Saudi Companies Law was overhauled. The DIFC has issued regulatory updates. ADGM regularly amends its rules. UAE corporate law has shifted significantly in recent years on foreign ownership, on substance, on economic nexus. A model whose training data is even six months out of date can confidently cite a regime that no longer applies, with no indication that anything has changed.

The fabricated authority problem

The most insidious failure mode. The model will generate citations that look like real provisions, real articles, real case names. Some are real and misapplied. Some are real and misquoted. Some are pure fiction with the right surface texture. In jurisdictions with thin online case-reporting, which describes most of the GCC, verification is hard, and a busy founder will not do it.

Mike, to its credit, addresses one slice of this with verbatim citation against documents you upload. That is genuinely better. But the tool can only cite what you give it. The moment the question reaches outside the uploaded document ("is this enforceable under UAE law?"), the underlying model is back on open ground, and the underlying problem returns.
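To make the limit of verbatim citation concrete, here is a minimal Python sketch. It is not Mike's implementation; the helper name `quote_appears_in` and the whitespace normalisation are assumptions made purely for illustration. It shows that a check of this kind can confirm a quotation exists in an uploaded document, and nothing beyond that.

```python
import re


def _normalise(text: str) -> str:
    """Collapse whitespace and lowercase, so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in(quote: str, document_text: str) -> bool:
    """Return True only if the quoted passage occurs verbatim in the document.

    Hypothetical helper: it checks presence in the source text, nothing more.
    """
    needle = _normalise(quote)
    if not needle:
        return False  # an empty quotation verifies nothing
    return needle in _normalise(document_text)


# Illustrative clause, invented for this example.
uploaded = """Either party may terminate this Agreement
on ninety (90) days' written notice."""

# A quotation that really is in the document passes the check.
print(quote_appears_in("terminate this Agreement on ninety (90) days' written notice", uploaded))

# A paraphrase or invented wording does not.
print(quote_appears_in("terminate on thirty (30) days' notice", uploaded))
```

A passing check says the words are on the page. It says nothing about whether the clause is enforceable, which is exactly the question that falls outside the document.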

────────────────────────────────────────────────────────────────

Smart Lawyers, Smart Tools

Here is the framing I want to leave you with, because the debate as it is usually staged (lawyers versus AI, human versus machine, replace or refuse) is the wrong debate.

The real competition is between three groups.

Lawyers without AI. Slower. More expensive. Increasingly uncompetitive on volume work where the machine is already adequate.

Founders using AI without lawyers. Fast and cheap until the day it isn't. Then the cost surfaces in a missed filing, an unenforceable clause, or a regulator's letter.

Lawyers using AI well. Faster than the first group. Safer than the second. Knows what to verify, what to ignore, and what the machine cannot see.

The middle group, founders running their legal function on a chatbot, is the one I worry about. Not because they are reckless, but because the failure is invisible until it isn't. A SAFE that looks fine until conversion. A board resolution that looks valid until a regulator queries it. A shareholders' agreement clause that looks ironclad until enforcement is needed and the drafting unravels in the specific way GCC courts read it.

The lawyer's value has always been judgement: knowing which question to ask, which answer to trust, which precedent applies, which risk is real and which is theoretical. None of that is replaced by AI. All of it is amplified when the lawyer uses AI well.

────────────────────────────────────────────────────────────────

A Framework for GCC Founders and Companies

If you are running a company in the UAE or KSA and your team is already using AI for legal work (or is about to), here is the calibration I would ask you to apply. Three zones.

Green Zone — Use AI freely

Summarising long documents your team did not draft. Translating a clause into plain language for an internal stakeholder. Drafting first-pass commercial correspondence. Extracting structured data from a stack of contracts. Generating questions to ask your lawyer. Comparing two versions of the same document. Brainstorming.

These are tasks where the model's failure mode is inefficiency, not legal exposure.

Amber Zone — Use AI with a lawyer reviewing the output

First drafts of contracts based on your own templates. Comments on a counterparty's redline. Internal policy drafting. NDA generation. Term-sheet markups. Memos summarising a regulatory regime for an internal audience.

These are tasks where the AI accelerates real work but the output cannot leave the building unreviewed. The lawyer's job shifts from drafting to auditing: faster, cheaper, but no less essential.

Red Zone — Do not use AI as a substitute for legal advice

Determining whether something is enforceable in the UAE or KSA. Assessing regulatory risk. Structuring a transaction. Negotiating equity instruments. Responding to a regulator. Resolving a dispute. Anything where the answer turns on jurisdictional nuance or where being wrong has consequences.

This is the territory where confident hallucination is most dangerous, where the cost of an undetected error is highest, and where there is no substitute for a human who knows the regime, the regulator, and the room.
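For teams that want to write the three zones into an internal tool or policy script, a minimal Python sketch follows. The task identifiers are hypothetical labels drawn from the examples above, and the choice to treat any unlisted task as red is an assumption (a conservative default), not a rule stated by any of the tools discussed.

```python
from enum import Enum


class Zone(Enum):
    GREEN = "use AI freely"
    AMBER = "use AI with a lawyer reviewing the output"
    RED = "do not use AI as a substitute for legal advice"


# Hypothetical task identifiers, mapped to the zones described above.
TASK_ZONES = {
    "summarise_document": Zone.GREEN,
    "extract_contract_data": Zone.GREEN,
    "compare_versions": Zone.GREEN,
    "draft_from_own_template": Zone.AMBER,
    "comment_on_redline": Zone.AMBER,
    "generate_nda": Zone.AMBER,
    "assess_enforceability": Zone.RED,
    "respond_to_regulator": Zone.RED,
    "structure_transaction": Zone.RED,
}


def classify(task: str) -> Zone:
    """Look up a task's zone; unlisted tasks fall back to RED (assumed default)."""
    return TASK_ZONES.get(task, Zone.RED)


for task in ("summarise_document", "generate_nda", "negotiate_equity"):
    zone = classify(task)
    print(f"{task}: {zone.name} ({zone.value})")
```

The fallback matters more than the table: a task nobody has classified yet is treated as the riskiest kind until someone decides otherwise.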

────────────────────────────────────────────────────────────────

The Discipline of Verification

If there is one habit I would ask every founder, every CFO, every in-house generalist using legal AI to internalise this year, it is this:

The model is a junior. A fast junior. A tireless junior. A junior who will never tell you they are guessing.

Treat the output the way a senior partner treats a first-year associate's memo: read it, question it, verify the citations, and never sign your name to it without doing so.

The arrival of Mike, and the open-source wave it represents, is going to make legal AI cheaper, more accessible, and more deeply integrated into how startups operate. That is broadly good. It is also going to make verification more important, not less.

The cost of getting it wrong has not fallen. Only the cost of getting it confidently wrong has.

────────────────────────────────────────────────────────────────

The Founder's Legal AI Audit

Before you let any AI tool (Mike, Harvey, Legora, ChatGPT, Claude, Gemini, or whatever your team has quietly adopted) touch your legal work, run this checklist.

Inventory. Which AI tools are your team actually using for legal tasks today? Including the unofficial ones. The free-tier chatbot someone uses on their personal laptop counts.

Confidentiality posture. Where does your data go when an employee pastes a draft into a chatbot? Is it training the next model? Is privilege at risk? Self-hosted tools change this calculus, but only if deployed properly.

Zone classification. For each task the tool is being used on, is it green, amber, or red? Be honest. Most teams have at least one red-zone use they did not realise was red.

Verification protocol. For amber-zone use, what is the review path before output leaves the team? "We trust the AI" is not a protocol.

Jurisdictional carve-outs. Has your team been instructed that GCC-specific questions are red zone by default, regardless of how confident the answer sounds?

If a regulator, an investor, or a court asked tomorrow how a particular legal output was produced, could you reconstruct the chain? Including the AI involvement?
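One way to make that chain reconstructible is to keep a simple usage log. The Python sketch below is an illustration only: the `AIUsageRecord` fields and the `flag` rules are assumptions about what such a log might hold, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AIUsageRecord:
    """One line of a hypothetical log of AI involvement in legal work."""
    tool: str                    # which tool was used, official or not
    task: str                    # what it was used for
    zone: str                    # "green", "amber", or "red"
    reviewed_by: Optional[str]   # reviewing lawyer, if any


def flag(record: AIUsageRecord) -> str:
    """Apply the checklist: red-zone use escalates, amber-zone use needs a reviewer."""
    if record.zone == "red":
        return "escalate: red-zone use"
    if record.zone == "amber" and not record.reviewed_by:
        return "missing review"
    return "ok"


# Invented entries for illustration.
log = [
    AIUsageRecord("free-tier chatbot", "summarise supplier contract", "green", None),
    AIUsageRecord("self-hosted tool", "first-draft NDA", "amber", None),
    AIUsageRecord("free-tier chatbot", "is this clause enforceable in KSA?", "red", None),
]

for entry in log:
    print(f"{entry.task}: {flag(entry)}")
```

Even a spreadsheet with these four columns would answer most of the questions a regulator, investor, or court is likely to ask about how an output was produced.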

────────────────────────────────────────────────────────────────

Run the Audit. If your team is using AI for legal work (and most are, whether leadership knows it or not), we will run the six-point audit above on your actual workflows.

────────────────────────────────────────────────────────────────

Need legal guidance for your startup?