Micro Language Models and Boolean Agents: Two Ideas That Could Reshape Enterprise AI

Every enterprise AI conversation in the last two years has been about scale. Bigger models, longer context windows, more parameters, more GPUs. And yet, if you talk to anyone who has actually shipped AI inside an SAP landscape, a banking core, or a hospital information system, you hear the same complaints: it's too slow, too expensive, too unpredictable, and the auditors hate it.

I want to put two ideas on the table that I think solve most of those complaints. Neither is entirely new, but together they form a pattern I haven't seen anyone name yet — and it's the pattern I believe enterprise AI will converge on over the next three years.

The first is Micro Language Models. The second is Boolean Agents. They're independently useful. Combined, they're something else entirely.

Part 1: Micro Language Models

We already have a name for small models: small language models, or SLMs. They typically range from one to ten billion parameters and are positioned as the cost-efficient cousin of the frontier LLMs. That's a useful category, but it's still too big and too generic for a lot of what enterprises actually need.

A Micro Language Model — μLM — is something stricter. I'd define it by a contract, not just a parameter count:

It does exactly one thing.
It fits in under 500 MB.
It runs on CPU at sub-50ms p99 latency.
It is versioned, evaluated, and monitored like a microservice, not like a research artifact.

Models like DistilBERT, TinyBERT, and fine-tuned encoder-only architectures already live in the 10M to 500M parameter range. (At the top of that range, staying under 500 MB means roughly one byte per weight, so 8-bit quantization or smaller.) What's missing is the discipline of treating them as infrastructure. A μLM is not a chatbot. It's a function call that happens to have learned its behaviour from data.
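A quick back-of-the-envelope check shows how the size limit interacts with the parameter range. This is a sketch under simple assumptions: it counts weights only (no tokenizer files or runtime overhead), treats 500 MB as 500 million bytes, and uses 66M parameters as a rough stand-in for a DistilBERT-sized model.

```python
# Approximate weight footprint: parameters x bytes per parameter.
# Ignores tokenizer files, activations, and runtime overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
SIZE_BUDGET_MB = 500  # the contract's size limit, taken as 500 * 10**6 bytes


def weight_size_mb(params: int, dtype: str) -> float:
    """Size of the weights alone, in megabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1_000_000


def fits_budget(params: int, dtype: str) -> bool:
    """True when the weights are strictly under the size budget."""
    return weight_size_mb(params, dtype) < SIZE_BUDGET_MB


for params in (66_000_000, 350_000_000, 500_000_000):
    for dtype in BYTES_PER_PARAM:
        size = weight_size_mb(params, dtype)
        print(f"{params:>11,} params @ {dtype}: {size:7.1f} MB, fits={fits_budget(params, dtype)}")
```

Under these assumptions a 66M-parameter model fits at full precision, a 350M model needs 8-bit weights, and a 500M model lands exactly on the limit at 8 bits, so the size contract is really a contract about quantization as well.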

The case for this category isn't theoretical. A recent analysis of 287 production deployments found that a fine-tuned 350-million-parameter model scored roughly three times higher than ChatGPT on structured tool calling, and a fine-tuned 3.8B model outperformed GPT-4o on its specific task. Gartner is now projecting that by 2027, enterprises will deploy task-specific small models three times more often than they deploy large language models. The market is moving in this direction whether or not we give it a name.

Why μLMs matter inside an enterprise

The honest answer is that most enterprise AI workloads do not need reasoning. They need recognition. They need to look at a vendor name and decide which master record it belongs to. They need to read an invoice line and predict the right GL code. They need to glance at an incoming IDoc and route it to the correct handler. They need to flag a purchase order whose line item doesn't match its category.

These are not GPT-4 problems. They're classification problems with a latency budget of 200 milliseconds or less that today either get over-served by an LLM API (expensive, slow, and a data-residency nightmare) or under-served by 15-year-old regex rules nobody dares touch.

A μLM sits in the middle. It runs inside your firewall, on a CPU, inside the same transaction boundary as the ABAP commit it's helping. It costs almost nothing per inference. It doesn't leak data. And because it does exactly one thing, you can actually evaluate it — true precision, true recall, on real production samples.
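To make "you can actually evaluate it" concrete, here is a minimal sketch of the two numbers I mean, computed from labelled production samples. The function name and the sample labels are illustrative assumptions, not output from any real system.

```python
def precision_recall(y_true: list[bool], y_pred: list[bool]) -> tuple[float, float]:
    """Precision and recall for a single yes/no predicate."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))        # correctly flagged
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))  # flagged, but wrong
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical sample: does this invoice line belong to the flagged category?
labels = [True, True, True, False, False, False, True, False]
predictions = [True, True, False, False, True, False, True, False]

precision, recall = precision_recall(labels, predictions)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the model answers one question, these two figures describe it completely for that question, which is what makes a per-model release gate practical.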

Part 2: Boolean Agents

Now the second idea, and the one I find more interesting.

Today's "AI agents" are language-model loops. You give them a goal, they reason in natural language, they call tools, they observe results, they reason some more. This works beautifully for open-ended tasks like research and coding. It works terribly for enterprise decisions, because every enterprise decision has the same three requirements: deterministic, explainable, auditable. And a chain of LLM thoughts is none of those things.

A Boolean Agent is an agent that emits exactly one thing: a strict true or false, plus a confidence score, a justification, and the evidence it relied on. Nothing more. No paragraphs, no plans, no creative interpretation. Its output schema is fixed:

{ value: bool, confidence: float, justification: string, evidence: ref[] }

That constraint sounds limiting until you realise what it unlocks. Because every Boolean Agent has the same output type, you can compose them. You can wire them together exactly like logic gates. AND. OR. NOT. NAND. XOR. A network of Boolean Agents is a circuit — and a circuit is the most auditable artifact in computer science. Every node has a defined truth table. Every edge is a typed signal. Every decision is reproducible.
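Here is a minimal sketch of what that composition could look like. The type and function names are my own illustration, and the rule for combining confidence (take the weakest input) is an assumption rather than part of the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoolResult:
    value: bool
    confidence: float                 # 0.0 to 1.0
    justification: str
    evidence: tuple[str, ...] = ()    # opaque references such as document IDs


def _merge(inputs: tuple[BoolResult, ...]) -> tuple[float, tuple[str, ...]]:
    # Assumption: a composite is only as certain as its weakest input.
    confidence = min(r.confidence for r in inputs)
    evidence = tuple(e for r in inputs for e in r.evidence)
    return confidence, evidence


def and_gate(*inputs: BoolResult) -> BoolResult:
    confidence, evidence = _merge(inputs)
    value = all(r.value for r in inputs)
    return BoolResult(value, confidence, f"AND of {len(inputs)} inputs", evidence)


def or_gate(*inputs: BoolResult) -> BoolResult:
    confidence, evidence = _merge(inputs)
    value = any(r.value for r in inputs)
    return BoolResult(value, confidence, f"OR of {len(inputs)} inputs", evidence)


def not_gate(r: BoolResult) -> BoolResult:
    return BoolResult(not r.value, r.confidence, f"NOT ({r.justification})", r.evidence)


# Hypothetical inputs, as two upstream agents might emit them.
po_ok = BoolResult(True, 0.97, "PO is released", ("PO-4500001",))
blocked = BoolResult(False, 0.91, "vendor is blocked", ("VENDOR-77",))

decision = and_gate(po_ok, not_gate(blocked))
print(decision)
```

Every gate takes and returns the same type, so any output can feed any input, and the evidence references travel with the signal all the way to the top of the circuit.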

The closest existing work is academic. There's a multi-agent system called CircuitMind that distributes gate-level reasoning across specialised agents to generate digital circuits, and there's a body of formal work on multi-agent depth-bounded boolean logic. But nobody, as far as I can find, has packaged "boolean composition" as a product pattern for enterprise agents. That's the gap I'm pointing at.

Why this matters

Every CFO and Chief Risk Officer I've spoken to about generative AI eventually says the same thing: "I love what it can do, but I can't put it in front of an auditor." That's not a model problem. It's an architecture problem. LLM agents are non-deterministic by design, and their reasoning trace is a story, not a proof.

A Boolean Agent network flips this. The decision is the circuit. The audit trail is the wiring diagram. When something goes wrong, you can point at the exact gate that flipped, look at its truth table, look at the evidence it received, and understand precisely why. That's the property enterprise compliance has been waiting for.
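As a sketch of what "point at the exact gate that flipped" could mean in code: evaluate every node, record its value, then walk down from the output following the children that agree with each gate's result. The node structure and names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str                                   # "LEAF", "AND", "OR" or "NOT"
    children: list["Node"] = field(default_factory=list)
    value: bool = False                       # leaf input, or computed gate output


def evaluate(node: Node) -> bool:
    """Evaluate every node (no short-circuiting) so each gate records its value."""
    if node.op != "LEAF":
        inputs = [evaluate(child) for child in node.children]
        if node.op == "AND":
            node.value = all(inputs)
        elif node.op == "OR":
            node.value = any(inputs)
        elif node.op == "NOT":
            node.value = not inputs[0]
        else:
            raise ValueError(f"unknown op: {node.op}")
    return node.value


def deciding_leaves(node: Node) -> list[str]:
    """Names of the leaves that determined this node's output."""
    if node.op == "LEAF":
        return [node.name]
    if node.op == "NOT":
        return deciding_leaves(node.children[0])
    # For AND and OR, the children that agree with the gate's output decided it.
    return [leaf
            for child in node.children if child.value == node.value
            for leaf in deciding_leaves(child)]


circuit = Node("payment_ok", "AND", [
    Node("po_valid", "LEAF", value=True),
    Node("gr_valid", "LEAF", value=True),
    Node("invoice_matches", "AND", [
        Node("tolerance_ok", "LEAF", value=False),
        Node("tax_code_ok", "LEAF", value=True),
    ]),
])

result = evaluate(circuit)
print(result, deciding_leaves(circuit))
```

The trace is derived from the circuit and the recorded values alone, so it can be regenerated later from stored inputs without re-running any model.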

Part 3: The Combination

This is where it gets interesting. μLMs are the substrate. Boolean Agents are the wiring abstraction.

Each leaf gate in a Boolean Agent network is implemented by a μLM trained for one predicate; the AND, OR, and NOT gates above the leaves are plain logic. The network itself is a typed directed graph that any business analyst can read. Together, they give enterprise AI three things current agentic frameworks struggle to deliver simultaneously:

Determinism: with model versions pinned and inference run deterministically, the same inputs always produce the same circuit output.
Auditability: the circuit is the audit trail.
Latency and cost that fit inside a transaction: a ten-gate network of 200M-parameter μLMs can run in under 200ms on a single CPU, as long as each gate averages under 20ms or independent gates are evaluated in parallel. That means it can sit inside an ABAP commit boundary, inside a payment authorisation flow, inside a real-time fraud check. You cannot say that about a GPT-4-class agent loop.
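The latency claim depends on how the gates are scheduled, so it is worth doing the arithmetic. The sketch below compares running ten gates one after another with running independent gates in parallel, where only the longest dependency chain counts. The gate names, the 45ms per-gate figure, and the assumption of enough cores to run every independent gate at once are all illustrative.

```python
def sequential_ms(latency_ms: dict[str, int]) -> int:
    """Total latency if every gate runs one after another."""
    return sum(latency_ms.values())


def critical_path_ms(latency_ms: dict[str, int], depends_on: dict[str, list[str]]) -> int:
    """Latency of the longest dependency chain, assuming unlimited parallelism."""
    finish: dict[str, int] = {}

    def finish_time(gate: str) -> int:
        if gate not in finish:
            upstream = [finish_time(d) for d in depends_on.get(gate, ())]
            finish[gate] = latency_ms[gate] + max(upstream, default=0)
        return finish[gate]

    return max(finish_time(g) for g in latency_ms)


# Hypothetical ten-gate network: eight leaf gates feeding two second-level gates.
latency_ms = {f"leaf{i}": 45 for i in range(8)}
latency_ms.update({"mid_a": 45, "mid_b": 45})
depends_on = {
    "mid_a": ["leaf0", "leaf1", "leaf2", "leaf3"],
    "mid_b": ["leaf4", "leaf5", "leaf6", "leaf7"],
}

BUDGET_MS = 200
print("sequential:", sequential_ms(latency_ms), "ms")
print("critical path:", critical_path_ms(latency_ms, depends_on), "ms")
```

With these numbers the sequential run takes 450ms and misses a 200ms budget, while the parallel critical path is 90ms. The shape of the circuit matters as much as the speed of any single model.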

Real-World Problems This Pattern Solves

Let me ground all of this in problems I've actually seen inside enterprises. None of these are hypothetical.

1. Three-way match in Procure-to-Pay

The classic pain point. A purchase order, a goods receipt, and an invoice need to agree before payment. Today this is a brittle rule engine that breaks on every edge case — currency drift, partial deliveries, line-item splits, tax mismatches. AP teams spend their weeks clearing exceptions by hand.

A Boolean Agent network handles this naturally. The top-level gate is a simple AND: PO_valid ∧ GR_valid ∧ Invoice_matches. Each input is itself the output of a sub-network. Invoice_matches decomposes into gates for line-item parity, tolerance check, currency reconciliation, tax-code validity, and vendor sanctions screening. Each leaf gate is a μLM doing exactly one classification. When an invoice fails, the system doesn't say "rejected" — it shows the auditor the exact gate that flipped and the evidence behind it.
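A stripped-down sketch of that circuit follows. The leaf gates here are plain Python rules standing in for μLMs, and the field names, the 2% tolerance, and the sample documents are all hypothetical. Because AND is associative, the nested circuit is flattened into one list of leaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    value: bool
    justification: str


def line_item_parity(gr: dict, invoice: dict) -> GateResult:
    ok = gr["quantity"] == invoice["quantity"]
    return GateResult("line_item_parity", ok,
                      f"received {gr['quantity']}, invoiced {invoice['quantity']}")


def tolerance_check(po: dict, invoice: dict, tolerance: float = 0.02) -> GateResult:
    diff = abs(invoice["amount"] - po["amount"])
    allowed = tolerance * po["amount"]
    return GateResult("tolerance_check", diff <= allowed,
                      f"amount differs by {diff:.2f}, allowed {allowed:.2f}")


def currency_reconciliation(po: dict, invoice: dict) -> GateResult:
    ok = po["currency"] == invoice["currency"]
    return GateResult("currency_reconciliation", ok,
                      f"PO in {po['currency']}, invoice in {invoice['currency']}")


def three_way_match(po: dict, gr: dict, invoice: dict) -> tuple[bool, list[GateResult]]:
    """Top-level AND over all leaves; returns the decision and the gates that failed."""
    leaves = [
        GateResult("po_valid", po["status"] == "released", f"PO status is {po['status']}"),
        GateResult("gr_valid", gr["quantity"] > 0, f"goods receipt quantity is {gr['quantity']}"),
        line_item_parity(gr, invoice),
        tolerance_check(po, invoice),
        currency_reconciliation(po, invoice),
    ]
    failed = [g for g in leaves if not g.value]
    return not failed, failed


po = {"status": "released", "quantity": 10, "amount": 1000.0, "currency": "EUR"}
gr = {"quantity": 10}
invoice = {"quantity": 10, "amount": 1050.0, "currency": "EUR"}

approved, failed_gates = three_way_match(po, gr, invoice)
print(approved, failed_gates)
```

In this example the payment is blocked by a single gate, and the justification string on that gate is what the AP clerk or auditor would see.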

2. Segregation of Duties in GRC

Segregation of Duties is the rule that the person who creates a vendor cannot also approve payments to that vendor. Easy in principle, nightmarish in practice, because real organisations have thousands of overlapping roles, delegated authorities, temporary cover arrangements, and matrix reporting lines.

Today, GRC tools handle this with static rule tables that go out of date the moment HR makes a reorg. A Boolean Agent network can read the actual organisational graph in real time and evaluate predicates like "does this approver report, directly or transitively, to the requester?" or "is this approver currently covering for someone who would have a conflict?" Each predicate is a gate. The final SoD check passes only when no conflict predicate fires, which makes it the AND of their negations. Auditable. Composable. Updates the moment HR data changes.
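The transitive-reporting predicate is a walk up the management chain. Here is a minimal sketch, assuming the organisational graph is available as a simple person-to-manager mapping; the names, the mapping, and the cover-arrangement table are all hypothetical.

```python
from typing import Optional


def reports_to(person: str, boss: str, manager_of: dict[str, str]) -> bool:
    """True if `person` reports to `boss` directly or through any chain of managers."""
    seen: set[str] = set()
    current: Optional[str] = manager_of.get(person)
    while current is not None and current not in seen:  # `seen` guards against cycles
        if current == boss:
            return True
        seen.add(current)
        current = manager_of.get(current)
    return False


def has_conflict(person: str, requester: str, manager_of: dict[str, str]) -> bool:
    return person == requester or reports_to(person, requester, manager_of)


def sod_check(approver: str, requester: str,
              manager_of: dict[str, str],
              covering_for: dict[str, str]) -> tuple[bool, list[str]]:
    """Passes only when no conflict predicate fires; returns the ones that did."""
    covered = covering_for.get(approver)
    conflicts = {
        "approver_is_requester": approver == requester,
        "approver_reports_to_requester": reports_to(approver, requester, manager_of),
        "covering_for_conflicted_person":
            covered is not None and has_conflict(covered, requester, manager_of),
    }
    fired = [name for name, is_conflict in conflicts.items() if is_conflict]
    return not fired, fired


manager_of = {"alice": "bob", "bob": "carol", "dave": "erin"}
covering_for = {"dave": "bob"}  # dave is temporarily covering for bob

print(sod_check("alice", "carol", manager_of, covering_for))
print(sod_check("dave", "carol", manager_of, covering_for))
print(sod_check("erin", "carol", manager_of, covering_for))
```

Since the predicates read the mapping at evaluation time, a reorg changes the outcome as soon as the mapping changes, with no rule table to maintain.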

3. Customer onboarding and KYC

A new customer applies. The bank needs to decide, in seconds, whether to onboard them. Current systems run a waterfall of opaque scoring models that regulators hate. A Boolean Agent network gives you a circuit: identity verified ∧ address verified ∧ not on sanctions list ∧ not a politically exposed person ∧ source of funds plausible. Each gate is a μLM trained on its specific predicate. The output is binary, the reasoning is visible, and the regulator gets a diagram instead of a black box.

4. Invoice routing and GL coding

A μLM trained on six months of historical invoices learns to predict the right GL code with better than 95% accuracy in most enterprises. It runs on a CPU, costs essentially nothing per inference, and replaces a process that today consumes hundreds of hours of accounts-payable time per month. This is the cleanest μLM use case I know, and it's deployable inside a single sprint.

5. Master data deduplication

Every SAP migration project I've seen has the same horror story: 80,000 vendor records, 30% of which are duplicates with subtle spelling variations, address formatting differences, and trailing whitespace. Rule-based dedup catches the easy ones. A μLM trained on canonical-vs-variant pairs catches the hard ones. Wrap it in a Boolean Agent and you get a yes/no decision per pair plus the evidence for the data steward to review.
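To show the shape of the wrapper, here is a sketch in which a string-similarity score from Python's standard difflib module stands in for the trained μLM. The 0.85 threshold and the vendor names are illustrative assumptions, and using the raw similarity as the confidence value is a crude placeholder.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def same_vendor(a: str, b: str, threshold: float = 0.85) -> dict:
    """Yes/no decision per pair, in the Boolean Agent output shape."""
    na, nb = normalise(a), normalise(b)
    # Stand-in for the model: a real deployment would call the trained μLM here.
    score = SequenceMatcher(None, na, nb).ratio()
    return {
        "value": score >= threshold,
        "confidence": score,  # similarity score used as a rough proxy
        "justification": f"normalised names '{na}' and '{nb}' compared against threshold {threshold}",
        "evidence": [a, b],
    }


print(same_vendor("ACME  GmbH ", "Acme GmbH."))
print(same_vendor("Acme GmbH", "Zenith Ltd"))
```

The normalisation step handles the easy cases such as casing, punctuation, and trailing whitespace; the model behind the score is what has to earn its keep on the hard ones.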

6. Field-service dispatch decisions

A service ticket comes in. Should it be auto-dispatched, escalated, or held for parts? Today this is a mess of overlapping rules. A Boolean Agent network with gates for severity, SLA risk, parts availability, technician proximity, and customer tier turns it into an explainable decision in milliseconds.

7. Insurance claims triage

Should this claim be auto-approved, flagged for review, or denied? The Boolean Agent network composes gates for policy coverage, fraud signals, claim history, and documentation completeness. Each gate is independently testable. The auditor gets a circuit. The customer gets a fast answer.

Where SAP Fits

I spend most of my professional life inside the SAP ecosystem, so the integration story matters to me. The good news is that this pattern fits SAP's current direction almost too neatly.

A Boolean Agent network can be exposed as a Joule Skill backed by a CAP OData action — exactly the architecture SAP is already shipping for custom AI extensions. The μLMs themselves can be hosted on AI Core as custom serving runtimes. The evidence store sits in HANA Cloud's vector engine. The wiring diagram becomes a Signavio decision model. From the user's point of view, they're talking to Joule. From the system's point of view, every decision is a deterministic circuit running on a CPU inside the customer's tenant.

This is the part I find compelling. SAP's clean-core philosophy actively encourages exactly this kind of side-by-side extension. You're not fighting the platform. You're filling the gap the platform left open: deterministic, auditable AI for the decisions that today live in BRF+ rules and ABAP includes nobody wants to touch.

The Honest Caveats

I'd be lying if I said this pattern is fully baked. A few things to be clear-eyed about:

The name Micro Language Model will fight for oxygen. SLM is already the industry's settled term. The differentiator has to be the contract — the latency, the size, the single-purpose discipline — not the prefix.

Real enterprise decisions are rarely purely boolean. Sometimes they're weighted, fuzzy, or probabilistic. You either extend the gate algebra to ternary or fuzzy logic (loses some elegance, gains realism) or you push the fuzziness into the upstream μLMs and keep the gate layer strict. I lean strict, but it's a real design choice.
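The two options can give different answers on the same inputs, which a small sketch makes visible. I use the product rule as one common choice of fuzzy AND; that choice, the 0.7 threshold, and the scores are assumptions for illustration.

```python
import math


def strict_and(scores: list[float], threshold: float = 0.7) -> bool:
    """Strict gate layer: each upstream score is thresholded first, then ANDed."""
    return all(s >= threshold for s in scores)


def fuzzy_and_product(scores: list[float], threshold: float = 0.7) -> bool:
    """Fuzzy gate layer: scores are multiplied, and only the result is thresholded."""
    return math.prod(scores) >= threshold


def fuzzy_and_min(scores: list[float], threshold: float = 0.7) -> bool:
    """Fuzzy AND using the minimum; with one shared threshold it matches strict_and."""
    return min(scores) >= threshold


scores = [0.8, 0.8]  # two upstream μLMs, each fairly sure
print("strict:", strict_and(scores))
print("fuzzy (product):", fuzzy_and_product(scores))
print("fuzzy (min):", fuzzy_and_min(scores))
```

With two inputs at 0.8, the strict layer says yes while the product rule says no, because moderate doubt compounds across gates. Keeping the gate layer strict means that compounding never happens silently: each μLM owns its own threshold, and the circuit above it stays a plain truth table.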

And the moat isn't the models. Anyone can fine-tune a 350M parameter model. The defensible thing is the registry, the evaluation harness, the drift monitoring, and the visual composition tool that lets a business analyst wire gates together without writing code. Build that, and the models become commodities you swap underneath.

Where I Think This Goes

The frontier-model arms race is going to continue, and it should. There are problems that genuinely need a 400-billion-parameter model. But most of what enterprises actually want from AI isn't on that list. Most of it is recognition and decision-making — the same problems we've been trying to solve with rules engines for thirty years, except now we have the tools to do it properly.

Micro Language Models give us the substrate. Boolean Agents give us the architecture. Together, they offer something the current generation of agentic AI does not: decisions that are fast enough to live inside a transaction, cheap enough to run at enterprise scale, and explainable enough that an auditor can sign off on them.

That's the version of enterprise AI I want to build. If you're working on something in this space, or you think I'm wrong about any of it, I'd love to hear from you.