Vertical AI: why generic models, even agents, won't get enterprises to production
Each week brings another announcement: a larger model, more parameters, a new agent framework promising to “just work” inside the business. And each week, inside enterprises, the same quieter pattern reasserts itself — pilots that demonstrate well, projects that stall short of production, and a slowly hardening suspicion that AI, for all its noise, remains generic. Useful, perhaps, for contact-centre scripts and document summaries; rarely much more.
The suspicion is half right.
Even today’s frontier models, including the largest, do not know how a particular organisation operates. They cannot tell which of three SKU IDs is canonical, nor whether the three represent one product or several. They do not know that an insurance rate exception was grandfathered to a broker during the market softening of 2022 and cannot be applied to new contracts. They do not know why a credit committee rejected a deal that, on paper, should have passed. No additional parameter count fixes this. The problem is not reasoning capacity but business context.
Agents do not resolve the gap either. An agent is best understood as a control mechanism — a model invoking tools in some logical sequence. If the tools are generic, the underlying data inconsistent, the decision logic undocumented, or the available tooling misaligned with the actual scope of the business, what the agent automates is, in effect, a confused junior analyst: faster, cheaper, and still wrong.
The answer does not lie solely in strategy documents either, though a clear vision of where AI can take the organisation, and what such a journey would require, is part of the broader exercise. Nor does it lie in boxed solutions, which can supplement a process but rarely fit any specific business closely enough to embed into it.
What actually drives adoption, by which we mean making AI an inseparable part of the operating process while laying the groundwork for long-term economic impact, is the construction of organisation-specific vertical AI systems, enriched with domain knowledge and embedded into the decision-making cycle.
The constraint that shapes AI adoption in our geographies
In the markets where we work, and increasingly across Europe and the broader West as “AI sovereignty” rises to the level of board and state concern, the convenience of “just call the frontier API” is not on offer. On-premise deployment becomes the default rather than the exception. Budget constraints, meanwhile, may be tighter than they were before, and especially so for AI: the technology has been broadly accessible for around five years, yet most businesses have undergone little tangible change as a result. The budgets typically available are sufficient to run inference on a model in the 30–100B-parameter range, sometimes a stack of smaller specialised ones, but nothing approaching the scale of general intelligence the largest providers now offer.
This matters because it changes the engineering problem. Weak context cannot be papered over with raw model intelligence. Under such constraints, the cost of getting the architecture wrong is not merely “slightly worse outputs” — it is a system that produces nothing useful at all, and which continues to feed a half-justified bias that AI is of little real value to business.
Where the field has been converging
The most serious global AI players all recognise the problem described above. Although they have chosen markedly different approaches to address it, their thinking converges on a common point.
Palantir — probably the most successful, and certainly the most discussed, enterprise AI company in the world — does not lead with a model. It leads with an ontology: a structured representation of the organisation, of its objects, relationships, actions, and decisions, into which an LLM is then plugged. And it delivers this through Forward-Deployed Engineers who sit inside the enterprise client, sometimes for years, until the system reflects how the business actually operates.
C3 takes a different bet — pre-built domain models for entire industries. There is real merit to the speed-to-value argument, though the critique is also fair: pre-built packages are a shortcut to a shape rather than to a particular business, and the data integration problem does not disappear because someone has shipped a template.
McKinsey QuantumBlack, having observed hundreds of deployments, now publishes openly that roughly 90% of vertical generative-AI use cases remain stuck in pilot mode, and that the only delivery model that consistently moves them out of that state is the durable cross-functional squad — domain experts, process designers, AI engineers, and data engineers — embedded inside the business.
Anthropic, the model maker itself, has recently committed $100M to a partner network whose central feature is embedded Applied AI engineers working alongside clients and partners — a tacit recognition that shipping the model is not the same as shipping the solution.
The red thread running through these strategies is unmistakable. The LLM is the smaller part of the stack; the work lies in learning the organisation, in understanding how it lives, how it decides, and in deriving value from automating and reinforcing those processes with AI. And the only delivery model that consistently reaches production is one in which engineers are embedded in the business until the system reflects it.
What an enterprise AI system actually looks like
If the LLM is the smaller part of the stack, it is worth being explicit about what the rest of the stack is. In our experience, an enterprise AI system has five layers — and most of the work, and most of the failure, lives in the lower three.
Raw layer
The raw layer is where everything lands. Structured data from core systems; unstructured data from documents; and — critically, and most often missed — raw organisational knowledge: emails, presentations, memos, Excel models, meeting notes, decision logs, regulatory filings. This corpus reflects not only the numbers but the reasoning behind them. Most of it has never been read systematically by anyone, and increasingly, AI methods combined with humans in the loop make it possible to read it at the scale required. Even the act of gathering this material — which is the raw layer’s only real job — already represents a substantial step beyond the prevailing baseline, in which most of it is simply absent from the system altogether.
Primary layer
The primary layer is where data becomes a source of truth. This is where the unrewarding but indispensable work is done: deduplicating entities, reconciling fields that mean different things in different systems, identifying anomalies, flagging conflicts that cannot be resolved automatically, and — most importantly — encoding how the data should be interpreted in the first place. A “customer” is not the same object in the CRM and in the credit system. A “transaction date” may mean booking date in one system and value date in another. A “client rejection” may refer to the client declining the offer, or to the client being declined by the underwriting engine. The primary layer is where these definitions are made explicit and consistent, and where the foundational interpretive questions are answered. There is an unglamorous phrase that captures why most AI projects fail: garbage in, garbage out. Bad data does not produce good decisions, no matter how capable the model. The primary layer is the answer to that problem and the foundation on which everything downstream depends.
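To make the idea of explicit definitions concrete, here is a minimal sketch of one small piece of a primary layer: a semantic map that says what a source field means in each system, plus an alias table for SKUs. All system names, field names, and SKU values are hypothetical and chosen only for illustration; a real primary layer would sit on a database and a proper entity-resolution process rather than on in-memory dictionaries.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical semantic map: (source system, source field) -> canonical name.
# The system and field names are illustrative, not taken from any real schema.
FIELD_SEMANTICS = {
    ("core_banking", "transaction_date"): "booking_date",
    ("treasury", "transaction_date"): "value_date",
    ("crm", "customer_id"): "crm_contact_id",
    ("credit", "customer_id"): "borrower_id",
}

# Hypothetical alias table: every known SKU spelling -> the canonical SKU.
SKU_ALIASES = {
    "SKU-0042": "SKU-0042",
    "0042-A": "SKU-0042",
    "LEGACY-42": "SKU-0042",
}


@dataclass
class Reconciled:
    record: dict = field(default_factory=dict)      # fields with an agreed meaning
    unresolved: list = field(default_factory=list)  # fields needing human review


def normalise(system: str, raw: dict) -> Reconciled:
    """Rename source fields to canonical names and flag anything undefined."""
    result = Reconciled()
    for name, value in raw.items():
        canonical = FIELD_SEMANTICS.get((system, name))
        if canonical is None:
            # No agreed interpretation yet: surface it rather than guess.
            result.unresolved.append(name)
        else:
            result.record[canonical] = value
    return result


def canonical_sku(sku: str) -> Optional[str]:
    """Return the canonical SKU for a known alias, or None if it is unknown."""
    return SKU_ALIASES.get(sku)
```

Calling `normalise("treasury", {"transaction_date": "2024-03-01", "memo": "urgent"})` yields a record with a `value_date` field and lists `memo` as unresolved. The design choice worth noting is that an undefined field is flagged for review instead of being passed downstream with a guessed meaning.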
Feature layer
The feature layer is where the system becomes fit for purpose. For each use case, a specific slice of the primary layer is selected and shaped — not because the model could not, in principle, reason over the entirety of the data, but because granting an LLM (and we should remember that we are typically working with a relatively small one) access to the entire data warehouse is a reliable way to confuse it. A feature layer should contain what the model needs for the task, in the form the model can use. It should also contain something that most architectures miss: the meaning of that data within the specific context, and the decision logic that applies. The reason a credit officer rejects a borderline file, the rationale (where it is legitimate) by which a procurement manager favours one supplier over another, the precise contractual clause on which a claims adjuster bases a pushback on a settlement — these criteria, precedents, and exceptions live here. This is the layer at which the system stops being a generic analytical engine and begins to reflect how a particular business actually thinks.
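One way to picture a feature-layer entry is as a specification that names the fields a use case may see and attaches the decision logic that applies to them. The sketch below is an assumption about how such a specification could be written, not a description of any particular platform; `FeatureSpec`, `DecisionRule`, `build_context`, and every field name are hypothetical. The example rule borrows the grandfathered rate exception mentioned earlier in this article.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DecisionRule:
    name: str
    rationale: str                        # the "why", in words the model can use
    triggered_by: Callable[[dict], bool]  # when the rule is relevant


@dataclass(frozen=True)
class FeatureSpec:
    use_case: str
    fields: tuple   # the only fields this use case is allowed to see
    rules: tuple    # decision logic that applies to this slice


def build_context(spec: FeatureSpec, record: dict) -> dict:
    """Select the slice for one use case and attach the rules that apply."""
    missing = [f for f in spec.fields if f not in record]
    if missing:
        raise KeyError(f"{spec.use_case}: missing fields {missing}")
    data = {f: record[f] for f in spec.fields}
    notes = [f"{r.name}: {r.rationale}" for r in spec.rules if r.triggered_by(data)]
    return {"use_case": spec.use_case, "data": data, "decision_notes": notes}


# Hypothetical spec for a broker pricing use case.
PRICING_SPEC = FeatureSpec(
    use_case="broker_quote_pricing",
    fields=("broker_id", "contract_start_year", "requests_rate_exception"),
    rules=(
        DecisionRule(
            name="grandfathered_exception",
            rationale="Exception was grandfathered in 2022; not available for new contracts.",
            triggered_by=lambda d: d["requests_rate_exception"]
            and d["contract_start_year"] > 2022,
        ),
    ),
)
```

Passing a wide record through `build_context(PRICING_SPEC, record)` returns only the three declared fields, together with the rationale of any rule that fires. The smaller model then receives a narrow slice plus the reasoning it would otherwise have no way of knowing.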
Orchestration layer
The orchestration layer is where the system acts. It is, in essence, a routing and execution engine: receiving a trigger (a user request, a system event, a detected anomaly), interpreting intent, selecting the appropriate feature slice, choosing the right combination of tools and models for the task, sequencing their execution, managing state across multi-step workflows, and enforcing governance and access controls throughout. Around it sit the tools the orchestrator can reach for — OCR for documents, classifiers for routing, forecasting and optimisation models, task-specific tools (a covenant checker, a pricing engine, a tariff classifier), generic analytical tools, and, of course, one or more language models. Orchestration is not a single mega-prompt; it is a routed, governed system that knows which tool fits which question, which feature layer to consult, and how to compose intermediate outputs into a coherent answer or action. It is also where the feedback loops attach, capturing both qualitative outcomes (was the recommendation accepted?) and quantitative ones (did the metric move?), and turning those signals into systematic improvement.
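The routing, governance, and feedback parts of this description can be sketched in a few lines. The example below is a deliberately reduced illustration under stated assumptions: the intent is taken as already classified, multi-step state is omitted, and `Orchestrator`, `Route`, `covenant_checker`, the role names, and the payload fields are all hypothetical. It is not the design of any specific product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Route:
    tool: Callable[[dict], str]  # the tool that answers this intent
    feature_slice: str           # which feature layer the tool consults
    allowed_roles: frozenset     # who may trigger it


class Orchestrator:
    def __init__(self) -> None:
        self._routes: dict = {}
        self._feedback: dict = {}

    def register(self, intent: str, route: Route) -> None:
        self._routes[intent] = route

    def handle(self, intent: str, payload: dict, role: str) -> dict:
        route = self._routes.get(intent)
        if route is None:
            raise LookupError(f"no route for intent {intent!r}")
        # Governance check happens before any tool is invoked.
        if role not in route.allowed_roles:
            raise PermissionError(f"role {role!r} may not trigger {intent!r}")
        return {
            "intent": intent,
            "feature_slice": route.feature_slice,
            "output": route.tool(payload),
        }

    def record_feedback(self, intent: str, accepted: bool) -> None:
        # Qualitative signal: was the recommendation accepted?
        self._feedback.setdefault(intent, []).append(accepted)

    def acceptance_rate(self, intent: str) -> Optional[float]:
        votes = self._feedback.get(intent, [])
        return sum(votes) / len(votes) if votes else None


def covenant_checker(payload: dict) -> str:
    """Hypothetical task-specific tool: compare leverage with its covenant cap."""
    return "breach" if payload["leverage"] > payload["max_leverage"] else "ok"


orchestrator = Orchestrator()
orchestrator.register(
    "check_covenant",
    Route(covenant_checker, "credit_monitoring", frozenset({"credit_officer"})),
)
```

A call such as `orchestrator.handle("check_covenant", {"leverage": 3.8, "max_leverage": 3.5}, role="credit_officer")` is routed to the covenant checker, while the same call from a role outside `allowed_roles` raises `PermissionError`. Keeping the feedback log next to the router is one simple way to let acceptance rates be tracked per intent.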
Use case layer
The use case layer is where the system meets the user — the recommendation, the answer, the action, the alert, the workflow step. This is the only layer most stakeholders see, which is why it tends to dominate AI conversations. By the time a use case feels good in the hands of an end user, however, the real work has been done in the four layers below.
Domain knowledge as a key bottleneck
The primary and feature layers do not populate themselves. Both depend on something that does not yet exist when an engagement begins: a structured, machine-readable representation of how the business actually thinks. This is the part of the problem that gets glossed over in most discussions of enterprise AI.
In most organisations, the real knowledge — stakeholder preferences, B2B client behaviour, exception precedents, and so on — is not set out clearly in any single document. It resides instead in experience and habit, in unstructured Excel files and notes, in verbal agreements and Telegram exchanges. Extracting it is neither a scraping exercise nor a workshop; it is a discipline, and in our experience it requires a combination of methods, applied as the context demands.
- First, sitting with the people who make the decisions — credit committees, underwriters, operations leads, treasury desks — for a working relationship that lasts as long as the build itself, not for the duration of an interview.
- Second, pulling the artifacts: emails, presentations, Excel models, memos, meeting notes, decision logs, regulatory filings. This is what fills the raw layer beyond what ERP and CRM connectors can supply. Most of this material has never been read systematically by anyone, and increasingly, AI methods combined with humans in the loop make it possible to derive real use from it.
- Third, codifying the why, not merely the who. A knowledge base that records “the credit committee decides large loans” is useless to a system. The feature layer requires the criteria, the precedents, the exceptions, and the tacit weighting between them.
- Fourth, treating the knowledge as alive. Processes change. Regulation moves. Exceptions accumulate. A static knowledge base is wrong within months. The architecture must therefore include an ingestion path for new knowledge and a deprecation path for stale knowledge, both governed.
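The fourth point, a governed ingestion path and a governed deprecation path, can be illustrated with a small sketch. It assumes that each piece of knowledge carries a named approver and validity dates; `KnowledgeItem`, `KnowledgeStore`, and the example values are hypothetical, and a production version would persist to a database and keep an audit trail.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class KnowledgeItem:
    key: str
    statement: str
    approved_by: str                      # governance: who signed this off
    valid_from: date
    deprecated_on: Optional[date] = None
    deprecation_reason: str = ""


class KnowledgeStore:
    def __init__(self) -> None:
        self._items: dict = {}

    def ingest(self, item: KnowledgeItem) -> None:
        """Ingestion path: nothing enters without a named approver."""
        if not item.approved_by:
            raise ValueError("ingestion requires a named approver")
        self._items.setdefault(item.key, []).append(item)

    def deprecate(self, key: str, on: date, reason: str) -> None:
        """Deprecation path: retire live versions, keeping them for audit."""
        live = [i for i in self._items.get(key, []) if i.deprecated_on is None]
        if not live:
            raise KeyError(key)
        for item in live:
            item.deprecated_on = on
            item.deprecation_reason = reason

    def active(self, as_of: date) -> list:
        """Return the items that were in force on the given date."""
        return [
            item
            for versions in self._items.values()
            for item in versions
            if item.valid_from <= as_of
            and (item.deprecated_on is None or as_of < item.deprecated_on)
        ]
```

Deprecated items are kept rather than deleted, so `active(as_of=...)` can answer what the system was supposed to know on a given date. That matters when a past decision has to be explained.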
What is gathered, reconciled, and codified through this work then has to be packaged into something the AI system can act on. In practice, this means translating the captured knowledge into the structured artefacts the lower layers consume: schemas and entity definitions in the primary layer; feature specifications, decision rules, and contextual annotations in the feature layer; prompt templates, tool descriptions, and routing logic in the orchestration layer; and evaluation harnesses that test, end to end, whether the system reasons over this knowledge as the business intends. Done properly, the result is not a separate “knowledge base” sitting alongside the system but knowledge that has been compiled directly into the system’s architecture and behaviour.
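The evaluation harness mentioned above can be as simple as a list of cases, each pairing inputs with the decision the business expects, run against the system as a whole. The sketch below is illustrative only: `EvalCase`, `run_eval`, the `toy_credit_system` stand-in, and the 0.45 debt-to-income threshold are all hypothetical. The toy system deliberately ignores a collateral exception, so the harness reports that case as a failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    inputs: dict
    expected: str  # the decision the business says is correct


def run_eval(system: Callable[[dict], str], cases: list) -> dict:
    """Run every case through the system and collect mismatches."""
    failures = []
    for case in cases:
        actual = system(case.inputs)
        if actual != case.expected:
            failures.append((case.name, case.expected, actual))
    return {
        "passed": len(cases) - len(failures),
        "total": len(cases),
        "failures": failures,
    }


def toy_credit_system(inputs: dict) -> str:
    """Stand-in for the full stack: applies a threshold, knows no exceptions."""
    return "reject" if inputs["dti"] > 0.45 else "approve"


CASES = [
    EvalCase("plain_high_dti", {"dti": 0.60, "collateral": False}, "reject"),
    # The committee approves this file because of collateral; the toy system
    # has not had that exception encoded, so the case is expected to fail.
    EvalCase("collateral_exception", {"dti": 0.50, "collateral": True}, "approve"),
]
```

Running `run_eval(toy_credit_system, CASES)` reports one pass and one failure, the failure being the collateral exception. A failing case of this kind points at knowledge that was elicited but never compiled into the feature layer.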
Through this — at times tedious — work of elicitation, systematisation, and integration, the operational logic of the organisation becomes something a constrained, on-premise AI system can genuinely reason over.
How we shape our approach to address the adoption issue
We do not claim to have solved the domain-knowledge problem, nor the broader adoption problem; nobody has, so far. But the way we have chosen to work reflects the conclusions set out above.
The most probable path to enterprise success for AI, as we see it, runs through embedding engineering capability directly into the enterprise, backed by the right technical platform: one that accommodates the specifics of each organisation. The domain-knowledge work (sitting with decision-makers, pulling artifacts, codifying the why) has, to date, no shortcut: it gets done only through deliberate, painstaking effort.
That said, an engineering team cannot arrive empty-handed; the general principles of the architecture remain transferable across engagements. We have therefore built the Agentic platform, our orchestration layer, so that engagements do not begin with nine months of plumbing. The platform carries the architecture for the layers below it, the orchestration logic above, and the framework we use to describe the universe in which a particular AI system operates. What we build for each client is the content the platform carries: the client-specific feature layers, tools, and decision logic.
This approach, in our view, also fits best with companies — Agentic Lab among them — pursuing a vertical or multi-vertical strategy. It accelerates engagements by bringing accumulated domain knowledge (not as a substitute for the client’s, but as a complement to it), and it makes it possible to ship standalone products in cases where a particular vertical solution is itself a meaningful innovation.
We are deliberately multi-vertical. We invest in verticals — some industrial, some functional — where three conditions hold simultaneously: we have, or can build, the relevant domain knowledge; we have the right tools and models for the work; and there is clear market demand. Each vertical is a portfolio bet. The platform stays the same; what changes per vertical is the feature layers, the tool set, the model mix, and the decision logic captured. Agatha is one such bet — and even Agatha, packaged as a SaaS product, is built on the same Agentic platform that underpins the enterprise solutions we build for our clients.
A closing thought
Vertical AI is not a marketing label, and it is not a finished idea either. It is a description of the configuration on which the most serious players in enterprise AI have, separately, converged — because under real constraints (sovereignty, on-premise deployment, regulated industries, finite budgets), it appears to be the only configuration that produces enterprise-grade outcomes.
Most of the work lives in the parts that do not demo well: the data reconciliation, the knowledge elicitation, the orchestration, the feedback loops. Anyone can call an API. Far fewer can sit with a credit committee, a claims team, or a treasury desk and turn how they think into something a system can do. Fewer still can do so on-premise, on smaller models, in regulated environments.
That, more than the choice of model or framework, is where the next phase of enterprise AI will be won or lost. We do not believe AI adoption is solved without addressing it — and we believe it is a conversation worth having more openly than the field currently does.
© 2025, Registered software: Analytical AI Assistant based on the Agentic Lab Agent Orchestrator.