Until recently, applications based on large language models (LLMs) were treated as a curiosity – a demonstration of AI capabilities or a new conversational interface. Today, they are increasingly becoming the basis of entire business systems. This shift has concrete consequences: it changes not only the way we design software, but also who designs it and what competences are needed to do it well.
IT architects are therefore faced with the question: is the traditional model of building applications still sufficient?
From code to prompts – a paradigm shift
In the classic application architecture, the logic of the system was based on predictable code. Input data was processed according to clearly defined rules and the result was mostly deterministic. In an LLM-based approach, part of this logic is transferred to the model – which operates not according to code, but according to a prompt.
A prompt is not just a command. It is a new form of logical interface, containing instructions, constraints, contextual data and an expected form of response. As a result, some of what used to be written as code is now ‘designed’ as a prompt. Instead of precise conditional instructions, designers must now think in terms of intent, examples and expected behaviour.
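A minimal sketch of such a prompt-as-interface might look like the following. The `build_prompt` helper and its field names are illustrative assumptions, not a standard – the point is that the pieces of the interface (instruction, constraints, context, response format) become explicit, composable parts.

```python
# Sketch: a prompt assembled from instructions, constraints, context
# and an expected response format -- the parts the text describes as
# a 'logical interface'. All names here are illustrative.

def build_prompt(instruction: str, constraints: list[str],
                 context: str, response_format: str) -> str:
    """Assemble the parts of a prompt into one string."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Task: {instruction}\n"
        f"Constraints:\n{constraint_lines}\n"
        f"Context:\n{context}\n"
        f"Respond as: {response_format}"
    )

prompt = build_prompt(
    instruction="Summarise the ticket for a support agent.",
    constraints=["Max 3 sentences", "No personal data"],
    context="Customer reports login failures since Monday.",
    response_format="plain text, English",
)
```

Treating the prompt as structured data like this also makes it easier to version and test, rather than scattering string literals through the codebase.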
An app with AI inside – what does the new architecture look like?
In practice, this means a new model for building applications in which the LLM acts as one of the main layers of logic. A typical AI-centric application architecture might include:
- The orchestrator layer, which manages the data flow between the user and the model.
- Prompt manager, responsible for generating, versioning and managing prompts.
- Model handler, which handles LLM calls (locally or via an API).
- Evaluator, which analyses the quality of the response and decides whether an intervention or retry is required.
Added to this are components such as response caching, token cost analysis, query classification and semantic error detection. As a result, the application architecture begins to resemble a system built on a machine learning pipeline, rather than a classic monolith or microservice.
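The layers listed above can be wired together in a simple orchestrator, sketched below. The model handler is a stub standing in for a real LLM call (local or via API), and all class names are illustrative assumptions rather than an established framework.

```python
# Sketch of an AI-centric pipeline: prompt manager, model handler
# and evaluator, coordinated by an orchestrator with retries.
# The ModelHandler is a stub -- replace with a real LLM call.

class PromptManager:
    """Stores and renders versioned prompt templates."""
    def __init__(self):
        self.templates = {("summarise", "v2"): "Summarise: {text}"}
    def render(self, name: str, version: str, **values) -> str:
        return self.templates[(name, version)].format(**values)

class ModelHandler:
    """Stands in for an LLM call, local or via API."""
    def complete(self, prompt: str) -> str:
        return "stub answer for: " + prompt

class Evaluator:
    """Decides whether a response is acceptable or needs a retry."""
    def accept(self, response: str) -> bool:
        return len(response) > 0 and "error" not in response.lower()

class Orchestrator:
    def __init__(self):
        self.prompts = PromptManager()
        self.model = ModelHandler()
        self.evaluator = Evaluator()
    def handle(self, text: str, retries: int = 2) -> str:
        prompt = self.prompts.render("summarise", "v2", text=text)
        for _ in range(retries + 1):
            response = self.model.complete(prompt)
            if self.evaluator.accept(response):
                return response
        raise RuntimeError("no acceptable response after retries")

answer = Orchestrator().handle("quarterly report draft")
```

Note how the retry decision lives in the evaluator, not the handler – the same separation the text describes between calling the model and judging its output.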
New challenges for IT teams
The biggest change, however, is that LLMs introduce an element of unpredictability into the application. The models are probabilistic – they can give different answers for the same input, depending on factors such as temperature, context or prompt structure.
This in turn means:
- A new way of testing: classic unit tests are no longer sufficient. Benchmark suites, response quality assessments and semantic checks are needed.
- A new approach to errors: you need to design applications to deal with an inaccurate, incomplete or irrelevant response.
- The need for ‘quality evaluation’: there is an emerging role for the evaluation engineer to assess whether the model is working in line with business expectations.
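One way to picture what such quality evaluation looks like in practice: instead of exact-match assertions, responses are scored against semantic and structural criteria. The checks below (required terms, length bound, JSON validity) are illustrative assumptions, not a standard benchmark.

```python
import json

# Sketch: scoring a model response against criteria rather than
# asserting an exact string. Checks and scoring are illustrative.

def evaluate(response: str, required_terms: list[str],
             max_words: int, expect_json: bool = False) -> dict:
    checks = {
        "covers_terms": all(t.lower() in response.lower()
                            for t in required_terms),
        "within_length": len(response.split()) <= max_words,
    }
    if expect_json:
        try:
            json.loads(response)
            checks["valid_json"] = True
        except json.JSONDecodeError:
            checks["valid_json"] = False
    score = sum(checks.values()) / len(checks)
    return {"score": score, "checks": checks}

result = evaluate('{"refund": true, "amount": 40}',
                  required_terms=["refund"],
                  max_words=20, expect_json=True)
```

A score like this can feed the evaluator layer mentioned earlier: accept above a threshold, retry or escalate below it.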
From LLM as an add-on to LLM as the core of the system
One of the most important architectural decisions is where to place the model. There are three basic scenarios to choose from:
- Cloud-based model (API) – simpler implementation, but less control, higher costs and data limitations.
- Local (on-premise) model – greater flexibility and control, but requires computing power and a team to operate it.
- Model as an internal microservice – a compromise approach using internal APIs and local infrastructure.
The choice depends on many factors: from regulatory requirements to the cost of tokens to security and confidentiality needs.
New design patterns: RAG and agent-based design
From an architectural point of view, two approaches become particularly relevant:
- RAG (Retrieval-Augmented Generation) – combines LLM with an external knowledge base (e.g. a document database or vector search engine) to provide more timely and precise answers. Used in customer service chatbots, among others.
- Agentic AI – the model not only responds, but performs tasks: it can plan a sequence of actions, make an API call, update data. This approach is akin to building digital agents capable of autonomous activity.
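The RAG pattern can be sketched in a few lines: retrieve the most relevant documents, then inject them into the prompt as context. Real systems use embeddings and a vector store; simple word overlap stands in for similarity search here, and all function names are illustrative.

```python
# Sketch of Retrieval-Augmented Generation: rank documents by
# relevance to the query, then build a context-grounded prompt.
# Word overlap is a stand-in for embedding similarity.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def rag_prompt(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    return (f"Answer using only this context:\n{context}\n\n"
            f"Question: {query}")

docs = [
    "Refunds are processed within 14 days of the request.",
    "Our office is open Monday to Friday.",
    "Refund requests require the original invoice number.",
]
prompt = rag_prompt("how long do refunds take", docs)
```

The model then answers from the injected context rather than from its training data alone – which is what makes the answers more timely and precise.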
Is it worthwhile? When not to use an LLM
It is worth asking: should every application be LLM-based? The answer is no. There are many cases where the classical approach is:
- cheaper,
- more reliable,
- simpler to maintain.
If the system requires deterministic results, real-time operation or tight control of business logic – it is better to stay with a traditional architecture. LLMs work best for tasks that require natural language understanding, flexibility and adaptation – not necessarily in accounting or payment processing.
AI team: new roles and new processes
With the entry of LLMs into the architecture, the structure of IT teams is also changing. New roles are emerging:
- Prompt Engineer – designs and tests effective prompts.
- AI Product Owner – manages AI-based functionality and its impact on the user.
- Eval Engineer – assesses the quality of the model response on a continuous basis.
This also means a different approach to iteration – versioning prompts, testing their effectiveness and managing them as product artefacts.
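Managing prompts as product artefacts can be as simple as a registry with immutable, versioned entries. The sketch below is one possible shape; the fields (`changelog`, semantic version strings) are illustrative assumptions about such a workflow.

```python
from dataclasses import dataclass

# Sketch: prompts treated as versioned product artefacts. An
# immutable record plus a registry keyed by (name, version).

@dataclass(frozen=True)
class PromptArtifact:
    name: str
    version: str
    template: str
    changelog: str

class PromptRegistry:
    def __init__(self):
        self._store: dict[tuple[str, str], PromptArtifact] = {}
    def publish(self, artifact: PromptArtifact) -> None:
        self._store[(artifact.name, artifact.version)] = artifact
    def get(self, name: str, version: str) -> PromptArtifact:
        return self._store[(name, version)]

registry = PromptRegistry()
registry.publish(PromptArtifact(
    name="ticket_summary",
    version="1.1.0",
    template="Summarise this ticket: {ticket}",
    changelog="Tightened length constraint after eval regression.",
))
artifact = registry.get("ticket_summary", "1.1.0")
```

Pinning consumers to an explicit version means a prompt change ships like a code change: reviewed, evaluated and rolled back if its quality scores regress.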
AI as a new layer of application logic
Integrating large language models into applications is more than a change in technology – it is a change in the way we think about system logic, interfaces and architecture. The LLM stops being a tool and becomes a new layer of the system – on a par with a database or a business rules engine.
IT architects today must not only know the limitations and capabilities of LLMs, but also understand how to build a resilient, secure and cost-effective system around them. The new era of software is not just AI-first – it is also rethink-architecture-first.