The growing popularity of generative artificial intelligence (GenAI) is changing the way companies design and develop their applications. According to Gartner’s latest predictions, by 2030, up to 80% of newly developed business applications will use multimodal GenAI – compared to just 10% in 2024. This is a dramatic increase that heralds a profound transformation of enterprise software.
What does ‘multimodal’ GenAI mean?
Multimodality is the ability of artificial intelligence models to process and generate different types of data – text, image, audio, video or numerical information – within a single coherent architecture. An example? A system that understands a voice query, analyses numerical data from a table and, based on this, generates a graph and a summary in the form of text or video.
These kinds of solutions already exist today – there are models on the market that convert, for example, text to image (e.g. DALL-E), speech to text (e.g. Whisper) or text descriptions to video. However, integrating them into a single, holistic AI system remains a challenge – technologically, financially and organisationally. Meanwhile, Gartner predicts that this will become the new standard by the end of the decade.
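To make that integration challenge concrete, here is a minimal sketch of how today’s single-modality models can be chained by hand, assuming the official OpenAI Python SDK; the file name and prompt are hypothetical, and a production pipeline would also need error handling, cost controls and data governance.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: speech to text with Whisper (hypothetical input file).
with open("voice_query.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
    )

# Step 2: text to image with DALL-E, driven by the transcribed query.
image = client.images.generate(
    model="dall-e-3",
    prompt=f"An infographic illustrating: {transcript.text}",
    size="1024x1024",
)

print(image.data[0].url)  # URL of the generated infographic
```

Each hop is a separate model call with its own input format and billing – exactly the integration overhead that a natively multimodal system would remove.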
AI moving deeper into business software
According to Gartner analysts, within the next one to three years companies will begin integrating multimodal AI deeply into everyday software – not just as an add-on module, but as the core of a new generation of applications. This marks the end of the experimentation phase and the beginning of mature deployments, including in areas such as CRM, ERP, HR systems and knowledge management tools.
This change will affect several key aspects of software development:
- Interface design: instead of forms and clicks – voice, video or image interactions. The user will ask the system for data by voice, and the AI will respond with text, video or an infographic.
- Data management: non-numerical data, until now often left unused, will become more important. AI will learn to analyse and combine different sources of contextual data, which will improve the quality of decisions.
- Task automation: systems will be able to recognise user intent and take action without detailed instructions, bringing a new level of autonomy to software (see the sketch after this list).
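As a rough illustration of this kind of intent-driven automation, the sketch below asks an LLM to map a free-form request onto one of a few known intents and then dispatches the matching action. The intent names, the dispatch table and the model choice are illustrative assumptions, not a reference design.

```python
# pip install openai
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical business actions the system knows how to perform.
ACTIONS = {
    "generate_report": lambda args: f"Generating {args.get('period', 'monthly')} sales report...",
    "schedule_meeting": lambda args: f"Scheduling a meeting about {args.get('topic', 'TBD')}...",
    "update_record": lambda args: f"Updating CRM record {args.get('record_id', '?')}...",
}

def handle_request(user_utterance: str) -> str:
    """Ask the model to classify the request, then run the matching action."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify the user's request. Reply as JSON with keys "
                    f"'intent' (one of {list(ACTIONS)}) and 'args' (a dict)."
                ),
            },
            {"role": "user", "content": user_utterance},
        ],
    )
    parsed = json.loads(response.choices[0].message.content)
    action = ACTIONS.get(parsed.get("intent"))
    if action is None:
        return "Sorry, I don't know how to do that yet."
    return action(parsed.get("args", {}))

print(handle_request("Pull together the Q3 sales numbers for me"))
```

In a real product, this dispatch layer would sit behind the voice or video interface from the first bullet, so the same intent pipeline serves spoken and typed requests alike.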
New competences for companies and IT managers
The transformation towards multimodality will require new investment decisions. Gartner emphasises that it will be product managers and CTOs who will have to define which software components can be extended with AI features and which need to be built from scratch.
Investments will go beyond licensing AI models. Equally important will be preparing data in appropriate formats, training models on industry-specific data (fine-tuning) and integrating with existing IT environments. Understanding the extent to which multimodal GenAI can realistically increase productivity and customer service quality will become crucial.
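As one concrete shape such an investment can take, here is a minimal sketch of launching a fine-tuning job on industry-specific examples, assuming the OpenAI fine-tuning API; the file name, example record and base model are placeholders.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()

# Step 1: upload industry-specific training examples (hypothetical file).
# Each JSONL line holds one chat-formatted example, e.g.:
# {"messages": [{"role": "user", "content": "Explain clause 7.2 of our policy"},
#               {"role": "assistant", "content": "Clause 7.2 covers ..."}]}
training_file = client.files.create(
    file=open("industry_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Step 2: start the fine-tuning job on a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)

print(job.id, job.status)  # poll until the job reports 'succeeded'
```

The harder, unglamorous part sits upstream of this call: collecting, cleaning and formatting the domain data described above.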
Changing digital culture in companies
Although AI as a technology has been around for years, it is only multimodality that opens up the potential for it to become the ‘operating language’ of the modern company. It is no longer just that AI analyses data faster – it is that it can better understand context, combine different sources of information and present them in a useful way.
In practice, this can mean that project management support systems will not only flag risks, but also create meeting summaries themselves, generate video recommendations or update the schedule based on the team’s conversations.
For many companies, such a change can be as difficult as moving to the cloud a decade ago – but at the same time just as inevitable.
The multimodal future has already begun
Already, major cloud and AI technology providers – such as Google, Microsoft, Meta and OpenAI – are developing next-generation multimodal models. Meta unveiled its I-JEPA model in June 2023, Google is testing Gemini’s ability to work on text, images and code simultaneously, and OpenAI is developing GPT-4o with native multimodality. All indications are that the competition is no longer about ‘if’, but about ‘how fast and to what extent’ multimodality will become the standard.
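To show what ‘native multimodality’ changes in practice, the sketch below sends text and an image in a single request instead of chaining separate models; it assumes the OpenAI chat completions API, and the image URL is a placeholder.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()

# One request combining text and image input (placeholder URL).
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarise the trend in this sales chart."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/q3-sales-chart.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Compare this with the hand-built pipeline sketched earlier: here the model receives the text and the image together, which is the shift the vendors above are racing to make standard.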
By 2030, multimodal AI will be integral not only to enterprise systems, but also to frontline worker applications, training systems and back-office automation. It is no longer about testing capabilities, but about redesigning software around a new, more natural way of interacting with the machine.
For technology providers, this signals that the development of AI does not end with chatbots. For companies, it signals that it is high time to assess the maturity of their own data and applications in terms of multimodal readiness.
If the trend identified by Gartner continues, multimodal GenAI will become not so much an add-on as the foundation of modern software. And that means the future of corporate applications may look very different from today’s – more ‘human’, more contextual and surprisingly flexible.