Is your company’s data ready for AI? Five questions to ask before starting a project

Companies are increasingly aware of where they would like to use artificial intelligence: in customer service, sales, finance, manufacturing, document analysis or demand forecasting. They also increasingly have the budget, tools and initial pilot projects in place. The problem arises at a lower level: with the data.

Data that has been sufficient for years for reports, dashboards and monthly analyses is not always sufficient for a system designed to recommend decisions, automate processes or respond to customers in real time. A report can tolerate manual corrections, delays and incomplete records. An AI model, however, works with whatever it is given. If it sees an incomplete, inconsistent or out-of-date picture of the company, its output may appear convincing but lead to poorer decisions.

That is why, before embarking on an AI project, it is worth asking not so much ‘do we have the data?’, but rather ‘is this data suitable for the specific application we wish to implement?’.

According to a study by Harvard Business Review Analytic Services and Cloudera, only 7 per cent of companies state that their data is fully ready for AI. In another Cloudera study, 79 per cent of respondents indicated that AI initiatives are hampered by limited access to data across environments, whilst only 30 per cent of companies had fully integrated data sources. This clearly shows that, in many organisations, the barrier is no longer the desire to use AI itself, but the foundation on which AI is to operate.

Lots of data is not enough

In many companies, the data exists but is scattered. Some is in CRM systems, some in ERP, some in e-commerce, some in customer service systems, spreadsheets, documents and correspondence. For someone familiar with the organisation, such a landscape can be cumbersome but still interpretable. For AI, however, it can become a source of errors.

It is worth distinguishing between three levels. The first is existing data – that is, data which the company already possesses somewhere. The second is usable data, which can be combined, understood and utilised without lengthy manual organisation. The third is AI-ready data, i.e. data that is sufficiently up to date, consistent, documented and relevant to a specific business objective.

This distinction is important because an AI project does not require perfect data across the entire organisation. It requires data that is good enough for the specific decision it is intended to support.

Where projects most often fall down

The most common gaps are usually less spectacular than the model itself, but more decisive for the outcome. Data silos mean that AI sees only a fragment of the customer, product or process. A lack of common definitions means that sales, finance and operations may have different understandings of terms such as ‘active customer’, ‘margin’ or ‘lost order’. Out-of-date data means the model describes the past well but reacts poorly to market changes. The lack of a data owner makes it difficult to quickly pinpoint where an error originated.

In a Confluent survey, 72 per cent of IT leaders cited challenges with real-time data processing as a barrier to scaling AI. A further 66 per cent mentioned uncertainty regarding the origin of data, whilst 65 per cent cited fragmented responsibility for data. At the same time, only 32 per cent of organisations had AI agents actually operating in production.

These figures show that AI readiness is not solely a matter for the data team. It concerns the way in which a company defines its processes, assigns responsibility and determines which sources of information are reliable.

Five questions to ask before an AI project

A good starting point is a brief data readiness test. This is not about auditing the entire organisation, but about getting one specific use case in order.

The first question is: what business decision is the AI intended to support? The data may be sufficient for customer segmentation, but insufficient for automated risk assessment or dynamic pricing.

Second: does the data show the entire process, or just a part of it? A model based solely on CRM may not account for complaints, payment history, returns or interactions with customer service.

Third: do different systems provide consistent information about the customer, product or transaction? If a customer has different IDs in the CRM, e-commerce and service systems, AI cannot build a complete relationship history. Instead, it builds several disjointed histories and attempts to draw a single conclusion from them.

Fourth: is the data up to date enough for the decision the model is intended to support? Data refreshed often enough for a quarterly report may be too stale for sales recommendations, anomaly detection or stock management.

Fifth: is it clear who is responsible for the key data? When a model’s output raises doubts, the company needs to know where to look for the source of the problem and who can rectify it.
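Parts of this test can be checked mechanically. The sketch below is a minimal illustration in Python, not a finished audit tool: it covers only the third, fourth and fifth questions for a single use case. The Source structure, the readiness_report function, the thresholds and the sample systems are all assumptions made up for this example, and it presumes each record can carry a shared customer key (for instance a normalised email address) where one is known.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Source:
    """One system feeding the use case (hypothetical structure)."""
    name: str
    customer_keys: List[Optional[str]]  # shared key per record, None if unlinkable
    last_refreshed: datetime            # when the data was last updated
    owner: Optional[str]                # person or team accountable for the data


def readiness_report(sources: List[Source], max_age: timedelta,
                     min_link_rate: float, now: datetime) -> List[Tuple[str, str]]:
    """Return (source name, issue) pairs for one specific use case."""
    findings = []
    for s in sources:
        # Question 3: can records be linked to the same customer elsewhere?
        # A source with no records at all is treated as unlinked.
        linked = sum(1 for key in s.customer_keys if key is not None)
        rate = linked / len(s.customer_keys) if s.customer_keys else 0.0
        if rate < min_link_rate:
            findings.append((s.name, "unlinked_records"))
        # Question 4: is the data fresh enough for this decision?
        if now - s.last_refreshed > max_age:
            findings.append((s.name, "stale"))
        # Question 5: is someone responsible for this data?
        if not s.owner:
            findings.append((s.name, "no_owner"))
    return findings


# Made-up sample data: three systems behind a sales recommendation use case.
now = datetime(2025, 1, 15)
sources = [
    Source("crm", ["c1", "c2", "c3", "c4"], datetime(2025, 1, 14), "sales ops"),
    Source("ecommerce", ["c1", None, None, "c4"], datetime(2025, 1, 15), "digital team"),
    Source("service_desk", ["c2", "c3"], datetime(2024, 10, 1), None),
]
findings = readiness_report(sources, max_age=timedelta(days=7),
                            min_link_rate=0.9, now=now)
for source_name, issue in findings:
    print(f"{source_name}: {issue}")
```

The thresholds are passed in rather than fixed because they depend on the decision being supported: a seven-day limit might suit one use case and be far too loose for stock management. The first two questions, about the business decision and the coverage of the process, still need a conversation rather than a script.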

The most costly gaps are the ones that aren’t obvious

In practice, AI projects rarely stall because a company ‘doesn’t have the data’. More often than not, the data exists but lacks what is needed to trust and use it. There are no reliable labels – that is, no indication of which result was correct. There is no information on the data’s provenance. There is no audit trail. There is no single point of reference when systems display different versions of the same information.

It is precisely these gaps that hinder the transition from a high-profile pilot to a deployment that works day in, day out. In a pilot, many issues can be worked around manually. In production, every exception starts to take its toll: on the team’s time, user trust, the quality of decisions or regulatory compliance.

This problem is also clearly evident in broader data on the return on investment in AI. According to a PwC study, over half of CEOs have not yet seen significant cost or revenue benefits from AI, despite companies investing heavily in artificial intelligence, data analytics and the cloud. One of the challenges remains the transition from pilot projects to solutions integrated into the company’s operations.

This is not an argument for postponing AI. Rather, it is an argument for a more targeted approach. Instead of starting with a massive programme to organise all data, it is better to select one key use case and examine the data that is critical specifically for that use case. For sales recommendations, these will be different data sources than for fault prediction, complaint handling or financial risk analysis.

This approach allows you to see more quickly where the data is sufficient, where the project needs to be narrowed down, and where the greatest value will come from first clarifying definitions, access or responsibilities.

This is a matter for the board, IT and the business to address jointly

Data readiness for AI does not depend solely on technology. IT can provide integration, architecture and tools. The business side refines the decision that AI is intended to support. Compliance helps define the boundaries of data use. The board decides which applications are worth investing in and what level of risk is acceptable.

In this sense, the question of data is one of the simplest tests of an organisation’s AI maturity. It does not merely check the quality of tables in systems. It checks whether the organisation knows what it wants to automate, what the model will be based on, how it will be evaluated, and who is responsible for the information once AI begins to influence real-world decisions.

Models can be bought, tested and replaced. It is far more difficult to quickly make up for a lack of shared definitions, data owners, a history of changes and trust in the information. That is why companies wishing to move beyond pilot projects are increasingly discovering that AI does not start with a model. It starts with the question of whether the data truly reflects the business as it is.