AI is Easy, Data is Hard

Artificial Intelligence (AI) is more accessible today than ever. From open-source machine learning libraries to user-friendly AI cloud services, anyone can deploy AI models with relative ease. But here’s the catch: AI is only as good as the data you give it. A state-of-the-art algorithm fed poor data will produce poor results – a classic case of “garbage in, garbage out.” Many companies eagerly invest in AI technology, yet they overlook the less glamorous work of data preparation and quality control, often to their peril.

In fact, AI initiatives are notoriously prone to failure when data problems are ignored. Gartner reports that 85% of AI projects fail, with issues like poor data quality and lack of relevant data among the top reasons (Gartner, 2020). Even when the AI tech is sound, bad data can derail the outcome – one survey found that 99% of AI and machine learning projects encounter data quality issues (Forbes, 2020). As one industry observer succinctly put it, “it’s a data game, not a code fest.” The core of any AI system “lies not in complex coding, but in the data that powers it” (Ng, 2020). In other words, the real power lies in information, not just algorithms.

AI can feel like a high-wire act: the technology is dazzling, but without a solid data foundation, it’s one step away from a plunge. Businesses often race to implement AI solutions, pouring resources into model development, while treating data as an afterthought. To make AI work, organisations must fortify that rope – focus on data collection, cleaning, and governance – or risk a very public tumble.

Extreme Worst-Case AI Failure

What happens when AI is fed bad data? In the extreme case, it can lead to spectacular failures. A famous real-world example comes from Amazon. The company built an experimental AI recruiting tool to automatically screen resumes and identify top talent. Unfortunately, the AI quickly became biased against women and Amazon had to shut it down (Dastin, 2018). The model was trained on ten years of past hiring data and, because the tech industry (and thus Amazon’s pool of prior hires) was predominantly male, the AI learned a toxic lesson. In effect, Amazon’s system taught itself that male candidates were preferable, even penalising resumes that included the word “women’s” (as in “women’s chess club captain”). It even started downgrading graduates of women’s colleges. The very data that was meant to help the AI select the best candidates instead taught it to discriminate. This extreme failure was a direct result of biased, poor-quality training data.
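
The mechanism behind this kind of failure can be shown in miniature. Below is a toy sketch (all resumes and labels are invented, and the scoring rule is deliberately naive, nothing like Amazon's actual system): a word-weight "model" learns from skewed historical hiring decisions, and the bias emerges from the labels, not from any line of code.

```python
from collections import defaultdict

# Hypothetical training set: (resume keywords, hired?) drawn from a
# male-dominated hiring history. The bias lives in the labels.
history = [
    ({"engineer", "chess", "captain"}, True),
    ({"engineer", "rugby"}, True),
    ({"engineer", "women's", "chess", "captain"}, False),
    ({"developer", "women's", "college"}, False),
    ({"developer", "hackathon"}, True),
]

def learn_weights(examples):
    """Weight each word by how often it co-occurs with a 'hired' label."""
    weights = defaultdict(float)
    for words, hired in examples:
        for w in words:
            weights[w] += 1.0 if hired else -1.0
    return weights

weights = learn_weights(history)

def score(resume_words):
    """Sum the learned weights for the words on a resume."""
    return sum(weights[w] for w in resume_words if w in weights)

# Two identical resumes except for one word: the learned weights
# penalise "women's" purely because of the skew in the training data.
base = {"engineer", "chess", "captain"}
print(score(base))                 # 1.0
print(score(base | {"women's"}))   # -1.0
```

No one wrote a discriminatory rule; the model simply compressed a biased history into weights and replayed it.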

Now, imagine a similar data issue in a higher-stakes arena. An AI system in healthcare could be fed incomplete or unrepresentative patient data and end up misdiagnosing illnesses or recommending unsafe treatments. In finance, a trading algorithm might make disastrous decisions if its market data is skewed or erroneous, potentially triggering huge losses. These hypotheticals underscore a sobering truth: when AI fails due to bad data, it can fail hard – with outcomes ranging from embarrassment and lost business to legal troubles or harm to human lives. It’s the nightmare scenario that shows why getting the data right is absolutely critical.

Expected Worst-Case: Everyday AI Failures

Not every AI mishap makes headlines. Far more common is the everyday worst-case scenario: AI that doesn’t live up to its promise because of flawed or insufficient data. These failures may be mundane, but they are widespread – and they carry a real cost for businesses. Bad data can lead AI systems to make decisions that are simply off the mark, resulting in inefficiencies and missed opportunities rather than dramatic crashes. Consider a few typical scenarios:

  • Wrong Product Recommendations: You’re browsing online and the site keeps recommending items that make you scratch your head. Often, that’s an AI recommendation engine trained on outdated or erroneous data. If the data about your past purchases or preferences is wrong, the AI’s suggestions will be irrelevant. Such poor recommendations frustrate customers and translate to lost sales. (It’s a direct consequence of the “garbage in, garbage out” principle at work.)
  • Misreading Customer Behaviour: Companies use AI to predict customer churn, lifetime value, or buying interests. But if the customer data feeding those models is incomplete or biased, the predictions can be way off. Imagine a loyal customer incorrectly flagged as likely to leave because the AI lacked data on a recent positive service interaction. The business might waste time and money trying to “win back” this customer (who wasn’t actually at risk), while a truly unhappy customer might get ignored because the data painted an overly rosy picture. These subtle data errors – a missing field here, an out-of-date entry there – lead to AI-driven decisions that misjudge real customer behaviour. The result is poor business outcomes, like mistargeted marketing or failing to retain a high-value client.
  • Inventory Planning Blunders: In retail and supply chain, forecasting demand is everything – and AI is often tasked with this job. But bad data, or blind spots in it, can mean the difference between overstock and stockouts. If an AI forecasting model is fed sales data that doesn’t account for a sudden trend or an external shock (say, an upcoming holiday or a viral social media frenzy), it might grossly underestimate demand. The outcome: empty shelves and missed revenue. On the flip side, overestimate demand based on “loud” but misleading data, and you end up with a warehouse full of product that nobody buys. For example, at the start of the COVID-19 pandemic, many stores found their shelves suddenly empty as panic-buying ensued – a scenario most AI models failed to predict because nothing like it was in the historical data (Forbes, 2020). Shoppers were looking for toilet paper and other essentials in unprecedented quantities, and the algorithms were caught completely off-guard. These everyday failures, while not as sensational as a rogue AI, quietly erode profits and efficiency.
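
The forecasting blind spot in the last bullet is easy to demonstrate. Here is a minimal sketch with invented numbers: a naive moving-average forecaster built only on stable history has, by construction, no way to anticipate a shock it has never seen.

```python
# Invented weekly sales figures: a stable pre-shock history.
weekly_sales = [100, 98, 103, 101, 99, 102]

def forecast_next(history, window=4):
    """Simple moving-average forecast -- all it knows is the past."""
    recent = history[-window:]
    return sum(recent) / len(recent)

predicted = forecast_next(weekly_sales)
actual = 540  # a panic-buying week: nothing like it in the training data

print(f"forecast: {predicted:.0f}, actual: {actual}, "
      f"stock-out gap: {actual - predicted:.0f}")
```

A real demand model would be far more sophisticated than a moving average, but the failure mode is the same: whatever isn't represented in the data simply doesn't exist for the model.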

Individually, these issues are small fires; collectively, they’re a raging blaze of lost potential. Poor data quality is estimated to cost U.S. businesses around $3.1 trillion every year (Pereira, 2020). Think about that – trillions lost not to sci-fi robot rebellions or grand AI glitches, but to mundane problems like wrong pricing data, duplicate records, missing fields, and miscategorised inventory. In the business world, the “silent killer” of AI ROI is bad data. It’s a pervasive problem that shows how the real power play is in information. Get the data right, and even simple AI can yield great results. Get the data wrong, and even the most advanced AI will struggle.
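
The "mundane problems" named above – duplicate records, missing fields, stale entries – are exactly the things a routine data-quality audit can surface before they reach a model. A hedged sketch, with field names and records invented for illustration:

```python
from datetime import date

# Hypothetical product records exhibiting the three mundane defects.
records = [
    {"sku": "A1", "price": 2.50, "updated": date(2024, 1, 5)},
    {"sku": "A1", "price": 2.50, "updated": date(2024, 1, 5)},   # duplicate
    {"sku": "B2", "price": None, "updated": date(2024, 2, 1)},   # missing price
    {"sku": "C3", "price": 4.10, "updated": date(2019, 6, 30)},  # stale
]

def audit(rows, stale_before=date(2023, 1, 1)):
    """Count duplicates, missing required fields, and stale records."""
    seen, dupes, missing, stale = set(), 0, 0, 0
    for r in rows:
        key = (r["sku"], r["price"], r["updated"])
        if key in seen:
            dupes += 1
        seen.add(key)
        if r["price"] is None:
            missing += 1
        if r["updated"] < stale_before:
            stale += 1
    return {"duplicates": dupes, "missing_price": missing, "stale": stale}

print(audit(records))  # {'duplicates': 1, 'missing_price': 1, 'stale': 1}
```

Checks this simple won't catch every defect, but running them continuously is far cheaper than discovering the problems through a failed AI rollout.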

FMCG Case Study: When Bad Data Spoils the Recipe

To see how data issues play out in practice, let’s look at a scenario in the fast-moving consumer goods (FMCG) industry – makers of everyday products like food, beverages, and toiletries. Imagine a consumer goods company about to launch a new snack beverage. They decide to use AI to forecast demand, set the optimal price, and plan the marketing campaign. The team pours in all the data they have: last year’s sales for similar products, market research survey results, and even some synthetic data (artificially generated examples) to cover scenarios they haven’t seen before. The AI model crunches the numbers and comes back with confident predictions – it forecasts sky-high demand for the new drink and even suggests that consumers would be willing to pay a premium price.

Launch day comes, and things don’t go according to plan. In reality, sales are lukewarm. It turns out the historical sales data was misleading – last year, a competitor had a supply chain issue that temporarily drove more customers to our hypothetical company’s products, artificially boosting those numbers. The model didn’t understand that context. It also turns out consumers were more price-sensitive than the AI assumed. By relying on synthetic and historical data without the full picture, the company overestimated demand and overpriced the product. They’ve now got warehouses full of unsold cans of the new drink. To move inventory, they’re forced to slash prices in a fire sale, hurting their profit margins and brand reputation. And the marketing? The AI’s insights into the target audience were off, so the ads didn’t resonate with the people who actually might buy this beverage. In the end, the product launch flops – not because the idea was bad or the team lacked talent, but because the data feeding the AI was flawed at multiple points.

This case study also shines a light on the lure – and limits – of synthetic data. To compensate for limited real data, companies often generate synthetic data to simulate consumer behaviour. Synthetic data can be useful, but it’s often too neat and optimistic. By design, it reflects the patterns we expect to see, not necessarily the chaotic reality of human behaviour. In fact, one analysis notes that synthetic data “may not capture the complexity of real-world datasets and can potentially omit important details or relationships needed for accurate predictions” (Dhillon, 2021). That was part of the issue for our snack launch: the synthetic scenarios assumed ideal conditions and typical customer responses, missing the possibility of wild-card events (like a sudden social media trend or a regional taste shift). Those messy real-world nuances were absent, so the AI was essentially flying blind to important factors. The lesson? Synthetic data is a helpful supplement, but it’s no substitute for real, high-quality data. If you only train on a polished, imagined version of reality, your AI will be unprepared for the rough edges of the real world.
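
The "too neat" problem with synthetic data can be made concrete. In this illustrative sketch (all numbers fabricated), a synthetic series generated from the assumed "typical" pattern reproduces the average week perfectly well but entirely misses the wild-card spike that real demand contains:

```python
import statistics

# Real weekly demand includes a viral-trend spike the generator never imagined.
real      = [95, 102, 99, 310, 101, 98, 104]
# Synthetic demand drawn only from the assumed "typical" pattern.
synthetic = [100, 101, 99, 102, 98, 100, 103]

# Both series have similar averages...
print(statistics.mean(synthetic))
print(statistics.mean(real))

# ...but very different variability: tidy, optimistic world vs messy reality.
print(statistics.pstdev(synthetic))
print(statistics.pstdev(real))
```

A model that sizes safety stock or price elasticity from the synthetic spread will be systematically under-prepared for the real distribution's tail – which is precisely where launches succeed or fail.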

The Real AI Power Play is Data

In the end, the message is clear: the real power of AI lies in the data behind it. Sophisticated algorithms and models are now widely available, but what differentiates success from failure is how well an organisation handles its data. Companies that thrive with AI are not necessarily those with the fanciest models, but those with the best data practices. That means having ample, relevant data and ensuring it’s accurate, up-to-date, and truly reflective of the domain. It means breaking down data silos so that AI systems have a complete view of the business. It means investing in data cleaning, integration, and monitoring pipelines to continuously feed models clean, rich information. As AI pioneer Andrew Ng argues, the shift to a “data-centric” approach – focusing on the quality of data fueling AI systems – is crucial to “unlocking [AI’s] full power” (Ng, 2020). In short, if you want better AI, start by improving your data.

For business leaders and decision-makers, this is a call to action. Treat data as a strategic asset – one that deserves at least as much attention and investment as the AI models themselves. Before chasing the next cutting-edge AI tool, make sure your data foundation is solid. Are you collecting the right data? Is it accurate, comprehensive, and unbiased? Do you have processes to fix errors, fill gaps, and update the data over time? Often, making progress with AI isn’t about inventing a new algorithm at all, but about doing the unglamorous work of data curation. The old adage “garbage in, garbage out” holds especially true in the age of AI. An algorithm that ingests bad data will churn out bad insights every time. Conversely, if you feed your AI high-quality data, you set the stage for reliable, transformative results. So while AI tools are becoming easier to acquire (the “easy” part), getting the data right remains hard – but that’s exactly where the real power lies. Master the data, and the AI will follow. After all, an AI is only as smart as the information you give it.
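
One practical way to act on those questions is a go/no-go "data gate" run before any model training or retraining. The sketch below is a minimal, hypothetical example – the threshold, field names, and records are all invented – but it captures the idea of refusing to train on data that fails basic completeness checks:

```python
def data_gate(rows, required=("sku", "price"), max_missing_ratio=0.05):
    """Refuse to proceed if too many records have gaps in required fields."""
    if not rows:
        return False, "empty dataset"
    missing = sum(1 for r in rows if any(r.get(f) is None for f in required))
    ratio = missing / len(rows)
    if ratio > max_missing_ratio:
        return False, f"{ratio:.0%} of records incomplete"
    return True, "ok"

clean = [{"sku": "A1", "price": 2.5}] * 20
dirty = clean + [{"sku": "B2", "price": None}] * 5

print(data_gate(clean))  # (True, 'ok')
print(data_gate(dirty))  # (False, '20% of records incomplete')
```

Real pipelines would add freshness, schema, and bias checks, but even a gate this simple turns "is our data good enough?" from an afterthought into an enforced step.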


References

  • Dastin, J. (2018). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters.
  • Dhillon, A. (2021). The limits of synthetic data in AI predictions. Journal of Data Science.
  • Forbes. (2020). Why most AI projects fail: And how to avoid it. Forbes Insights.
  • Gartner. (2020). The top reasons AI projects fail. Gartner Research.
  • Ng, A. (2020). Data-centric AI: A new approach to machine learning. Stanford University.

