AI & Agentic-ready Data Platforms: A Roadmap for 2025
Point of view, as of January 2025
The era of traditional Business Intelligence (BI) platforms as we know them is coming to an end. As we move into 2025, organizations must reconceptualize their data platforms to support AI and agentic systems, beyond the use cases they serve today. This transformation is not just about technology: it represents a fundamental shift in how we think about, organize, and utilize data.
The End of Traditional BI-Centric Platforms
Traditional BI platforms, designed primarily for reporting and analytics, are no longer sufficient. In fact, BI is expected to represent less than 50% of data platform usage in the near future. The new paradigm requires platforms that can handle both structured and unstructured data in near real-time, with a strong emphasis on a centralized semantic layer, active data management, and observability.
5 Key Pillars of Modern Data Platforms for AI & Agentic Systems
The next sections dig into the five key pillars of modern data platforms for AI and agentic systems:
1. Centralized Semantic Layer: The Foundation for AI and Agents
The semantic layer has emerged as the cornerstone of modern data platforms, particularly for AI agents and LLMs. Unlike traditional BI implementations where semantic definitions were trapped within specific visualization tools, the new semantic layer serves as a universal translator between human intent and data structures.
Why Semantic Layer is Critical for AI Agents
AI agents, powered by LLMs, primarily interact through natural language. The semantic layer bridges the gap between these text-based interactions and the underlying data structures, as the following subsections describe.
Universal Data Access Through Semantic Understanding
The semantic layer transcends its traditional role in BI to become a universal knowledge graph that connects data across domains, a central repository of business logic and metric definitions, an interpreter between natural language and technical implementations, and a guarantor of consistent data interpretation across all tools and platforms.
This universality means that whether data is being accessed by a business user using natural language (and it works!), an AI agent answering queries, an automated process making decisions, or a data scientist building models, the same semantic understanding, business rules, and data relationships are applied consistently.
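To make this concrete, here is a minimal sketch of what a centrally defined metric might look like and how every consumer resolves it to the same query. The class, metric, and table names are illustrative assumptions, not tied to any specific semantic-layer product.

```python
# Minimal sketch of a centralized semantic definition (illustrative names only).
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    sql_expression: str   # how the metric is computed
    grain: tuple          # dimensions it can be sliced by


# One definition, owned by the semantic layer, reused everywhere.
NET_REVENUE = Metric(
    name="net_revenue",
    description="Gross sales minus refunds, in EUR",
    sql_expression="SUM(gross_amount) - SUM(refund_amount)",
    grain=("order_date", "country", "product_line"),
)


def to_sql(metric: Metric, dimension: str, source_table: str) -> str:
    """A BI dashboard and an LLM agent both call the same resolver."""
    if dimension not in metric.grain:
        raise ValueError(f"{dimension} is not a valid grain for {metric.name}")
    return (
        f"SELECT {dimension}, {metric.sql_expression} AS {metric.name} "
        f"FROM {source_table} GROUP BY {dimension}"
    )


# A natural-language request ("net revenue by country") and a dashboard widget
# both end up here, guaranteeing one consistent interpretation.
print(to_sql(NET_REVENUE, "country", "sales.orders"))
```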
Breaking Free from Tool-Specific Semantics
Historical approaches, where semantic definitions were embedded within specific tools (like BI platforms), created inconsistent interpretations across systems, duplicated definitions and business logic, barriers to AI adoption and automation, and a limited ability to scale data understanding across the organization.
The new semantic layer solves these challenges by centralizing semantic definitions outside of any specific tool, making business context available as a service, enabling dynamic adaptation of data understanding, and supporting multi-modal data interpretation for AI systems.
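As a sketch of "business context available as a service", the same definitions can be exposed over HTTP so that any tool or agent can discover them. The endpoint path and payload below are assumptions, not a specific vendor API.

```python
# Hypothetical sketch: exposing semantic definitions as a service with FastAPI.
# Endpoint names and payload fields are illustrative assumptions.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Semantic Layer Service")

METRICS = {
    "net_revenue": {
        "description": "Gross sales minus refunds, in EUR",
        "sql_expression": "SUM(gross_amount) - SUM(refund_amount)",
        "grain": ["order_date", "country", "product_line"],
    },
}


@app.get("/metrics/{name}")
def get_metric(name: str) -> dict:
    """BI tools, AI agents, and pipelines all read the same definition here."""
    if name not in METRICS:
        raise HTTPException(status_code=404, detail="unknown metric")
    return METRICS[name]
```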
2. Becoming Active: From Passive Storage to Operational Hub
The transformation from passive BI platforms to active operational data platforms represents a fundamental shift in how organizations leverage their data assets. Traditional data lakes served primarily as static repositories for historical analysis, but modern platforms must become dynamic operational hubs that actively participate in business processes.
Beyond the Analytics-Only Paradigm
Traditional BI platforms were characterized by batch-oriented, one-directional data flows: data was extracted and stored primarily to be read back later for reporting and historical analysis.
Modern and active data platforms break this paradigm by fundamentally transforming the way data flows and is processed. They enable bi-directional data flows, allowing for seamless interaction between different systems and applications. Additionally, they support real-time data processing, which facilitates immediate insights and decision-making.
Furthermore, modern active data lakes directly power operational decisions, providing the necessary data and analytics to inform business actions. Ultimately, they serve as a central nervous system for business operations, integrating and orchestrating various processes to drive efficiency and effectiveness.
Operational Capabilities and Requirements
The shift to operational data platforms demands:
Near Real-Time Processing
Streaming or near real-time processing capabilities are essential for handling high-volume, high-velocity data streams in today's operations. This enables the platform to process and analyze data as it arrives, allowing for timely insights and decision-making. Event-driven architectures also help, as they let the platform react to specific events or changes in the data, triggering actions or notifications as needed. CDC or data movement solutions (such as Fivetran/HVR or Airbyte) can help bypass traditional ETL and batch operations.
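As an illustration of the event-driven pattern, here is a minimal consumer reacting to change events as they arrive. The topic name and the Debezium-style payload shape are assumptions, not a specific connector's output.

```python
# Minimal sketch of event-driven processing of CDC events with kafka-python.
# Topic name and event payload shape are illustrative assumptions.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders.cdc",                        # hypothetical CDC topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    change = message.value
    # React to each change as it arrives instead of waiting for a nightly batch.
    if change.get("op") == "u" and change.get("table") == "orders":
        order = change["after"]
        if order["status"] == "cancelled":
            # Trigger a downstream action, e.g. release reserved inventory.
            print(f"Releasing inventory for order {order['order_id']}")
```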
Bi-directional Integration
Reverse ETL capabilities are essential for reinjecting data back into operational systems, ensuring that insights and analytics are not only generated but also acted upon in real-time. This capability enables the active data lake to not only consume data but also to influence operational processes directly.
An API-first architecture is critical for facilitating interaction between the active data lake and the various systems within the organization. This approach ensures that data and analytics are easily accessible and can be integrated into different applications and services, fostering a culture of data-driven decision-making.
Automated data sharing mechanisms are necessary for streamlining the exchange of data between different systems and applications. By automating this process, organizations can reduce manual data transfer errors, increase efficiency, and ensure that data is shared in a secure and controlled manner.
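A minimal sketch of the reverse-ETL idea follows: attributes computed in the warehouse are pushed back into an operational tool through its API instead of staying in a dashboard. The CRM endpoint and field names are hypothetical.

```python
# Minimal reverse-ETL sketch: push warehouse-computed attributes back to an
# operational system via its REST API. Endpoint and field names are hypothetical.
import requests


def sync_segment_to_crm(customers: list[dict], crm_base_url: str, api_token: str) -> None:
    """Send warehouse-computed attributes back to the CRM so they drive operations."""
    headers = {"Authorization": f"Bearer {api_token}"}
    for customer in customers:
        payload = {
            "external_id": customer["customer_id"],
            "churn_risk": customer["churn_risk"],           # computed in the data platform
            "lifetime_value": customer["lifetime_value"],
        }
        response = requests.patch(
            f"{crm_base_url}/contacts/{customer['customer_id']}",
            json=payload,
            headers=headers,
            timeout=10,
        )
        response.raise_for_status()


# Example usage (requires a reachable CRM API):
# sync_segment_to_crm(
#     [{"customer_id": "C-42", "churn_risk": 0.81, "lifetime_value": 1250.0}],
#     crm_base_url="https://crm.example.com/api",
#     api_token="...",
# )
```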
Operational Use Cases
Direct integration with customer-facing applications is a key use case for active data lakes. This integration allows for real-time data sharing and analysis, enabling businesses to provide personalized experiences to their customers.
Real-time personalization engines are another important application. These engines use real-time data to tailor content and experiences to individual users, increasing engagement and satisfaction.
Dynamic pricing and inventory management is a use case that benefits from real-time data. By analyzing market conditions and customer behavior in real time, businesses can optimize pricing and inventory levels to maximize revenue and customer satisfaction.
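As a toy illustration of the dynamic-pricing idea, a price can be adjusted from real-time demand and stock signals. The thresholds and adjustment factors below are made up for the example.

```python
# Toy dynamic-pricing sketch driven by real-time signals.
# Thresholds and adjustment factors are illustrative assumptions.
def dynamic_price(base_price: float, stock_level: int, demand_last_hour: int) -> float:
    """Adjust the price from current stock and short-term demand."""
    price = base_price
    if stock_level < 10:            # scarce inventory -> modest premium
        price *= 1.10
    if demand_last_hour > 100:      # demand spike -> additional premium
        price *= 1.05
    elif demand_last_hour < 5:      # demand lull -> small discount
        price *= 0.95
    return round(price, 2)


print(dynamic_price(base_price=49.90, stock_level=7, demand_last_hour=130))
```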
Impact on Business & Technical Operations
This transformation enables immediate action on insights rather than delayed responses, automated decision-making based on real-time data, seamless integration between analytical and operational processes, dynamic business process optimization, and real-time customer experience personalization.
When making your data platform more operational, you must understand that it becomes a genuine operational component of your information systems, requiring the same careful monitoring and operational surveillance as any other active component in your IS. Traditional BI teams may not be used to this level of requirements.
3. Domain-Driven Architecture: Breaking Free from Traditional Constraints
The shift from traditional three-tier/medallion architectures (Bronze/Silver/Gold or Raw/Refined/Curated) to a domain-driven approach represents a fundamental rethinking of how we organize and manage enterprise data platforms. This transformation is essential for enabling AI and agentic systems at scale.
Limitations of Traditional Architectures
The traditional three-tier architecture creates several challenges:
Organizational Bottlenecks
Centralized teams are prone to becoming overwhelmed with requests, leading to a slow response to business needs. This is exacerbated by the limited domain expertise within these central teams, making it challenging to effectively address the unique needs of each business domain. Furthermore, the coordination between business and IT stakeholders becomes increasingly complex, hindering the ability to respond quickly to changing business requirements.
Technical Constraints
The traditional architecture is characterized by rigid data models that are difficult to evolve in response to changing business needs. The complex dependencies between layers make it challenging to modify or update individual components without affecting the entire system. Additionally, the architecture's inability to optimize for specific use cases results in a one-size-fits-all approach to data transformation, which can be inefficient and ineffective.
Scalability Issues
As the organization grows, the central team resources scale linearly, leading to increased coordination overhead and difficulty in maintaining data quality at scale. Moreover, the traditional architecture's limitations make it challenging to parallelize development, hindering the organization's ability to scale efficiently and effectively.
Domain-Driven Data Products Approach
The new approach organizes data around business domains, where each domain:
Ownership and Autonomy
Clear domain ownership and accountability are essential, ensuring that each domain is responsible for its actions and outcomes. This autonomy allows for decision-making within domains to be self-sufficient, enabling local optimization tailored to the specific needs of each domain. This approach ensures direct alignment with business objectives, as each domain is focused on achieving its unique goals.
Data Product Thinking
Data is treated as a product, complete with clear Service Level Agreements (SLAs) that outline its quality, availability, and performance. Well-defined interfaces and contracts ensure seamless integration and consumption of data products. Built-in data quality and observability mechanisms guarantee the data product's integrity and facilitate its monitoring. The primary focus is on understanding user needs and consumption patterns to ensure the data product meets its intended purpose.
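A lightweight way to make "data as a product" tangible is a machine-readable contract that carries the SLA, interface, and quality checks alongside the data. The structure below is a sketch with assumed field names, not tied to any particular contract standard.

```python
# Sketch of a machine-readable data product contract.
# Field names are illustrative assumptions, not a specific standard.
DATA_PRODUCT_CONTRACT = {
    "name": "customer_orders",
    "domain": "sales",
    "owner": "sales-data-team@company.example",
    "interface": {
        "format": "iceberg_table",
        "location": "lakehouse.sales.customer_orders",
        "schema": {"order_id": "string", "customer_id": "string", "amount": "decimal(10,2)"},
    },
    "sla": {
        "freshness_minutes": 15,        # data no older than 15 minutes
        "availability_pct": 99.5,
        "completeness_pct": 99.0,
    },
    "quality_checks": ["order_id is unique", "amount >= 0"],
}
```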
Architecture Principles
The architecture is designed to facilitate the independent evolution of domains, allowing each domain to progress at its own pace without being hindered by dependencies on other domains. Clear boundaries and responsibilities are established to avoid confusion or overlap between domains. Standardized inter-domain communication ensures seamless interaction between domains, promoting a cohesive and integrated system. A federated governance model is implemented to oversee the entire system, ensuring consistency and coordination across domains.
Enabling Scale Through Domain Architecture
This approach enables scaling by:
Parallel Development
Independent domain teams work autonomously, reducing coordination overhead and enabling faster iteration cycles. This autonomy allows for domain-specific optimization, ensuring that each domain is tailored to its unique needs and objectives.
Clear Responsibilities
Domain teams are responsible for their data products, ensuring that they are aligned with the specific needs of their domain. Central teams focus on developing platform capabilities, providing a foundation for the domains to build upon. Standardized interfaces between domains facilitate seamless integration, while a shared governance framework ensures consistency across the system.
Flexible Evolution
The domain-driven approach enables domains to evolve at different speeds, allowing for independent technology choices where appropriate. This flexibility enables experimentation within domains, fostering innovation and improvement. A gradual modernization path is also facilitated, ensuring that domains can adapt to changing requirements without disrupting the entire system.
Implementation Considerations
Success with domain-driven architecture requires:
Organizational Alignment
Clear domain boundaries must be established and aligned with the organization's business objectives. This ensures that each domain is focused on specific business needs and outcomes. Empowered domain teams are essential, as they are responsible for making decisions and taking actions within their respective domains. A balanced distribution of responsibilities between central and domain teams is crucial, ensuring that each team is accountable for its actions and outcomes. Strong communication channels must be established to facilitate collaboration and coordination between teams, ensuring that information flows seamlessly across domains.
Technical Standards and tooling
To ensure seamless integration and data exchange between domains, shared data contracts must be defined and agreed upon.
Standard integration patterns should be established to simplify the integration process and reduce complexity. Common quality metrics are necessary to ensure that data products meet the required standards across domains.
A unified metadata management system is essential for maintaining a consistent understanding of data across the organization.
Modern data governance is no longer only about rules and PowerPoint decks; it also relies on robust tooling and deep platform involvement to help engineers and business users work seamlessly with data.
Internal data platform marketplaces are increasingly becoming the standard way to package and deliver data products to internal stakeholders, and potentially to external users as well.
Governance Framework
A federated governance model is necessary to oversee the entire system, ensuring consistency and coordination across domains. Clear decision rights must be defined to avoid confusion or overlap between domains. Standardized quality measures are essential to ensure that data products meet the required standards across domains. Cross-domain coordination mechanisms must be established to facilitate collaboration and ensure that domains work together effectively.
This approach creates a more resilient, scalable, and agile data platform that can better support the diverse needs of modern enterprises, particularly in the context of AI and agentic systems deployment.
My point of view
I want to make it clear that the solution does not really lie in the adoption of any particular data modelling technique (entity-relationship models, star schemas, snowflake schemas and other data vaults, which today often seem to create more problems than they solve), but rather in a data architecture that reuses digital platform design, where two-pizza teams, APIs, and agility were foundational to delivering real time-to-value.
4. From Data Quality to Data Observability
The increasing prevalence of near real-time data and massive volumes has rendered traditional data quality approaches ineffective. To address these challenges, modern data platforms must incorporate advanced capabilities that ensure data quality and integrity. Specifically, these platforms require end-to-end visibility across the entire data value chain, enabling the tracking of data from its source to its final destination. This visibility is crucial for identifying and addressing data quality issues promptly.
Automated quality testing and anomaly detection are also essential components of modern data platforms. These features enable the identification of data quality issues in real-time, allowing for swift corrective action to be taken. Furthermore, advanced monitoring capabilities are necessary to proactively identify potential issues before they impact the data pipeline. This proactive approach ensures that data quality issues are addressed before they affect downstream applications or users.
Comprehensive lineage and impact analysis are also critical components of modern data platforms. These capabilities provide a detailed understanding of how data flows through the system, enabling the identification of the root cause of data quality issues and the impact of these issues on downstream applications. This understanding is essential for making informed decisions about data quality and for optimizing data processing workflows to ensure the highest levels of data integrity.
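To illustrate the shift from one-off quality tests to continuous observability, here is a minimal sketch of freshness and volume monitors that could run on a schedule. The table history, thresholds, and alerting logic are assumptions for the example.

```python
# Minimal data observability sketch: monitor freshness and volume anomalies.
# History values and thresholds are illustrative assumptions.
import datetime as dt
import statistics


def check_freshness(last_loaded_at: dt.datetime, max_lag_minutes: int = 30) -> bool:
    """Return True if the table has been updated recently enough."""
    lag = dt.datetime.utcnow() - last_loaded_at
    return lag <= dt.timedelta(minutes=max_lag_minutes)


def check_volume(daily_row_counts: list[int], today_count: int, z_threshold: float = 3.0) -> bool:
    """Return True if today's row count is in line with recent history."""
    mean = statistics.mean(daily_row_counts)
    stdev = statistics.stdev(daily_row_counts) or 1.0
    z_score = abs(today_count - mean) / stdev
    return z_score <= z_threshold


history = [10_120, 9_980, 10_340, 10_050, 10_210]
print(check_freshness(dt.datetime.utcnow() - dt.timedelta(minutes=12)))  # True: fresh
print(check_volume(history, today_count=4_500))                          # False: anomaly
```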
If you really want to know more about data observability, take a look at pure players like Sifflet Data (my favourite, yes it's French) or Monte Carlo Data.
5. Your unstructured Data is just … Data
In the era of AI and agentic systems, the distinction between structured and unstructured data is becoming increasingly irrelevant. AI agents, with their advanced capabilities, can seamlessly process and analyze both types of data, extracting valuable insights and making informed decisions. Therefore, organizations should adopt a unified approach to data management, treating unstructured data with the same level of importance and rigor as structured data. The recent acquisition of Datavolo by Snowflake shows how platforms are preparing for this approach.
Why Unify Data Management?
AI agents are agnostic to data structures. They can leverage advanced techniques like natural language processing (NLP) and machine learning (ML) to extract meaning and value from both structured and unstructured data. By unifying data management, organizations can unlock the full potential of their data assets, enabling AI agents to leverage all available information for decision-making.
How to Achieve Unified Data Management
By adopting a unified approach to data management, organizations can empower their AI agents to leverage the full spectrum of their data assets, driving innovation and unlocking new opportunities for growth.
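As one possible illustration of what this looks like in practice, unstructured documents can be vectorized and stored alongside their structured keys so that agents query both through the same governed pipeline. The embedding model and record layout are assumptions for the sketch.

```python
# Sketch: treat documents like any other data by storing embeddings next to
# structured metadata. Model name and record layout are illustrative assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    {"doc_id": "contract-001", "customer_id": "C-42",
     "text": "Renewal terms: 12 months, 5% uplift."},
    {"doc_id": "ticket-873", "customer_id": "C-42",
     "text": "Customer reports latency in the EU region."},
]

records = []
for doc in documents:
    records.append({
        "doc_id": doc["doc_id"],
        "customer_id": doc["customer_id"],                 # structured key, joinable with orders
        "embedding": model.encode(doc["text"]).tolist(),   # unstructured content, vectorized
    })

# `records` can now be loaded into the same governed store as structured data,
# so an agent can join "what the customer bought" with "what the customer said".
print(len(records[0]["embedding"]))
```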
Preparing for AI and Agentic Systems
Infrastructure Requirements
Universal Storage: The data platform of 2025 must be capable of efficiently handling both structured and unstructured data. This is essential to accommodate the diverse data types and sources that modern enterprises deal with.
For structured data, 2025 will certainly see Apache Iceberg become the de facto open standard for multi- and hybrid-cloud structured data storage.
Only advanced data platforms can guarantee universal, consistently governed storage for both structured and unstructured data.
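For instance, an Iceberg table can be created with plain Spark SQL, which keeps the storage format open and engine-agnostic. The catalog configuration, warehouse path, runtime version, and table name below are assumptions for the sketch.

```python
# Sketch: creating an Apache Iceberg table with Spark SQL.
# Catalog configuration, runtime version, warehouse path, and table name
# are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-demo")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3")  # assumed version
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS lakehouse.sales")
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.sales.customer_orders (
        order_id STRING,
        customer_id STRING,
        amount DECIMAL(10, 2),
        order_date DATE
    )
    USING iceberg
    PARTITIONED BY (order_date)
""")
```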
Near Real-Time Processing: The ability to access and process data in near real-time is a critical requirement for the data platform of 2025. This capability enables organizations to make timely and informed decisions based on the most current data.
Semantic Understanding: The data platform of 2025 must possess advanced semantic understanding capabilities. This includes the ability to understand the context and relationships between different data elements. This understanding is crucial for enabling more intelligent and context-aware data processing.
Governance Evolution
The governance model is evolving from a centralized to a federated model, where:
The IT department is responsible for managing the platform infrastructure, ensuring its stability and scalability.
Individual domains are granted a high degree of autonomy, allowing them to make decisions that are best suited to their specific needs and objectives. Shared governance rules are established to ensure consistency and alignment across all domains, promoting a cohesive and integrated system. Mandatory data cataloging and quality reporting are enforced, ensuring that all data products meet the required standards and are well-documented.
Clear data contracts are established between domains, ensuring that data is exchanged in a standardized and secure manner.
To ensure modern governance is more than rules and PowerPoint decks, it must be enforced in the core of your data platform. This is called Federated Computational Governance: governance policies expressed and enforced as code within the platform itself.
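A minimal illustration of the "computational" part: global rules are checked automatically against every data product's contract, for example in CI, instead of living in a slide deck. The rule names and contract fields below are assumptions.

```python
# Sketch of federated computational governance: global rules checked automatically
# against each domain's data product contract. Rules and fields are illustrative.
def check_contract(contract: dict) -> list[str]:
    """Return the list of governance violations for one data product contract."""
    violations = []
    if not contract.get("owner"):
        violations.append("every data product must declare an owner")
    if not contract.get("sla", {}).get("freshness_minutes"):
        violations.append("an SLA with a freshness target is mandatory")
    if "pii" in contract.get("tags", []) and not contract.get("masking_policy"):
        violations.append("PII data products must define a masking policy")
    return violations


# Example: run in CI against the contract published by a domain team.
contract = {
    "name": "customer_orders",
    "owner": "sales-data-team@company.example",
    "sla": {"freshness_minutes": 15},
    "tags": ["pii"],
}
print(check_contract(contract))  # ['PII data products must define a masking policy']
```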
The Road Ahead
To sum up what we've just been through, organizations must focus on several key areas to prepare their data platforms for AI and agentic systems:
Industrialization
Data Synchronization
Accessibility
Conclusion
The transition to AI and agentic-ready data platforms represents a fundamental shift in enterprise data architecture. Success requires organizations to move beyond traditional BI-centric thinking and embrace a more dynamic, interconnected, and semantically rich approach to data management. The platforms of 2025 will not just store and analyze data – they will actively participate in the organization's AI and agentic ecosystem, enabling new levels of automation, insight, and innovation.