Heard about the 3 V’s of Data? What about the 3 D’s of Data?
How are all you Data Stalwarts doing? Having fun with your Datalake, or do you think the Datalake is not keeping its promise? I have seen amazing success stories and equally amazing failures in the data journeys of several organizations. It would not be fair to blame anyone for Datalake failures: the Datalake brought with it so many technologies, tools & approaches that many organizations and engineers either couldn’t cope with them or didn’t have the holistic perspective to do it right. Many orgs & leaders have even started to wonder whether Big Data was just hype or whether it can still add value. Well, the answer is both Yes and No. The real question is: have you done it right?
I gather that in most cases there is one common reason behind success or failure: Data Governance, the most underestimated paradigm of the Big Data journey. I have seen Data Stewards & engineers trying to address Data Governance with meetings, checklists, signoffs, and eventually nagging. My Dear Friends, without robust Data Governance you wouldn’t know what is growing inside the Datalake, let alone derive insights from it. With the burst of data-related technologies come unforeseen complexity, security, compliance, data management & operational challenges which, if not dealt with properly, can derail your entire data-driven enterprise strategy before you know it. And these problems grow even worse with ever-evolving architectures & patterns such as hybrid environments, multi-cloud architectures, federated data lakes, data mesh, data fabric, data virtualization, DaaS etc. As a result, the very definition of Data Governance has been extended and now demands Data Security, Policy Compliance, Lineage, Catalog, Quality, Semantic Layer Management, Active Metadata, Knowledge Graphs, Datalake management, monitoring, centralized automation frameworks, automated data classification, auditing, automated ETL ingestion & testing, a Universal Semantic Layer and so on. An already underestimated paradigm has only grown more complex. So, what’s the solution?
Say Hello to the 3 D’s of Data: DataGovOps, DataOps & DevSecOps.
The Trio together aims to implement Data Governance with a three-pronged strategy: Automation, Multi-Level Abstraction and Centralization. Automation in ETL ingestion & testing, CI/CD, auditing, security and quality can accelerate time to market (TTM) and significantly reduce operational overhead & costs. Multi-Level Abstraction helps build a universal semantic layer, business glossary, active metadata catalog, knowledge graphs and a common business language, and supports data democratization. Finally, Centralization helps in managing the Data Hub from a single pane of glass, with centralized framework services for security, audit & quality to support a complex architecture with a multitude of technologies. See figure below, depicting the evolving technologies & tools landscape:
With such a vast & evolving ecosystem of technologies, you can’t just pick a technology stack and A/B test it to see whether it works. Do your due diligence to see what fits your technology roadmap by assessing business requirements, use cases, workload types, volumetrics, POCs, a weighted scorecard etc. There is no single solution that fits all scenarios. Embrace Automation in your Architecture using the 3 D’s, and deriving insights from data will become a sport.
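To make the weighted scorecard idea a bit more concrete, here is a minimal Python sketch of how candidate stacks could be compared. Everything in it is an assumption for illustration only: the criteria names, the weights, the 1 to 5 scoring scale and the two candidates (`stack_a`, `stack_b`) are hypothetical and should be replaced with whatever your own requirements, workloads and POC results suggest.

```python
# Hypothetical criteria and weights; they are illustrative only and must sum to 1.0.
WEIGHTS = {
    "business_fit": 0.30,
    "workload_fit": 0.25,
    "scalability": 0.20,
    "governance_support": 0.15,
    "cost": 0.10,
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted sum of per-criterion scores (assumed scale: 1 to 5)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_candidates(candidates, weights=WEIGHTS):
    """Return (name, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(s, weights)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates, scored by the team after POCs.
CANDIDATES = {
    "stack_a": {"business_fit": 4, "workload_fit": 3, "scalability": 5,
                "governance_support": 2, "cost": 3},
    "stack_b": {"business_fit": 3, "workload_fit": 4, "scalability": 3,
                "governance_support": 5, "cost": 4},
}

if __name__ == "__main__":
    for name, score in rank_candidates(CANDIDATES):
        print(f"{name}: {score:.2f}")
```

The scorecard does not make the decision for you; it only forces the team to write down what matters and how much, so that the trade-offs between candidates are visible and can be argued about.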
Happy Coding, Happy Architecting and Happy DATing!!!