ETL to Data Virtualization

ETL: Extract, Transform, Load

DV: Data Virtualization

ETL has ruled the industry for over a decade, but it remains complex for enterprises with federated data sources. Data virtualization is one step ahead.

Data virtualization has recently become a hot topic, promising better, faster, and more flexible operations in the Business Intelligence world. Done properly, DV is similar to views in databases: a data virtualization layer is a 'view' over federated data sources.
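The "view over federated sources" analogy can be sketched with plain SQLite (all table and column names here are illustrative): one connection attaches a second database standing in for a remote source, and a query-time join plays the role of the virtualization layer, with no data copied anywhere.

```python
import sqlite3

# Two separate databases stand in for two independent data sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO crm_customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

conn.execute("ATTACH DATABASE ':memory:' AS billing")  # the 'remote' source
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

def customer_totals():
    """The 'view': consumers call this instead of touching either source."""
    return conn.execute("""
        SELECT c.name, SUM(i.amount)
        FROM crm_customers c JOIN billing.invoices i ON c.id = i.customer_id
        GROUP BY c.name ORDER BY c.name
    """).fetchall()

print(customer_totals())  # [('Acme', 150.0), ('Globex', 75.0)]
```

The point of the sketch: the join runs at query time against both sources, so consumers see one logical result set without knowing (or caring) that it spans two databases.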

These data sources could be:

- Data structures: RDBMS, XML, JSON

- APIs or query languages: SOAP, REST, SQL

- Location: on-premises, cloud
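A minimal sketch of federating two of these source types, an RDBMS table and a JSON document (as might come back from a REST API), behind one query function. All names are illustrative:

```python
import json
import sqlite3

# Source 1: an RDBMS table (SQLite stands in for any SQL database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [("alice", 40.0), ("bob", 25.0), ("alice", 10.0)])

# Source 2: a JSON document, e.g. the body of a REST API response.
api_payload = json.loads(
    '[{"customer": "alice", "segment": "gold"},'
    ' {"customer": "bob", "segment": "silver"}]')

def unified_customer_view():
    """Federate both sources at query time -- nothing is copied to a warehouse."""
    totals = dict(db.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
    segments = {row["customer"]: row["segment"] for row in api_payload}
    return {c: {"total": totals[c], "segment": segments.get(c)} for c in totals}

print(unified_customer_view())
# {'alice': {'total': 50.0, 'segment': 'gold'},
#  'bob': {'total': 25.0, 'segment': 'silver'}}
```

A real DV product does this routing, translation, and joining generically across many source types; the sketch only shows the shape of the idea.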

Data virtualization software is often used in tasks such as:

- Data integration

- Business integration

- Service-oriented architecture (SOA) data services

- Enterprise search

Definition (Wikipedia):

Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at the source or where it is physically located, and it can provide a single customer view (or single view of any other entity) of the overall data.

Many organizations run multiple types of database management systems, such as Oracle and Microsoft SQL Server, which do not work well with one another. Enterprises therefore face new challenges in integrating and storing huge amounts of data. With data virtualization, business users can get real-time, reliable information quickly, which helps them make major business decisions.

Enterprise landscapes are generally filled with disparate data sources, including multiple data warehouses, data marts, and/or data lakes. The traditional approach is the extract, transform, load ("ETL") process, which copies data into a central store. With data virtualization, by contrast, the data remains in place and real-time access is given to the source systems. Data virtualization can therefore be considered an alternative to ETL and data warehousing.

The process of data virtualization involves abstracting, transforming, federating, and delivering data from disparate sources. The main goal of data virtualization technology is to provide a single point of access to data by aggregating it from a wide range of sources, so that users can query the data without having to know where it is physically located.

Data virtualization is inherently aimed at producing quick, timely insights from multiple sources without embarking on a major data project with extensive ETL and data storage. It is a layer that combines real-time data from disparate sources and makes it available to the business without exposing any technical details.

Although there are many DV tools (IBM, TIBCO, Informatica, Red Hat, Oracle, SAP, SAS, etc.), one of the pioneering tools for data virtualization is Denodo. It works on the "three Cs" (3C) principle:

Connect — Connect to any data source (e.g. databases, files, APIs).

Combine — Gather data from multiple sources and combine it to fulfill a business requirement; this layer defines the data transformations and combinations that meet those requirements.

Consume — Finally, real-time data is made available to the consuming platform. Denodo supports a variety of ways to expose data to consumers (e.g. web services, a JDBC/ODBC interface).
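The three Cs can be sketched as a toy pipeline in plain Python. This is an illustration of the connect/combine/consume pattern only, with made-up class and source names, not Denodo's actual API:

```python
class VirtualLayer:
    """Toy connect/combine/consume pipeline (not Denodo's API)."""

    def __init__(self):
        self.sources = {}

    def connect(self, name, rows):
        # Connect: register a source; here each 'source' is just rows in memory.
        self.sources[name] = rows

    def combine(self):
        # Combine: join the registered sources on a shared product-id key.
        products = {pid: name for pid, name in self.sources["catalog"]}
        return [(products[pid], qty) for pid, qty in self.sources["sales"]]

    def consume(self):
        # Consume: expose the combined result to any client, on demand.
        return sorted(self.combine())

layer = VirtualLayer()
layer.connect("catalog", [(1, "widget"), (2, "gadget")])
layer.connect("sales", [(1, 3), (2, 5), (1, 2)])
print(layer.consume())  # [('gadget', 5), ('widget', 2), ('widget', 3)]
```

In a real platform, `connect` would hold live connections to databases, files, and APIs, `combine` would be defined declaratively (in Denodo's case, as views), and `consume` would be a web service or JDBC/ODBC endpoint rather than a method call.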

Pros of Virtualization

- Zero replication

- Uses hardware efficiently

- More agility to change

- Available at all times

- Recovery is easy

- Quick and easy setup

- Cloud migration is easier

Cons of Virtualization

- High initial investment

- Data can be at risk

- Quick scalability is a challenge

- Performance can dip

- Unintended server sprawl

Thanks!

For any query or suggestion, contact cnsnoida@gmail.com
