ETL to Data Virtualization
ETL: Extract, Transform, Load
DV: Data Virtualization
ETL has ruled the industry for over a decade, but it remains complex for enterprises with federated data sources. Data virtualization is one step ahead.
The concept of data virtualization has recently become a hot topic, promising better, faster, and more flexible operations in the Business Intelligence world. Done properly, DV is similar to a view in a database: a data virtualization layer is a 'view' over federated data sources.
These data sources can vary in:
- Data structure: RDBMS, XML, JSON
- API or query language: SOAP, REST, SQL
- Location: on-premise, cloud
Data virtualization software is often used in tasks such as:
- Data integration
- Business integration
- Service-oriented architecture (SOA) data services
- Enterprise search
Definition (Wikipedia):
Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source or where it is physically located, and can provide a single customer view (or single view of any other entity) of the overall data.
Many organizations run multiple types of database management systems, such as Oracle and SQL Server, which do not work well with one another. Enterprises therefore face new challenges in integrating and storing huge amounts of data. With data virtualization, business users are able to get real-time, reliable information quickly, which helps them make major business decisions.
Enterprise landscapes are generally filled with disparate data sources, including multiple data warehouses, data marts, and/or data lakes. The traditional approach is the extract, transform, load (ETL) process, which copies data into a central store; with data virtualization, the data remains in place and real-time access is given to the source system for the data. Data virtualization may therefore be considered an alternative to ETL and data warehousing. For contrast, the sketch below shows the physical copy step that ETL requires and DV avoids.
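Here is a minimal ETL sketch in Python, using SQLite, to make the copy step concrete. The table and column names (customers, dim_customer) are hypothetical:

```python
# Traditional ETL: data is physically extracted from the source,
# transformed, and loaded into a warehouse table before anyone can
# query it. Table and column names here are hypothetical.
import sqlite3

def etl_customers(source_db, warehouse_db):
    with sqlite3.connect(source_db) as src, \
         sqlite3.connect(warehouse_db) as wh:
        wh.execute(
            "CREATE TABLE IF NOT EXISTS dim_customer "
            "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
        )
        # Extract + transform: normalize city names on the way through.
        rows = [
            (cid, name, (city or "").strip().title())
            for cid, name, city in
            src.execute("SELECT id, name, city FROM customers")
        ]
        # Load: replicate the data into the warehouse -- this copy
        # (and the staleness between refresh runs) is what DV avoids.
        wh.executemany(
            "INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows
        )
```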
The process of data virtualization involves abstracting, transforming, federating, and delivering data from disparate sources. The main goal of data virtualization technology is to provide a single point of access to data by aggregating it from a wide range of sources. This allows users to access the data without having to know exactly where it lives; a minimal sketch of the idea follows.
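To illustrate the "single point of access" idea, here is a toy virtualization layer in pure Python. It federates a relational source and a JSON source behind one query function at read time, with no data copied; all source, table, and field names are hypothetical:

```python
# A toy data virtualization layer: one query interface federating two
# disparate sources at read time, with no data copied. All source,
# table, and field names are hypothetical.
import json
import sqlite3

def customers_from_rdbms(db_path):
    """Yield customers from a relational source, left in place."""
    with sqlite3.connect(db_path) as conn:
        for cid, name, city in conn.execute(
            "SELECT id, name, city FROM customers"
        ):
            yield {"id": cid, "name": name, "city": city}

def customers_from_json(json_path):
    """Yield customers from a JSON source, mapped to the same shape."""
    with open(json_path) as f:
        for rec in json.load(f):
            yield {
                "id": rec["cust_id"],
                "name": rec["full_name"],
                "city": rec.get("city", "unknown"),
            }

def customer_view(db_path, json_path, city=None):
    """The 'virtual view': a single point of access that abstracts,
    federates, and delivers data from both sources on demand."""
    for source in (
        customers_from_rdbms(db_path),
        customers_from_json(json_path),
    ):
        for rec in source:
            if city is None or rec["city"] == city:
                yield rec

# Consumers see one unified view rather than two systems:
# for rec in customer_view("crm.db", "legacy_customers.json", city="Pune"):
#     print(rec)
```

A real DV product does far more (query pushdown, caching, security), but the essence is the same: consumers query one logical view while the data stays in its sources.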
Data virtualization is inherently aimed at producing quick, timely insights from multiple sources without embarking on a major data project with extensive ETL and data storage. It is a layer that combines real-time data from disparate data sources and makes it available to the business without exposing any technical details.
There are many DV tools (IBM, TIBCO, Informatica, Red Hat, Oracle, SAP, SAS, etc.), but one of the pioneering tools for data virtualization is Denodo. It works on the "three Cs" (3C) principle:
Connect: connect to any data source (e.g. databases, files, APIs).
Combine: gather data from multiple sources and combine it to fulfill a business requirement. This layer defines the data transformations and combinations needed to meet that requirement.
Consume: finally, real-time data is made available to the consuming platform. Denodo supports a variety of ways to expose data to consumers (e.g. web services, a JDBC/ODBC interface), as in the sketch below.
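As a hedged example of the Consume step, the following Python snippet queries a published virtual view over ODBC with pyodbc. The DSN, credentials, and view name are hypothetical, and it assumes the Denodo ODBC driver and a DSN have already been configured on the client:

```python
# Consuming a published virtual view over ODBC with pyodbc. The DSN,
# credentials, and view name are hypothetical; this assumes the Denodo
# ODBC driver and a DSN named "denodo_vdp" are already configured.
import pyodbc

conn = pyodbc.connect("DSN=denodo_vdp;UID=report_user;PWD=secret")
cursor = conn.cursor()

# The consumer writes plain SQL against the virtual view; the DV layer
# federates the underlying sources in real time behind the scenes.
cursor.execute(
    "SELECT id, name, city FROM customer_view WHERE city = ?", "Pune"
)
for row in cursor.fetchall():
    print(row.id, row.name, row.city)

cursor.close()
conn.close()
```

From the consumer's point of view this is indistinguishable from querying a single database, which is exactly the point of the Consume layer.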
Pros of Virtualization:
- Zero replication
- Uses hardware efficiently
- More agility to change
- Available at all times
- Easy recovery
- Quick and easy setup
- Easier cloud migration
Cons of Virtualization:
- High initial investment
- Data can be at risk
- Quick scalability is a challenge
- Performance can dip
- Unintended server sprawl
Thanks!
For any query or suggestion, contact cnsnoida@gmail.com.