ETL to Data Virtualization
ETL: Extract, Transform, Load
DV: Data Virtualization
ETL has ruled the industry for over a decade, but it remains complex for enterprises with federated data sources. Data virtualization is one step ahead.
The concept of data virtualization has recently become a hot topic, promising better, faster, and more flexible operations in the Business Intelligence world. Done properly, DV is similar to a view in a database: a data virtualization layer is a 'view' over federated data sources.
These data sources can vary in:
- Data structure: RDBMS, XML, JSON
- API or query language: SOAP, REST, SQL
- Location: on-premise, cloud
Data virtualization software is often used in tasks such as:
- Data integration
- Business integration
- Service-oriented architecture (SOA) data services
- Enterprise search
Definition (Wikipedia):
Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source or where it is physically located, and can provide a single customer view (or single view of any other entity) of the overall data.
Many organizations run multiple types of database management systems, such as Oracle and SQL Server, which do not work well with one another. Enterprises therefore face new challenges in integrating and storing huge amounts of data. With data virtualization, business users are able to get real-time, reliable information quickly, which helps them make major business decisions.
Enterprise landscapes are generally filled with disparate data sources, including multiple data warehouses, data marts, and/or data lakes. The traditional approach is the extract, transform, load (ETL) process, which copies data into a central store; with data virtualization, the data remains in place and real-time access is given to the source system for the data. Data virtualization may therefore be considered an alternative to ETL and data warehousing. For contrast, the sketch below shows the physical copy step that ETL requires and DV avoids.
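Here is a minimal ETL sketch in Python, using SQLite, to make the copy step concrete. The table and column names (customers, dim_customer) are hypothetical:

```python
# Traditional ETL: data is physically extracted from the source,
# transformed, and loaded into a warehouse table before anyone can
# query it. Table and column names here are hypothetical.
import sqlite3

def etl_customers(source_db, warehouse_db):
    with sqlite3.connect(source_db) as src, \
         sqlite3.connect(warehouse_db) as wh:
        wh.execute(
            "CREATE TABLE IF NOT EXISTS dim_customer "
            "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
        )
        # Extract + transform: normalize city names on the way through.
        rows = [
            (cid, name, (city or "").strip().title())
            for cid, name, city in
            src.execute("SELECT id, name, city FROM customers")
        ]
        # Load: replicate the data into the warehouse -- this copy
        # (and the staleness between refresh runs) is what DV avoids.
        wh.executemany(
            "INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows
        )
```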
The process of data virtualization involves abstracting, transforming, federating, and delivering data from disparate sources. The main goal of data virtualization technology is to provide a single point of access to data by aggregating it from a wide range of sources. This allows users to access the data without having to know exactly where it lives; a minimal sketch of the idea follows.
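To illustrate the "single point of access" idea, here is a toy virtualization layer in pure Python. It federates a relational source and a JSON source behind one query function at read time, with no data copied; all source, table, and field names are hypothetical:

```python
# A toy data virtualization layer: one query interface federating two
# disparate sources at read time, with no data copied. All source,
# table, and field names are hypothetical.
import json
import sqlite3

def customers_from_rdbms(db_path):
    """Yield customers from a relational source, left in place."""
    with sqlite3.connect(db_path) as conn:
        for cid, name, city in conn.execute(
            "SELECT id, name, city FROM customers"
        ):
            yield {"id": cid, "name": name, "city": city}

def customers_from_json(json_path):
    """Yield customers from a JSON source, mapped to the same shape."""
    with open(json_path) as f:
        for rec in json.load(f):
            yield {
                "id": rec["cust_id"],
                "name": rec["full_name"],
                "city": rec.get("city", "unknown"),
            }

def customer_view(db_path, json_path, city=None):
    """The 'virtual view': a single point of access that abstracts,
    federates, and delivers data from both sources on demand."""
    for source in (
        customers_from_rdbms(db_path),
        customers_from_json(json_path),
    ):
        for rec in source:
            if city is None or rec["city"] == city:
                yield rec

# Consumers see one unified view rather than two systems:
# for rec in customer_view("crm.db", "legacy_customers.json", city="Pune"):
#     print(rec)
```

A real DV product does far more (query pushdown, caching, security), but the essence is the same: consumers query one logical view while the data stays in its sources.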
Data virtualization is inherently aimed at producing quick, timely insights from multiple sources without embarking on a major data project with extensive ETL and data storage. It is a layer that combines real-time data from disparate data sources and makes it available to the business without exposing any technical details.
There are many DV tools (IBM, TIBCO, Informatica, Red Hat, Oracle, SAP, SAS, etc.), but one of the pioneering tools for data virtualization is Denodo. It works on the "three Cs" (3C) principle:
Connect: connect to any data source (e.g. databases, files, APIs).
Combine: gather data from multiple sources and combine it to fulfill a business requirement. This layer defines the data transformations and combinations needed to meet that requirement.
Consume: finally, real-time data is made available to the consuming platform. Denodo supports a variety of ways to expose data to consumers (e.g. web services, a JDBC/ODBC interface), as in the sketch below.
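As a hedged example of the Consume step, the following Python snippet queries a published virtual view over ODBC with pyodbc. The DSN, credentials, and view name are hypothetical, and it assumes the Denodo ODBC driver and a DSN have already been configured on the client:

```python
# Consuming a published virtual view over ODBC with pyodbc. The DSN,
# credentials, and view name are hypothetical; this assumes the Denodo
# ODBC driver and a DSN named "denodo_vdp" are already configured.
import pyodbc

conn = pyodbc.connect("DSN=denodo_vdp;UID=report_user;PWD=secret")
cursor = conn.cursor()

# The consumer writes plain SQL against the virtual view; the DV layer
# federates the underlying sources in real time behind the scenes.
cursor.execute(
    "SELECT id, name, city FROM customer_view WHERE city = ?", "Pune"
)
for row in cursor.fetchall():
    print(row.id, row.name, row.city)

cursor.close()
conn.close()
```

From the consumer's point of view this is indistinguishable from querying a single database, which is exactly the point of the Consume layer.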
Pros of Virtualization:
- Zero replication
- Uses hardware efficiently
- More agility to change
- Available at all times
- Easy recovery
- Quick and easy setup
- Easier cloud migration
Cons of Virtualization:
- High initial investment
- Data can be at risk
- Quick scalability is a challenge
- Performance can dip
- Unintended server sprawl
Thanks!
For any query or suggestion, contact cnsnoida@gmail.com.