ETL TESTING

ETL (Extract, Transform, Load) is a process that extracts data from source systems, transforms it into a consistent format, and loads it into a single repository. ETL testing refers to the process of validating, verifying, and qualifying that data while preventing duplicate records and data loss.
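To make the extract, transform, and load steps concrete, here is a minimal sketch in Python. The orders.csv source file, the orders table, and the local SQLite database standing in for the warehouse are all assumptions for illustration, not a prescribed implementation:

    import csv
    import sqlite3

    def extract(path):
        # Extract: read raw rows from the source system (here, a CSV file).
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Transform: coerce every field to a consistent type and format.
        return [(int(r["order_id"]), r["customer"].strip().upper(), float(r["amount"]))
                for r in rows]

    def load(rows, conn):
        # Load: write the cleaned rows into the single target repository.
        conn.execute("CREATE TABLE IF NOT EXISTS orders "
                     "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
        conn.commit()

    conn = sqlite3.connect("warehouse.db")
    load(transform(extract("orders.csv")), conn)

Keying the table on order_id and using INSERT OR REPLACE is one simple way the load step can prevent duplicate records.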

ETL testing ensures that the transfer of data from heterogeneous sources to the central data warehouse occurs with strict adherence to transformation rules and is in compliance with all validity checks. It differs from data reconciliation used in database testing in that ETL testing is applied to data warehouse systems and used to obtain relevant information for analytics and business intelligence.

Eight stages of the ETL testing process

Effective ETL testing detects problems with the source data early on, before it is loaded into the data repository, as well as inconsistencies or ambiguities in the business rules intended to guide data transformation and integration. The process can be broken down into eight stages.

  1. Identify business requirements — Design the data model, define the business flow, and assess reporting needs based on client expectations. It’s important to start here so the scope of the project is clearly defined, documented, and fully understood by testers.
  2. Validate data sources — Perform a data count check and verify that the table and column data types meet the specifications of the data model. Make sure check keys are in place and remove duplicate data; if this is not done correctly, the aggregate report could be inaccurate or misleading. (A count and duplicate-key check is sketched after this list.)
  3. Design test cases — Design ETL mapping scenarios, create SQL scripts, and define transformation rules. It is important to validate the mapping document as well, to ensure it contains all of the necessary information.
  4. Extract data from source systems — Execute ETL tests per the business requirements. Identify the types of bugs or defects encountered during testing and make a report. It is important to detect and reproduce any defects, report them, fix the bugs, and close the bug reports before continuing to step 5.
  5. Apply transformation logic — Ensure data is transformed to match the schema of the target data warehouse. Check data thresholds and alignment, and validate the data flow. This ensures the data type matches the mapping document for each column and table.
  6. Load data into the target warehouse — Perform a record count check before and after data is moved from staging to the data warehouse (see the sketch after this list). Confirm that invalid data is rejected and that default values are accepted.
  7. Summary report — Verify the layout, options, filters, and export functionality of the summary report. This report lets decision-makers and stakeholders know the details and results of the testing process, including whether any step was not completed, i.e. was “out of scope,” and why.
  8. Test closure — File the test closure report.
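A minimal sketch of the count and duplicate-key checks from steps 2 and 6, assuming hypothetical staging_orders and orders tables keyed by order_id in a SQLite database:

    import sqlite3

    conn = sqlite3.connect("warehouse.db")

    def row_count(table):
        # Record count check: count the rows currently in a table.
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    # Step 6: counts must match before and after the move from staging.
    staged, loaded = row_count("staging_orders"), row_count("orders")
    assert staged == loaded, f"record count mismatch: {staged} staged, {loaded} loaded"

    # Step 2: duplicate business keys in the source would skew aggregate reports.
    dupes = conn.execute(
        "SELECT order_id, COUNT(*) FROM staging_orders "
        "GROUP BY order_id HAVING COUNT(*) > 1"
    ).fetchall()
    assert not dupes, f"duplicate keys found: {dupes[:5]}"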

A final step is for the ETL tester to test the tool, its functions, and the ETL system.

Nine types of ETL tests

ETL testing fits into four general categories: new system testing (data obtained from varied sources), migration testing (data transferred from source systems to data warehouse), change testing (new data added to data warehouse), and report testing (validate data, make calculations).

The ETL tests that may be executed in each category are:

Category — ETL tests

  • New system testing — data quality testing; metadata testing
  • Migration testing — data quality testing; source to target count testing; source to target data testing; performance testing; data transformation testing; data integration testing
  • Change testing — data quality testing; source to target count testing; source to target data testing; production validation; data integration testing
  • Report testing — report testing

  1. Production validation, also called “production reconciliation” or “table balancing,” validates data in production systems and compares it against the source data. This guards against faulty logic, failed loads, or operational processes that fail to load data into the system.
  2. Source to target count testing verifies that the number of records loaded into the target database matches the expected record count.
  3. Source to target data testing ensures the projected data is added to the target system without loss or truncation, and that the data values meet expectations after transformation.
  4. Metadata testing performs data type, length, index, and constraint checks of the ETL application metadata (load statistics, reconciliation totals, data quality metrics).
  5. Performance testing makes sure that data is loaded into the data warehouse within expected time frames and that the test server’s response to multiple users and transactions is adequate for performance and scalability.
  6. Data transformation testing runs SQL queries for each row to verify that the data is correctly transformed according to business rules (a row-by-row check is sketched after this list).
  7. Data quality testing runs syntax tests (invalid characters, pattern, case order) and reference tests (number, date, precision, null check) to make sure the ETL application rejects invalid data, applies default values where appropriate, and reports it.
  8. Data integration testing confirms that the data from all sources has loaded into the target data warehouse correctly and checks threshold values.
  9. Report testing reviews the data in the summary report, verifying that the layout and functionality are as expected, and makes calculations.
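As an illustration of data transformation testing, the sketch below re-derives a business rule in SQL and flags every loaded row that disagrees with it. The rule shown (total = quantity * unit_price) and the staging_orders/orders table names are assumptions for the example:

    import sqlite3

    conn = sqlite3.connect("warehouse.db")

    # Recompute the expected value from the source and join it to the target;
    # any row this query returns violates the transformation rule.
    mismatches = conn.execute("""
        SELECT s.order_id,
               s.quantity * s.unit_price AS expected,
               t.total AS actual
        FROM staging_orders AS s
        JOIN orders AS t ON t.order_id = s.order_id
        WHERE ABS(s.quantity * s.unit_price - t.total) > 0.005
    """).fetchall()
    assert not mismatches, f"transformation rule violated for rows: {mismatches[:5]}"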

Testing during the ETL process can also include user acceptance testing, GUI testing, and application migration tests to ensure the ETL architecture performs well on other platforms. Incremental ETL tests can verify that new records and updates are processed as expected, as in the sketch below.
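A sketch of such an incremental test, assuming source rows carry an updated_at timestamp and the previous run recorded a load watermark; all table and column names here are hypothetical:

    import sqlite3

    conn = sqlite3.connect("warehouse.db")
    watermark = "2024-01-01 00:00:00"  # hypothetical timestamp of the last load

    # Every source row changed since the watermark should now appear in the
    # target with matching values; a LEFT JOIN miss means the delta was dropped.
    stale = conn.execute("""
        SELECT s.order_id
        FROM source_orders AS s
        LEFT JOIN orders AS t
            ON t.order_id = s.order_id AND t.amount = s.amount
        WHERE s.updated_at > ? AND t.order_id IS NULL
    """, (watermark,)).fetchall()
    assert not stale, f"incremental rows not applied: {stale[:5]}"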

ETL testing challenges

Identifying challenges early in the ETL process can prevent bottlenecks and costly delays. Creating a source-to-target mapping document and establishing clear business requirements from the start is essential. Frequent changes to requirements, which force ETL testers to change the logic in their scripts, can significantly slow progress. ETL testers need an accurate estimate of the data transformation requirements and the time it will take to complete them, as well as a clear understanding of end-user requirements. A few other challenges to watch for from the beginning include:

  • Data that is lost or corrupted during migration.
  • Limited availability of source data.
  • Underestimating data transformation requirements.
  • Duplicate or incomplete data.
  • A large volume of historical data that makes ETL testing in the target system difficult.
  • An unstable testing environment.
  • Outdated ETL tools.

How to find the best ETL testing tool

ETL testing tools increase IT productivity and simplify the process of retrieving information from big data to gain insights. The tool itself contains procedures and rules for extracting and processing data, eliminating the need for traditional programming methods that are labor-intensive and expensive.

Another benefit is that ETL testing tools have built-in compatibility with cloud data warehouse, ERP, and CRM platforms such as Amazon Web Services, Salesforce, Oracle, Kinesis, Google Cloud Platform, NetSuite, and more.

Capabilities to look for when comparing ETL testing tools include:

  • Graphical interface to simplify the design and development of ETL processes.
  • Automatic code generation to speed development and reduce errors.
  • Built-in data connectors that can access data stored in file format, a database, packaged application, or legacy system.
  • Content management facilities that enable context switching between ETL development, testing, and production environments.
  • Sophisticated debugging tools that let you track data flows in real time and report on row-by-row behavior.

Cloud-native ETL tools designed specifically for cloud computing architecture enable a business to reap the full benefits of a data warehouse endeavor.

Open source ETL testing

ETL testing is a multi-level, data-centric process. It uses complex SQL queries to access, extract, transform and load millions of records contained in various source systems into a target data warehouse. ETL testing tools handle much of this workload for DevOps, eliminating the need for costly and time-intensive development of proprietary tools.
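For instance, a common reconciliation query uses EXCEPT to list every source row that is missing from, or altered in, the target; a sketch with assumed table names:

    import sqlite3

    conn = sqlite3.connect("warehouse.db")

    # Rows present in staging but not reproduced exactly in the warehouse.
    drift = conn.execute("""
        SELECT order_id, customer, amount FROM staging_orders
        EXCEPT
        SELECT order_id, customer, amount FROM orders
    """).fetchall()
    print(f"{len(drift)} source rows missing or altered in the target")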

Extensive ETL testing gives an enterprise confidence in the integrity of its big data and the business intelligence gained from that data, and lowers business risk. Talend Open Studio for Data Integration is an industry-leading open source ETL development and testing tool. With millions of downloads since 2006, it is free to use under an Apache license.

Subscription-based Talend Data Integration includes the same ETL testing functionality, plus enterprise-class continuous delivery mechanisms to facilitate teamwork, the ability to run ETL testing jobs on remote systems, and an audit tool for qualitative and quantitative ETL metrics.
