ETL Migration
As developers, we usually picture a data flow horizontally, like a train. It starts with the engine, which initiates the flow, and ends with the Guard Van, which is our target database. In between come different kinds of Coaches: AC, First Class, Sleeper, Unreserved and Pantry. Technically, we see the flow running from source to target, with intermediate files created along the way.
I was involved in a Data Warehouse migration project from On Prem to the cloud. The goal was to move all the Trains (ETL Jobs) from one station (On Prem) to another (Cloud). The trains had to be lifted and shifted from one station to the other, and the strategic decision was NOT TO MOVE each train as a whole; instead, similar Coaches across various trains were moved together.
In one wave, all the engines and AC Coaches were moved; in another wave, all the Sleeper and Unreserved Coaches were moved. Technically, jobs performing similar activities across the ETL landscape were migrated together. There is nothing wrong with migrating in this pattern. However, the problem started when some of the intermediate jobs, which had been identified as redundant, had to be decommissioned.
The migration team realized that some trains did not need their Pantry car, so they decommissioned it. However, in the new station they did not connect the coaches on either side of the removed Pantry car. Because of this intermediate decommissioning, the logical flow was broken and the entire train could not be pulled in the new station.
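The broken link described above can be illustrated with a minimal sketch. It assumes the job flow is held as a simple map from each job to the jobs that run after it. The job names are hypothetical labels borrowed from the train analogy, not taken from any real scheduler. Removing a job safely means wiring each of its predecessors directly to each of its successors.

```python
from typing import Dict, List

# Hypothetical representation: each job maps to the jobs that run after it.
Flow = Dict[str, List[str]]


def decommission(flow: Flow, job: str) -> Flow:
    """Return a copy of `flow` with `job` removed and its neighbours bridged.

    Every job that fed `job` is wired directly to every job that `job` fed,
    so the chain stays connected after the redundant step is dropped.
    The input flow is not modified.
    """
    downstream = [d for d in flow.get(job, []) if d != job]
    new_flow: Flow = {}
    for name, successors in flow.items():
        if name == job:
            continue  # drop the decommissioned job itself
        rewired: List[str] = []
        for s in successors:
            # An edge into the removed job becomes edges to its successors.
            targets = downstream if s == job else [s]
            for t in targets:
                if t not in rewired:
                    rewired.append(t)
        new_flow[name] = rewired
    return new_flow


train = {
    "engine": ["ac_coach"],
    "ac_coach": ["pantry"],
    "pantry": ["sleeper"],
    "sleeper": ["guard_van"],
    "guard_van": [],
}

print(decommission(train, "pantry"))
```

If you simply deleted the "pantry" entry, "ac_coach" would be left with no successor, and "sleeper" and "guard_van" would never run. That is the failure described above. The rewiring step is what keeps the train coupled.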
Conclusion: whichever way we migrate the ETL jobs, job dependencies must be validated after migration. For that, having an up-to-date ETL job flow document at hand is of paramount importance when we embark on such migration journeys.
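Following that conclusion, one way to validate job dependencies after migration is to compare the migrated flow with the job flow document. This is a minimal sketch that assumes both are available as simple adjacency maps. That format is hypothetical, since real schedulers store dependencies in their own configuration. The sketch reports three problems:

- jobs missing from the migration
- dependencies that point at jobs which no longer exist
- jobs the engine can no longer trigger

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical representation: each job maps to the jobs that run after it.
Flow = Dict[str, List[str]]


def reachable(flow: Flow, start: str) -> Set[str]:
    """Breadth-first search: every job that is eventually triggered from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        job = queue.popleft()
        for nxt in flow.get(job, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def validate(documented: Flow, migrated: Flow, start: str,
             decommissioned: Set[str]) -> List[str]:
    """List the gaps between the job flow document and the migrated flow."""
    problems: List[str] = []
    expected = set(documented) - decommissioned
    # Jobs the document says should exist but were never migrated.
    for job in sorted(expected - set(migrated)):
        problems.append(f"missing job: {job}")
    # Dependencies that point at a job which no longer exists.
    for job, successors in migrated.items():
        for s in successors:
            if s not in migrated:
                problems.append(f"dangling dependency: {job} -> {s}")
    # Migrated jobs that the start job can no longer trigger.
    seen = reachable(migrated, start)
    for job in sorted((expected & set(migrated)) - seen):
        problems.append(f"not triggered from {start}: {job}")
    return problems


documented = {
    "engine": ["ac_coach"], "ac_coach": ["pantry"], "pantry": ["sleeper"],
    "sleeper": ["guard_van"], "guard_van": [],
}
# Pantry dropped but its neighbours never reconnected, as in the story.
broken = {"engine": ["ac_coach"], "ac_coach": [], "sleeper": ["guard_van"], "guard_van": []}
# Pantry dropped and ac_coach wired straight to sleeper.
fixed = {"engine": ["ac_coach"], "ac_coach": ["sleeper"], "sleeper": ["guard_van"], "guard_van": []}

print(validate(documented, broken, "engine", {"pantry"}))
print(validate(documented, fixed, "engine", {"pantry"}))
```

Run against the broken flow, the check reports that "sleeper" and "guard_van" are never triggered from "engine". That is exactly the train that could not be pulled. The rewired flow returns an empty list. This only checks the structure of the flow, not the data it produces.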
Architecting enterprise applications, Java, J2EE, Angular, Cloud, Serverless, Alexa, microservices, containerization, mobile technologies.
Nicely written, Udankar! The explanation by example made it easy to understand. Visualizing the data flow and its building blocks is important in any data migration, along with thinking it through early.
Director of Data Engineering | Expert in Delivery Leadership | Program Management Professional Driving Data-Driven Success
You made it sound so simple with your train/coach analogy, but you rightly pointed out the dependencies and the challenges with integration. Engaging QA early in the planning phase and visualizing how this will be tested post migration would help.
Solution Architect / Cloud Data Engineering / ETL
Excellent narrative to understand a complex scenario in simple terms.
Business Excellence Practitioner
Very well written and subtly explained through a nice analogy; the ETL migration method is very relevant and applicable to similar projects across industries.
Data & Analytics Evangelist.
Lesson #1: you don't pull a pantry out of a running train. Missing coaches are bound to happen. You were lucky not to remove the engine as redundant. In my mind, it's not a validation problem but one of little or no impact analysis during design. As they say, when you forklift, don't use a spoon.