"ETL is dead; long-live streams" is a false statement
If someone in your organization is pushing for real-time processing for everything, use this analogy: "We humans eat, then process the food, and then sh... Everything is done in batches. Do not even try to imagine this as real-time processing (it will look ugly)." Real-time processing is great, but many tasks are batch by nature, so forcing them into real time is wasteful, or "ugly".
Some good examples of real-time processing
Some good examples of batch processing
Batch data processing and real-time data processing are two methods of processing data that should be used together to improve the value of a business. Using a combination of these methods helps a business make more informed decisions and respond more quickly to changing circumstances.
Going to extremes will be wasteful and ugly. If you hear people in your organization insisting on one approach exclusively over the other, you might have a problem or a lack of understanding.
Update 2023-01-06
I have received more attention than I expected, with both positive and negative feedback.
I would like to clarify one thing: when talking about batch processing and real-time processing, I have a very particular example in my head:
"A manager comes and asks you to build a quarterly report, but in real time." Here we as data engineers must stand our ground and explain that chasing buzzwords like "real-time" is not effective, because some reports can't be built in real time.
I do understand that my initial text didn't specify this very important piece, and I alone am responsible for that.
Chapter Lead - Data and AI Architecture (AI/ML | Modern Data Architecture | Integration)
2y
The root cause issue is that companies struggle to understand their own use cases and what they are primarily trying to solve. If they fully understood, they would see that most use cases are batch-related, not real-time needs.
BSc Digital Transformation, Chief Software Architect for Data Integration and Big Data
2y
Point 3: The question for your business is: if you know something, how long does the business process take to make a counter action? If the answer is that it takes a day or more today, a follow-up question is whether that is desired. Shouldn't the process be optimized instead?
A new record is entered in the healthcare system: a toxicological report shows an emergency condition. This data is processed in batch, 4 hours later. A day later. 15 minutes later. The patient died in the meantime.
A customer ordered 100 l of cooking oil. Very unlikely for a private person. The DWH shows that tomorrow, but the barrel has been shipped in the meantime.
The sales system marked an order as potentially fraudulent. Tomorrow somebody will decide that it would have been better not to ship. A little bit late.
BSc Digital Transformation, Chief Software Architect for Data Integration and Big Data
2y
Point 2: Can real-time tools be as efficient as batch? That is a tough one.
A join is a good example of why real time is problematic. Two line items are changed for a sales order. The stream therefore gets two records and should output the joined result. Processing the join once per line item will be expensive. Waiting for 15 minutes and micro-batching it is real-time enough. If both line items are changed in the same database transaction, then the stream should join them as one. That is the optimal solution: efficient and low latency.
Event streaming is a good example of why real time is way more efficient than batch. In event streaming you get the changes. In batch you have to ask the source system for the changes. How often do you read the entire source table just to figure out there was no change at all? How long does your longest delta job run? Multiple hours? That should be called efficient?
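To make the join point above concrete, here is a minimal Python sketch contrasting a join per change event with a join per database transaction. It is an illustration only, not the commenter's code: the event fields (`tx_id`, `order_id`, `line`, `qty`), the `orders` lookup table, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical change events for sales order line items (field names are assumptions).
events = [
    {"tx_id": 1, "order_id": 100, "line": 1, "qty": 5},
    {"tx_id": 1, "order_id": 100, "line": 2, "qty": 3},
    {"tx_id": 2, "order_id": 101, "line": 1, "qty": 7},
]

# Hypothetical order header table that each line item is joined against.
orders = {100: {"customer": "A"}, 101: {"customer": "B"}}


def join_per_event(events, orders):
    """Naive streaming join: one lookup and one output record per change event."""
    return [{**orders[e["order_id"]], **e} for e in events]


def join_per_transaction(events, orders):
    """Group events by (transaction, order), then join each group only once."""
    by_tx = defaultdict(list)
    for e in events:
        by_tx[(e["tx_id"], e["order_id"])].append(e)
    results = []
    for (tx_id, order_id), lines in by_tx.items():
        results.append(
            {"tx_id": tx_id, "order_id": order_id, **orders[order_id], "lines": lines}
        )
    return results
```

With the sample data, the per-event variant performs three joins, while the per-transaction variant performs two, because both line items changed in transaction 1 are emitted as a single joined result.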
BSc Digital Transformation, Chief Software Architect for Data Integration and Big Data
2y
Point 1: If real time were as easy to build and even more efficient, would you still go for batch processing? According to your analogy, the answer would be no. Please explain the business reasons why you still want to use batch processing.
Follow me for SQL Data Pipelines, Snowflake, Data Engineering, XML Conversion
2y
Real time is necessary for some edge cases where you need to make automated, well-defined, predefined decisions in real time. Most scenarios and use cases just require batch processing, though. Implementing real time for everything is nuts. It is complex, brittle and, as a result, expensive for no added benefit.