LAMBDA ARCHITECTURE
Lambda Architecture has become a well-known term since Big Data surfaced. It is loosely inspired by the functional ideas of Lambda Calculus and serves as a basic framework for Decision Support Systems.
Lambda calculus (also written as λ-calculus) is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation that can be used to simulate any Turing machine. [Wikipedia]
‘Lambda Architecture is about moving data from the places where it is generated to the end users, in both Batch and Real-Time manner.’
There are two approaches to processing data: Batch and Real-Time.
· Batch Processing Layer: data is processed after a fixed time frame, which can be daily, half-daily, hourly, or as short as 10 minutes or even 1 minute. In other words, data is processed or extracted not when it is generated but when it can be pulled, depending on data availability, the system resources available to process it, manual updates by intermediary parties, and so on.
· Real-Time Processing Layer: data is processed or extracted the moment it is generated. This is also called Event-Based Processing.
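The difference between the two layers can be sketched in a few lines of Python. This is only a minimal illustration, not a production design: the Event, BatchLayer and RealTimeLayer classes, the per-user totals, and the sample data are all hypothetical names chosen for this example, and a real system would use a scheduler and a message stream instead of in-memory lists.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    # Hypothetical record: one transaction generated by a source system.
    user: str
    amount: float


@dataclass
class BatchLayer:
    """Collects events and processes them together when the batch runs."""
    pending: list = field(default_factory=list)
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def collect(self, event):
        # The event is only stored; reports cannot see it yet.
        self.pending.append(event)

    def run_batch(self):
        # Assumed to be called on a schedule (daily, hourly, every 10 minutes...).
        for event in self.pending:
            self.totals[event.user] += event.amount
        processed = len(self.pending)
        self.pending.clear()
        return processed


@dataclass
class RealTimeLayer:
    """Updates its totals the moment each event is generated."""
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def on_event(self, event):
        self.totals[event.user] += event.amount


events = [Event("alice", 10.0), Event("bob", 5.0), Event("alice", 2.5)]
batch = BatchLayer()
realtime = RealTimeLayer()

for e in events:
    batch.collect(e)      # waits for the next batch run
    realtime.on_event(e)  # processed immediately

realtime_before = dict(realtime.totals)  # already reflects all three events
batch_before = dict(batch.totals)        # still empty: the batch has not run

processed = batch.run_batch()            # the scheduled batch finally runs
batch_after = dict(batch.totals)         # now matches the real-time totals
```

Until run_batch() is called, a report built on the batch layer sees nothing from these events, which is the same lag that makes daily batch reporting one day old. The real-time layer pays for its freshness by doing work on every single event.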
Lambda Architecture is not new. It has been in use for decades, but in the past we relied only on structured data from OLTP systems and, for lack of tools and technologies, could only process or move data from source to target datastore in batches. The most common schedule is the Daily Batch, where data is moved once every 24 hours. It was, and in many places still is, transferred after office hours, for example between 8 pm and 8 am. Reporting is therefore based on D-1 (one-day-old) data. Real-Time processing was also used, but only minimally, because the tools and technologies were very expensive.
Now, since the inception of Big Data, Lambda Architecture has really come into the limelight, thanks to open-source tools, Hadoop, Object Storage, NoSQL databases and, last but not least, tools, technologies and storage becoming so cheap.
In the current era, many organizations have adopted, or have started to adopt, Lambda Architecture as a norm. Currently, organizations with regulatory challenges mostly use Hadoop in on-premises environments. In the Cloud, AWS, Azure and Google are the major players. Oracle, IBM, and many other Cloud providers are also in the race.
One of the main reasons organizations are unable to adopt the Cloud is Data Sovereignty (explained in a separate topic). Because of this requirement, the Cloud provider’s data center must reside in the customer’s country or at least in the same region. AWS, Azure, and Google are the fast-growing Cloud providers, as they already have data centers in almost every region and are now targeting the major countries.
Every organization has mapped its tools onto Lambda Architecture. Let me take this opportunity to share the standard architectures of the AWS, Azure and Google Clouds.
Cheers.