Data Ecosystem for Real-Time Analytics - Part 1
At a startup, the most exciting and challenging part is constantly evaluating technology and business trends and predicting where the industry is headed based on our understanding and intuition. We keep exploring questions such as: Why do we need Real-Time Analytics? What is event stream processing? Why do we need a streaming database? Can stream processing really replace batch processing?
In this article, we will focus on the fundamental building blocks of a data ecosystem that can support Real-Time Analytics.
1. Event streaming: Streaming data is data that is continuously generated and delivered rather than processed in batches or micro-batches. This is often referred to as “event data,” since each data point describes something that occurred at a given time.
These systems capture data in real time from sources (or Producers) such as databases, sensors, and cloud services in the form of event streams and deliver them to other applications, databases, and services (or Consumers). They also persist the data, storing each event against its timestamp. This is often handled by technology such as Apache Kafka and Amazon Kinesis.
In-memory stream processors stand out because of their ability to process large amounts of streaming data very quickly. These systems scale well (Apache Kafka at LinkedIn supports over 7 trillion messages a day) and handle multiple concurrent data sources and event streams in real time.
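As a concrete illustration, here is a minimal sketch of a producer publishing event data to Kafka using the kafka-python client. The broker address, topic name, and event fields are assumptions for illustration, not part of any specific deployment.

```python
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

# Assumed broker address; adjust for your environment.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each event describes something that occurred at a given time.
event = {
    "sensor_id": "pump-17",
    "temperature_c": 81.4,
    "timestamp": time.time(),
}

# Publish the event to an illustrative "sensor-readings" topic
# and wait for delivery before exiting.
producer.send("sensor-readings", value=event)
producer.flush()
```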
2. Event Stream Processing: It is the practice of taking action on a series of data points that originate from a system that continuously creates data. Traditional Batch processing is about taking action on a large set of static data (“data at rest”), while Event Stream processing is about taking action on a constant flow of data (“data in motion”).
Since events are also referred to as messages, the entire system can also be called a "messaging system". These are usually the technologies that help developers write applications that take action on events as they arrive, through actions such as aggregation, analytics, transformation, enrichment, and ingestion into downstream stores.
Use cases such as payment processing, fraud detection, anomaly detection, predictive maintenance, and IoT analytics all rely on immediate action on data. All of these use cases deal with data points in a continuous stream, each associated with a specific point in time.
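To make this concrete, the sketch below consumes a hypothetical "payments" topic with kafka-python and flags unusually large amounts, a simplified stand-in for the fraud-detection use case. The topic name, threshold, and event fields are illustrative assumptions.

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Assumed topic name and broker address for illustration.
consumer = KafkaConsumer(
    "payments",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

FRAUD_THRESHOLD = 10_000  # illustrative rule: flag unusually large payments

# Act on each event ("data in motion") as it arrives,
# rather than waiting for a nightly batch job over data at rest.
for message in consumer:
    payment = message.value
    if payment.get("amount", 0) > FRAUD_THRESHOLD:
        # A real system might publish to an alerts topic
        # or call a fraud-scoring service here.
        print(f"Possible fraud: {payment}")
```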
Event stream processing is also valuable when data granularity is critical. The practice of Change Data Capture (CDC), in which all individual changes to a database are tracked, is another event stream processing use case. In CDC, downstream systems can use the stream of individual updates to a database for purposes such as identifying usage patterns that can help define optimization strategies, as well as tracking changes for auditing requirements.
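As a sketch of how a downstream system might consume CDC events, the snippet below handles change records in the envelope style popularized by Debezium ("op", "before", "after" fields). The field names and audit logic are assumptions for illustration, not a specific tool's API.

```python
from typing import Any, Dict, List, Optional

# Collected change records, e.g. for auditing requirements.
audit_log: List[Dict[str, Any]] = []

def handle_change(change: Dict[str, Any]) -> None:
    """Process one Debezium-style change event.

    'op' is 'c' (create), 'u' (update), or 'd' (delete);
    'before'/'after' hold the row state around the change.
    """
    op = change["op"]
    before: Optional[Dict[str, Any]] = change.get("before")
    after: Optional[Dict[str, Any]] = change.get("after")

    # Track every individual change for audit purposes.
    audit_log.append({"op": op, "before": before, "after": after})

    # Downstream systems could also refresh caches, search indexes,
    # or usage metrics from the same stream of changes.
    if op == "u" and before and after:
        changed = {k: (before.get(k), after[k])
                   for k in after if before.get(k) != after[k]}
        print(f"Row updated, changed columns: {changed}")

# Example change event as it might arrive from a CDC pipeline.
handle_change({
    "op": "u",
    "before": {"id": 42, "status": "pending"},
    "after": {"id": 42, "status": "shipped"},
})
```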
Part 2 will cover Real-Time Analytic Databases, Zero Trust Security, Testing, and Architecture Patterns.