Microservice Transformation Journey - CDC-Driven Real-time Data Sync
The Strangler pattern is commonly adopted to incrementally migrate an existing monolithic application by replacing a set of features with a microservice while keeping both running in parallel. Applying a domain-driven design approach, you may strangle the application along bounded contexts. But as soon as this pattern is applied, you need to assess how the existing bounded contexts and the new microservices will coexist.
During this journey, a couple of challenges arise around where write and read operations occur and how data should be replicated between the old and new sub-systems. For instance, when a legacy application is migrated to a modern system, it may still need existing legacy resources, and new features must be able to call the legacy system. This is especially true of gradual migrations, where different features of a larger application are moved to a modern system over time.
Often these legacy systems suffer from quality issues such as convoluted data schemas or obsolete APIs. The features and technologies used in legacy systems can vary widely from more modern systems. To interoperate with the legacy system, the new application may need to support outdated infrastructure, protocols, data models, APIs, or other features that you wouldn't otherwise put into a modern application.
Different database engines and different data semantics
Implement a façade or adapter layer between monolithic and microservice subsystems that don't share the same semantics. This layer translates requests that one subsystem makes to the other subsystem. Use this pattern to ensure that an application's design is not limited by dependencies on outside subsystems.
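As a rough sketch (not tied to any particular codebase), such a layer can be as simple as an interface the monolith keeps calling, with an adapter implementation that delegates to the new microservice over HTTP; the `BillingFacade` name and the `/invoices` endpoint below are illustrative assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Facade the monolith keeps calling; its callers are unaware that a microservice now serves the request.
interface BillingFacade {
    String fetchInvoice(String customerId);
}

// Adapter that translates the monolith's call into the microservice's REST API.
class BillingServiceAdapter implements BillingFacade {
    private final HttpClient client = HttpClient.newHttpClient();
    private final String baseUrl; // e.g. "http://billing-service:8080" (placeholder)

    BillingServiceAdapter(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public String fetchInvoice(String customerId) {
        try {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/invoices/" + customerId))
                    .GET()
                    .build();
            // The monolith still gets back the payload shape it expects; any translation lives here.
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (Exception e) {
            throw new RuntimeException("Call to billing microservice failed", e);
        }
    }
}
```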
Data corruption possibility.
How do you ensure new data doesn't corrupt the old data while maintaining transactional data integrity? During the transformation journey, a wide variety of domain data will be migrating at different points in time, and some may persist in the legacy system, some in microservices (MS), and some even in both.
- Customer records (Monolithic DB)
- Transaction records (MS DB)
- Workflow (MS DB)
- Audit log (MS and Monolithic DB)
- Reports (MS and Monolithic DB)
- ...
Fortunately, there are some solution options to address these concerns.
Solution 1 - Anti-corruption layer.
Maintaining access between new and legacy systems can force the new system to adhere to at least some of the legacy system's APIs or other semantics. When these legacy features have quality issues, supporting them "corrupts" what might otherwise be a cleanly designed modern application.
One generic solution is to isolate these different systems by placing an anti-corruption layer between them. This layer translates communications between the two systems, allowing one system to remain unchanged while the other can avoid compromising its design and technological approach.
Here the Legacy App calls MS services via an anti-corruption layer. Communication between the Legacy App and the anti-corruption layer always uses the data model and architecture of the Legacy App. Calls from the anti-corruption layer to the MS Apps conform to their respective data models or methods. The anti-corruption layer contains all of the logic necessary to translate between the two systems. The layer can be implemented as a component within the application or as an independent service.
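For illustration, a sketch of what the translation inside the anti-corruption layer might look like, assuming a hypothetical flat legacy customer record and a cleaner microservice domain model (the field names and the status-flag convention are invented for the example):

```java
// Legacy representation as exposed by the monolith (flat record, cryptic column names).
record LegacyCustomerRow(String custNo, String nm, String addrLine, String status) {}

// Clean domain model used by the new microservice.
record Customer(String id, String fullName, String address, boolean active) {}

// Anti-corruption layer translator: all knowledge of the legacy semantics is confined here,
// so the microservice's design is not "corrupted" by the legacy schema.
class CustomerTranslator {

    Customer toDomain(LegacyCustomerRow row) {
        // Assumed legacy convention: status flag "A" means active.
        return new Customer(row.custNo(), row.nm(), row.addrLine(), "A".equals(row.status()));
    }

    LegacyCustomerRow toLegacy(Customer customer) {
        return new LegacyCustomerRow(
                customer.id(),
                customer.fullName(),
                customer.address(),
                customer.active() ? "A" : "I");
    }
}
```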
Issues and considerations at hand
A couple of side effects that we need to be aware of:
- No real-time data replication: the anti-corruption layer may add latency to calls made between the two systems. For instance, if a batch job runs every minute to push data, the target system can lag behind by up to a minute.
- The anti-corruption layer adds an additional service that must be managed and maintained.
- Performance impact on the existing application, since changes such as database triggers or blocking calls in the application layer are required.
- Does this anti-corruption layer scale?
- Strong coupling with the data snapshot implies complex migrations during every component migration and throughout its operational life.
- Consider whether you need more than one anti-corruption layer. You may want to decompose functionality into multiple services using different business units, different countries or languages, or there may be other reasons to partition the anti-corruption layer.
- Data integrity issues: what happens when a customer requests a cancelation and the transaction order cancelation event is consumed before the event it cancels?
- Prolonged Production Downtime during migration
- Added burden on the monitoring, release, and configuration processes.
- Consider whether the anti-corruption layer needs to handle all communication between different subsystems, or just a subset of features.
- If the anti-corruption layer is part of an application migration strategy, consider whether it will be permanent, or will be retired after all legacy functionality has been migrated.
- With a huge legacy code base, missed events may occur.
- High cost of database triggers: they run with each DML statement, with a high impact on performance and scalability.
This is where a CDC-Driven Real-time Data Sync architecture helps.
Solution 2 - CDC-Driven Real-time Data Sync
At its core, it leverages the Change Data Capture (CDC) component. CDC is a solution that captures change events from a database transaction log (or equivalent mechanism) and forwards those events to downstream consumers. CDC ultimately allows application state to be externalized and synchronized with external stores of data.
Change Data Capture implementations usually have these characteristics:
- External process that reads the transaction log of a database with the goal of materializing change events from those transactions
- Change events are forwarded to downstream consumers as messages
The key point is that it simply externalizes the transaction log of the database as a stream of events to interested consumers. An easy way to share critical transactions stored in enterprise databases in real time is streaming data integration with the database, e.g. Oracle CDC. The log-based CDC method reads the database transactions from the Oracle redo logs and collects the insert, update, and delete operations as soon as the transactions commit, without impacting the performance of the source systems.
CDC tool: Debezium is a new open source project that implements CDC. As of this writing, it supports pluggable connectors for Oracle, MySQL, PostgreSQL, and MongoDB. It is built on top of well-known and popular technologies such as Apache Kafka to persist and distribute the stream of change events to CDC clients.
Debezium fits very well in data replication scenarios such as those used in microservices architectures. You can plug a Debezium connector into your current database, configure it to listen for changes in a set of tables, and then stream the changes to a Kafka topic. Debezium messages carry an extensive amount of information, including the structure of the data and the new state of the data that was changed.
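As an illustration, a Debezium connector running on Kafka Connect is typically registered by posting its configuration to the Kafka Connect REST API. The sketch below does this from Java for a hypothetical MySQL source; host names, credentials, table list, and topic names are placeholders, and the exact property names vary slightly between Debezium versions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterDebeziumConnector {
    public static void main(String[] args) throws Exception {
        // Connector configuration for a hypothetical legacy MySQL database.
        String body = """
            {
              "name": "legacy-inventory-connector",
              "config": {
                "connector.class": "io.debezium.connector.mysql.MySqlConnector",
                "database.hostname": "legacy-db",
                "database.port": "3306",
                "database.user": "debezium",
                "database.password": "dbz-secret",
                "database.server.id": "184054",
                "database.server.name": "legacyapp",
                "table.include.list": "inventory.customers,inventory.orders",
                "database.history.kafka.bootstrap.servers": "kafka:9092",
                "database.history.kafka.topic": "schema-changes.legacyapp"
              }
            }
            """;

        // Kafka Connect exposes a REST API (default port 8083) for managing connectors.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Once registered, each captured table is streamed to its own topic, named after the logical server name plus schema and table (here, for example, legacyapp.inventory.customers).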
CDC also gives you flexibility on how events are consumed.
Option 1
Here we have a standalone CDC process that captures and forwards events from the transaction log to a message broker, as shown below.
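For example, a downstream microservice could consume the change events that the standalone CDC process publishes to the broker. The sketch below assumes Kafka, plain JSON string values, and the illustrative topic name from the connector sketch above:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ChangeEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");   // placeholder broker address
        props.put("group.id", "customer-sync-service");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Debezium publishes one topic per captured table, e.g. <server-name>.<schema>.<table>.
            consumer.subscribe(List.of("legacyapp.inventory.customers"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // The value is the Debezium change-event envelope (before/after state, operation, source).
                    System.out.printf("key=%s change=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```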
Option 2
Here we have an embedded CDC client that sends events directly to an application.
Illustration:
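A minimal sketch of this embedded approach using Debezium's embedded engine API; the connector settings are placeholders, connector-specific options (such as schema history) are omitted, and the API details may differ across Debezium versions:

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class EmbeddedCdcClient {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "embedded-cdc");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        props.setProperty("database.hostname", "legacy-db");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "dbz-secret");
        props.setProperty("database.server.id", "184055");
        props.setProperty("database.server.name", "legacyapp");
        props.setProperty("table.include.list", "inventory.orders");
        // The embedded engine tracks its own position in the transaction log.
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("offset.flush.interval.ms", "1000");

        // Change events are delivered straight to the application callback; no broker in between.
        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(event -> System.out.println("change: " + event.value()))
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine);
    }
}
```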
Option 3 Outbox Pattern
The primary goal of the Outbox pattern is to ensure that updates to the application state (stored in tables) and the publishing of the respective domain event are done within a single transaction. This involves creating an Outbox table in the database to collect those domain events as part of a transaction. Having transactional guarantees around the domain events and their propagation via the Outbox is important for data consistency across systems.
When distributed transactions are not supported by the messaging middleware (as with current Kafka versions), it is important to ensure consistency between the records in the database and the events published. With this approach, we can publish to the topic as soon as an order is received via the API, and the same code then consumes these events to persist them to the database.
With this approach, if the write to the topic fails, the application can return an error to the user; if the write to the database fails, the code can reload from the non-committed record.
After the transaction completes, the domain events are picked up by a CDC connector and forwarded to interested consumers using a reliable message broker. Those consumers may then use the domain events to materialize their own aggregates.
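A sketch of the write side of the pattern, assuming a relational source database (PostgreSQL here, for the JSON cast) with hypothetical `orders` and `outbox` tables: the business row and its domain event are inserted in the same local transaction, and the CDC connector later captures the outbox insert and publishes it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class OrderService {

    public void placeOrder(String orderId, String customerId, double amount) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://orders-db:5432/orders", "app", "secret")) { // placeholder connection
            conn.setAutoCommit(false);
            try {
                // 1. Update the application state.
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO orders(id, customer_id, amount) VALUES (?, ?, ?)")) {
                    ps.setString(1, orderId);
                    ps.setString(2, customerId);
                    ps.setDouble(3, amount);
                    ps.executeUpdate();
                }
                // 2. Record the domain event in the outbox table, inside the same transaction.
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO outbox(id, aggregate_type, aggregate_id, event_type, payload) "
                                + "VALUES (?, ?, ?, ?, ?::jsonb)")) {
                    ps.setString(1, UUID.randomUUID().toString());
                    ps.setString(2, "Order");
                    ps.setString(3, orderId);
                    ps.setString(4, "OrderPlaced");
                    ps.setString(5, "{\"orderId\":\"" + orderId + "\",\"amount\":" + amount + "}");
                    ps.executeUpdate();
                }
                // Both rows become visible atomically; CDC picks up the outbox insert from the log.
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```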
CDC implementations often have these characteristics:
- A durable message broker is used to forward events with at-least-once delivery guarantees to all consumers, which means consumers must tolerate duplicate deliveries (see the sketch after this list)
- The ability to replay events from the datastore transaction log and/or message broker for as long as the events are persisted
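Since delivery is at-least-once, consumers have to tolerate duplicates and replays. A common way to do that is to deduplicate on a stable event identifier before applying the change; the sketch below keeps the processed ids in memory purely for illustration (a real implementation would use a durable store):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// At-least-once delivery means the same change event may arrive more than once;
// deduplicating on a stable event id keeps the downstream state consistent.
class IdempotentHandler {
    private final Set<String> processedEventIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> applyChange;

    IdempotentHandler(Consumer<String> applyChange) {
        this.applyChange = applyChange;
    }

    void handle(String eventId, String payload) {
        if (!processedEventIds.add(eventId)) {
            return; // duplicate delivery or replay; already applied
        }
        applyChange.accept(payload);
    }
}
```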
This pattern has the following benefits:
- The service publishes high-level domain events
- CDC is very flexible and adaptable for multiple use cases.
- No 2PC required
- High scalability: the asynchronous nature of a message bus makes this strategy highly scalable. We don't need to handle throttling, because the message consumers can process messages at their own pace. This eliminates the possibility of a producer overwhelming a consumer by sending a high volume of messages in a short period of time.
- Development velocity
- Scalability
- Cost-of-change reduction
- Ability to build multiple databases concurrently
- Future-proof: go beyond the database and synchronize data from APIs, webhooks, and events.
Sample Architecture: Multi-source, Debezium, Confluent Kafka, and MongoDB