Talking Microservices in Healthcare Analytics
Introduction
This write-up presents a practical approach to architecting a healthcare analytics solution in the cloud. The ideas aim to stay largely unopinionated about any particular cloud technology, vendor, or implementation practice.
Healthcare providers run various on-premise IT systems to support their daily operational needs. Governments mandate that such healthcare facilities publish operational metrics so that they can be monitored and governed efficiently.
The proposed solution helps visualise the operational data generated by hundreds of healthcare clinics across a country or region. The data is produced by software from different vendors. Examples of healthcare IT systems:
1. Electronic Medical Record System
2. ERP System
3. Lab Information system
4. IoT devices
5. Patient survey apps
There is a strict mandate to follow data residency laws i.e. the data should reside within the country.
The scope of this write-up is the cloud architecture proposal, service design principles, and NFRs. The on-premise healthcare facility IT systems are treated simply as data generation systems.
Functional Requirements
1. Provide tools for data scientists to collect data for machine learning and deep learning
2. Handle OLAP data
3. Key performance Indicator Dashboard for assessing the performance of Healthcare centre
4. Ability to retain historical data
5. Administrative functions
6. Transaction support
7. Distributed data processing
8. Distributed request handling
9. Authentication & Authorisation
Non-Functional Requirements
1. Scalability
2. Consistency & Reliability
3. Security
4. Upgradability
5. Data integrity
6. Interoperability
Cloud Solution Overview
A micro-service-based architecture is chosen for the following reasons:
1. The solution is centred on wider business capabilities
2. It supports modern development methods
3. Services are clearly isolated from one another
4. Continuous integration and deployment are inherently supported
5. Containerisation is straightforward when services are small and independent
The overall solution components fall into two groups, each discussed in detail below:
1. Application Services
2. Big Data Pipeline
Application Services
The application services are broken into micro services: a mix of cloud-enabling (infrastructural) services and core application services.
Key Micro Services
API Gateway
The API Gateway encapsulates the internal system architecture and provides an API tailored to each client. It calls the underlying service APIs, merging and transforming their responses into shapes suited to each client. It essentially provides a "front end" for accessing the micro services underneath, and it reduces the number of round-trips between the client and the application. Load balancing, user authentication, and authorisation can also be applied at the API Gateway.
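The merging behaviour can be sketched as below. The two backend calls are hypothetical stubs standing in for real micro services; a production gateway would issue them as parallel HTTP requests.

```python
def fetch_patient_summary(patient_id):
    # Stub for the Electronic Medical Record service (illustrative data).
    return {"id": patient_id, "name": "Jane Doe"}

def fetch_lab_results(patient_id):
    # Stub for the Lab Information System service (illustrative data).
    return {"tests": ["CBC", "HbA1c"]}

def gateway_patient_view(patient_id):
    """Merge several backend responses into one client-shaped payload,
    so the client makes a single round-trip to the gateway."""
    summary = fetch_patient_summary(patient_id)
    labs = fetch_lab_results(patient_id)
    return {**summary, "labs": labs["tests"]}
```

The gateway, not the client, pays the cost of fan-out, which is why reducing client round-trips is listed as a benefit.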
Discovery
Provide a REST API for managing service-instance registration and for querying available instances. Registration can follow either the self-registration or the third-party registration pattern. With a discovery mechanism, the locations (host:port) of all dependent services are managed by a single service that holds the latest details of every registered instance.
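A minimal in-memory sketch of such a registry is shown below; service names and locations are illustrative. A real registry would sit behind the REST API described above and add heartbeats plus TTL-based eviction of dead instances.

```python
class ServiceRegistry:
    """Instances self-register with a (host, port) location;
    clients look services up by name."""

    def __init__(self):
        self._instances = {}  # service name -> set of (host, port)

    def register(self, name, host, port):
        self._instances.setdefault(name, set()).add((host, port))

    def deregister(self, name, host, port):
        self._instances.get(name, set()).discard((host, port))

    def lookup(self, name):
        # Return all known locations; a client-side load balancer
        # would pick one of them.
        return sorted(self._instances.get(name, set()))
```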
Configuration
Create a configuration micro service to manage all service configuration in one place. This is a useful, cleaner design when moving from a monolithic system to micro services. Solutions such as Apache ZooKeeper and HashiCorp Consul help maintain service configuration in one place.
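The core idea, global defaults with per-service overrides resolved at read time, can be sketched as follows. The keys and service names are illustrative assumptions, not part of any real product's API.

```python
class ConfigurationService:
    """Central configuration store: shared defaults, with
    per-service overrides applied at lookup time."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._overrides = {}  # service name -> {key: value}

    def set_override(self, service, key, value):
        self._overrides.setdefault(service, {})[key] = value

    def get(self, service, key):
        # An override wins over the shared default.
        return self._overrides.get(service, {}).get(
            key, self._defaults.get(key))
```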
Logging & Auditing
Logging and auditing give visibility into system health, which is especially valuable in a complex, micro-service-based system. Another important aspect to consider is the number of copies of a single service running in a cluster. Choose a logging and auditing strategy that generates a distinct correlation ID which can be used to trace a particular request across services.
Aggregation is an important concept in logging: logs from all services are collected into a single store where they can be viewed and analysed independently.
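One lightweight way to attach a correlation ID, sketched here with the standard library's `logging.LoggerAdapter`. How the ID is propagated (typically an HTTP header) is left open.

```python
import logging
import uuid

def new_correlation_id():
    # One ID per inbound request; passed along to downstream calls
    # so logs from every service copy can be joined at aggregation time.
    return uuid.uuid4().hex

class CorrelationAdapter(logging.LoggerAdapter):
    """Prefix every log line with the request's correlation ID."""

    def process(self, msg, kwargs):
        return "[%s] %s" % (self.extra["correlation_id"], msg), kwargs
```

Usage: `log = CorrelationAdapter(logging.getLogger("billing"), {"correlation_id": new_correlation_id()})`, then `log.info(...)` as usual.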
Messaging Bus
Eventual consistency is a useful paradigm for a distributed, micro-service-based application, where there must be a way to handle back pressure from incoming requests. Since the services interact asynchronously, a centralised message bus is widely embraced; it not only improves overall performance but also provides fault tolerance and traceability. Not all data can tolerate eventual consistency, however, so the approach must be chosen with synchronous event-processing needs in mind.
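The back-pressure idea can be sketched with a bounded in-process queue standing in for the bus: when consumers lag, producers fail fast (or block) and shed load instead of overwhelming downstream services. The capacity of 2 is purely for illustration.

```python
import queue

bus = queue.Queue(maxsize=2)  # bounded: this is the back-pressure knob

def publish(event):
    """Non-blocking publish: report failure so the caller can
    retry later or drop the event."""
    try:
        bus.put_nowait(event)
        return True
    except queue.Full:
        return False

def drain():
    """Consume everything currently queued -- the 'eventual' part of
    eventual consistency: readers catch up asynchronously."""
    events = []
    while not bus.empty():
        events.append(bus.get_nowait())
    return events
```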
Data Storage & Archive Service
The long-term data archive drives the analytics part of the overall application. Long-term storage is effectively the beginning of the data pipeline, and it needs to be accompanied by a catalog or metadata service for easier search and access of the stored objects.
Commercial examples include Amazon S3 and Google Cloud Storage.
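A sketch of what the catalog might record per stored object, and the search a consumer would start from. The field names and URI scheme are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Metadata recorded alongside each archived object; the URI
    points back into long-term storage."""
    uri: str            # e.g. an object-store path
    clinic_id: str
    source_system: str  # EMR, ERP, lab, IoT, survey
    tags: set = field(default_factory=set)

class Catalog:
    """In-memory sketch of the metadata/catalog service."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def search(self, **criteria):
        # Return storage URIs whose metadata matches all criteria.
        def matches(e):
            return all(getattr(e, k) == v for k, v in criteria.items())
        return [e.uri for e in self._entries if matches(e)]
```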
Performance Monitoring Service
A dashboard showcases the performance metrics of the healthcare centres. It typically connects to a database holding aggregated/computed performance metrics. The golden (raw) data collected from the hospitals stays in long-term storage for deep learning, while the churned metrics are stored in a relational database as timeline-based snapshots of computed statistics.
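What "churning raw data into a timeline-based snapshot" might look like for a single KPI row; the record shape and field names are assumptions for illustration.

```python
from datetime import date

def kpi_snapshot(snapshot_date, visits):
    """Reduce raw visit records to one snapshot row of computed
    statistics, ready to insert into the relational metrics store."""
    waits = [v["wait_minutes"] for v in visits]
    return {
        "snapshot_date": snapshot_date.isoformat(),
        "visit_count": len(visits),
        "avg_wait_minutes": round(sum(waits) / len(waits), 1) if waits else None,
    }
```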
Data-science Helper Service
A service exposing a simple RESTful API over the catalog (metadata) database, so data scientists can easily search for analytical data. A search result would ideally return data access points that resolve to long-term storage URIs.
Indexing Service
When hospitals upload archived data into long-term cloud storage, a parallel index or metadata store can be created to make the data quickly searchable. The indexing service should choose the right indexing strategy based on the type of data being uploaded.
Data Processing Pipeline Service
The processing pipeline is itself an application micro service that churns raw data into aggregate data. At a higher level, it is a set of ETL programs designed to run over incrementally or bulk-ingested data. The pipeline can be triggered by data arrival (incremental) or invoked externally.
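A toy ETL run over one ingested batch is sketched below. The record fields (`clinic_id`, `bed_days`) are hypothetical; the point is the extract -> transform -> load composition, which serves a bulk backfill and an incremental run alike, only the batch size differs.

```python
def extract(records):
    # Keep only well-formed raw records (minimal validation step).
    return [r for r in records if "clinic_id" in r and "bed_days" in r]

def transform(records):
    # Aggregate raw bed-day counts per clinic.
    totals = {}
    for r in records:
        totals[r["clinic_id"]] = totals.get(r["clinic_id"], 0) + r["bed_days"]
    return totals

def load(totals, sink):
    # "Load" into whatever store the metrics dashboard reads from.
    sink.update(totals)

def run_pipeline(batch, sink):
    """One pipeline run, triggered by data arrival or externally."""
    load(transform(extract(batch)), sink)
```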
Big Data Pipeline
Storage
The raw data generated by the healthcare clinics needs to be stored in the cloud for long periods and must expose interfaces for querying it for analytics needs. The picture below depicts the data pipeline. The typical characteristics expected of such cloud storage are:
1. Durability & Availability
2. Scalability
3. Secure
4. REST API Interface
5. Cost
Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage are examples of commercial object stores.
Functional Requirements
Ability to handle a variety of data, i.e. from different vendors, versions, and types.
Examples: Philips Tasy and Bahmni (OpenMRS).
Support a long-term data store with an index
The solution should accommodate long-term storage such as Amazon S3 and Glacier for deep analytics, and also support data indexing for faster search via metadata.
The index will provide fast search capability for data scientists and help in collecting data for analytical needs. The platform for running deep learning algorithms is out of scope of the big data pipeline.
The picture depicts a high-level, simplified data pipeline. It is conceptually built around an eventual consistency model, where raw files are stored on a mass storage platform and processed in batch. The file storage is capable of generating file events such as Create, Delete, and Update, which are queued for a processing service. The processor is typically a micro service with a plugin-based approach to handle data variety.
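The plugin-based dispatch can be sketched as below: handlers register per source data type, and each queued storage event is routed to the matching plugin. The vendor data types and event shape are illustrative assumptions.

```python
HANDLERS = {}

def plugin(data_type):
    """Decorator registering a handler for one source data type."""
    def wrap(fn):
        HANDLERS[data_type] = fn
        return fn
    return wrap

@plugin("openmrs")
def handle_openmrs(event):
    return "parsed OpenMRS export %s" % event["key"]

@plugin("tasy")
def handle_tasy(event):
    return "parsed Tasy export %s" % event["key"]

def process(event):
    """Dispatch one file event; a real system would route unknown
    types to a dead-letter queue (here we just report them)."""
    handler = HANDLERS.get(event["data_type"])
    return handler(event) if handler else "unhandled: %s" % event["data_type"]
```

Adding support for a new vendor is then a matter of registering one more plugin, which is how the pipeline absorbs data variety without changing the dispatch code.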
Serverless is a good choice for file processing since it fully supports elasticity.
The NFRs of data pipeline are
1. Consistency
2. Data integrity
3. Capacity
4. Traceability
Conclusion
The health centre monitoring solution, coupled with big data pipeline capabilities, not only helps monitor the health of hospitals but also opens up the opportunity to do data science over the operational data collected from different hospitals. Technically, moving from an on-premise solution to the cloud takes careful [re]design of micro services, as discussed in the earlier sections.
Ideas like using a metadata database (index) could be challenging given the variety of data uploaded from the hospitals to the centralised data cloud.
The cloud solution should minimise dependency on any single cloud vendor's technology, which would otherwise put future costs at risk. At the same time, cloud vendor capabilities such as serverless/Lambda services should be embraced wherever necessary.
The micro services should be designed to be cluster agnostic, and proven software load balancers should be used wherever applicable.
The choice of open-source frameworks plays an important role and should be evaluated based on the community and commercial support available.