Cloud Architecting with GCP: Part 7 – Database Services
As a cloud architect, one of your key responsibilities is to select the right database service based on application needs such as scalability, consistency, latency, and data structure. Google Cloud offers a rich portfolio of purpose-built database services, each designed for specific workloads. This article breaks down the most commonly used options in GCP.
Cloud SQL – Managed Relational Database Service
Cloud SQL offers fully managed relational databases, including MySQL, PostgreSQL, and SQL Server as a service.
It supports automatic replication scenarios, such as from a Cloud SQL primary instance, an external primary instance, and external MySQL instances. You can easily scale up to 96 processor cores and scale out with read replicas.
Cloud SQL supports managed backups, so backed-up data is securely stored and accessible if a restore is required. The cost of an instance covers seven backups.
In a high availability (HA) configuration, a regional instance is made up of a primary instance and a standby instance. Through synchronous replication to each zone's persistent disk, all writes made to the primary instance are replicated to disks in both zones before a transaction is reported as committed. In the event of an instance or zone failure, the persistent disk is attached to the standby instance, which becomes the new primary instance. Users are then rerouted to the new primary.
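The commit rule above can be illustrated with a toy model in plain Python. This is only a sketch: the class names (ZoneDisk, RegionalInstance), method names, and zone names are hypothetical stand-ins and not part of any Cloud SQL API. It shows that a write is acknowledged only after both zonal disks hold it, so the standby already has every committed write when it is promoted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ZoneDisk:
    """Hypothetical stand-in for the persistent disk in one zone."""
    zone: str
    log: List[str] = field(default_factory=list)


class RegionalInstance:
    """Toy model of an HA pair: one primary zone and one standby zone."""

    def __init__(self, primary_zone: str, standby_zone: str) -> None:
        self.primary_zone = primary_zone
        self.standby_zone = standby_zone
        self.disks = {
            primary_zone: ZoneDisk(primary_zone),
            standby_zone: ZoneDisk(standby_zone),
        }

    def write(self, record: str) -> bool:
        """Replicate to both zones, then report the write as committed."""
        for disk in self.disks.values():
            disk.log.append(record)
        return True  # acknowledged only after every disk has the record

    def fail_over(self) -> None:
        """Promote the standby: it serves from the disk in its own zone."""
        self.primary_zone, self.standby_zone = (
            self.standby_zone,
            self.primary_zone,
        )

    def read_committed(self) -> List[str]:
        """Return the committed writes visible on the current primary."""
        return list(self.disks[self.primary_zone].log)
```

In this model, calling fail_over() after a series of writes leaves read_committed() unchanged, which is the property synchronous replication is meant to provide.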
Connection type to a Cloud SQL instance
If you’re connecting an application that is hosted within the same Google Cloud project as your Cloud SQL instance, and it is collocated in the same region, choosing the Private IP connection will provide you with the most performant and secure connection using private connectivity. In other words, traffic is never exposed to the public internet.
If the application is hosted in another region or project, or if you are trying to connect to your Cloud SQL instance from outside of Google Cloud, you have three options. The recommended option is the Cloud SQL Auth Proxy, which handles authentication, encryption, and key rotation for you. If you need manual control over the SSL connection, you can generate and periodically rotate the certificates yourself. Otherwise, you can use an unencrypted connection by authorizing a specific IP address to connect to your SQL server over its external IP address.
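The connection guidance above can be condensed into a small decision helper. This is a sketch in plain Python, not a Google Cloud API: the Connection enum, the choose_connection function, and its parameter names are all made up for illustration.

```python
from enum import Enum


class Connection(Enum):
    PRIVATE_IP = "private IP"
    AUTH_PROXY = "Cloud SQL Auth Proxy"
    MANUAL_SSL = "self-managed SSL certificates"
    AUTHORIZED_IP = "authorized IP address (unencrypted)"


def choose_connection(
    same_project: bool,
    same_region: bool,
    need_manual_ssl: bool = False,
    accept_unencrypted: bool = False,
) -> Connection:
    """Pick a connection type following the article's guidance."""
    # Same project and same region: private connectivity is preferred.
    if same_project and same_region:
        return Connection.PRIVATE_IP
    # Otherwise choose among the three remaining options.
    if need_manual_ssl:
        return Connection.MANUAL_SSL
    if accept_unencrypted:
        return Connection.AUTHORIZED_IP
    # Default to the recommended option.
    return Connection.AUTH_PROXY
```

An application running outside Google Cloud would pass same_project=False and, with no other flags set, get the Auth Proxy recommendation.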
Cloud Spanner – Globally Distributed Relational Database
Spanner is a fully managed relational database service that scales horizontally, is strongly consistent, and speaks SQL.
Spanner is especially suited for applications that require a SQL relational database management system with joins and secondary indexes, built-in high availability, strong global consistency, and high numbers of input and output operations per second. We’re talking tens of thousands of reads and writes per second or more. Cloud Spanner is a service built for the cloud specifically to combine the benefits of relational database structure with non-relational horizontal scale.
This service can provide petabytes of capacity and offers transactional consistency at global scale, schemas, SQL, and automatic synchronous replication for high availability.
A Cloud Spanner instance replicates data in N cloud zones, which can be within one region or across several regions. This architecture allows for high availability and global placement. The replication of data is synchronized across zones using Google's global fiber network.
AlloyDB – High-Performance PostgreSQL-Compatible Database
AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible database service that's designed for demanding workloads such as hybrid transactional and analytical processing. AlloyDB pairs a Google-built database engine with a cloud-based, multi-node architecture to deliver enterprise-grade performance, reliability, and availability.
AlloyDB also uses adaptive algorithms and machine learning for PostgreSQL vacuum management, storage and memory management, data tiering, and analytics acceleration.
AlloyDB provides fast transactional processing, more than 4 times faster than standard PostgreSQL for transactional workloads.
Firestore – NoSQL Document Database
Firestore is a flexible, horizontally scalable, NoSQL cloud database for mobile, web, and server development. With Firestore, data is stored in documents and then organized into collections. Each document contains a set of key-value pairs.
Firestore client libraries provide live synchronization and offline support, and its security features and integrations with Firebase and GCP accelerate building truly serverless apps. Firestore uses data synchronization to update data on any connected device. However, it's also designed to make simple, one-time fetch queries efficiently. It caches data that an app is actively using, so the app can write, read, listen to, and query data even if the device is offline. When the device comes back online, Firestore synchronizes any local changes back to Firestore.
Cloud Firestore also supports ACID transactions, so if any of the operations in the transaction fail and cannot be retried, the whole transaction will fail.
Also, with automatic multi-region replication and strong consistency, your data is safe and available even when disasters strike.
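To make the all-or-nothing behavior concrete, here is a minimal in-memory sketch in plain Python. It does not use the Firestore client library; the nested dictionaries merely imitate collections of documents, and the names run_transaction and transfer, along with the sample data, are hypothetical.

```python
import copy
from typing import Callable, Dict, List

# A "database" here is {collection: {document_id: {field: value}}}.
Database = Dict[str, Dict[str, Dict[str, int]]]
Operation = Callable[[Database], None]


def run_transaction(store: Database, operations: List[Operation]) -> None:
    """Apply every operation to a copy; commit only if all of them succeed."""
    working = copy.deepcopy(store)
    for operation in operations:
        operation(working)  # an exception here leaves `store` untouched
    store.clear()
    store.update(working)


def transfer(amount: int, src: str, dst: str) -> Operation:
    """Build an operation that moves `amount` between two account documents."""
    def operation(db: Database) -> None:
        accounts = db["accounts"]
        if accounts[src]["balance"] < amount:
            raise ValueError("insufficient funds")
        accounts[src]["balance"] -= amount
        accounts[dst]["balance"] += amount
    return operation


store: Database = {
    "accounts": {"alice": {"balance": 100}, "bob": {"balance": 20}}
}
```

If the second of two transfers raises an error, the first one is discarded as well, because only the working copy was modified.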
Firestore leverages Google Cloud’s powerful infrastructure: automatic multi-region data replication, strong consistency guarantees, atomic batch operations, and real transaction support.
Operation Modes
Cloud Firestore is actually the next generation of Cloud Datastore.
Cloud Firestore can operate in Datastore mode, making it backwards compatible with Cloud Datastore. By creating a Cloud Firestore database in Datastore mode, you can access Cloud Firestore's improved storage layer while keeping Cloud Datastore system behavior.
Cloud Firestore in native mode introduces new features such as a new, strongly consistent storage layer, a collection and document data model, real-time updates, and mobile and web client libraries.
Bigtable – Wide-Column NoSQL Database for Massive Scale
Bigtable is Google's NoSQL big data database service. It's the same database that powers many core Google services, including Search, Analytics, Maps, and Gmail.
Bigtable doesn’t support SQL queries, nor does it support multi-row transactions.
Bigtable provides petabytes of capacity with a maximum unit size of 10 megabytes per cell and 100 megabytes per row.
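A pre-write size check against the limits quoted above might look like the following sketch. It is plain Python, not the Bigtable client library. The validate_row function and the row layout (column name mapped to raw bytes) are hypothetical, and treating a megabyte as 1024 * 1024 bytes is an assumption.

```python
from typing import Dict, List

# Limits as quoted in the article; the binary interpretation is an assumption.
MAX_CELL_BYTES = 10 * 1024 * 1024
MAX_ROW_BYTES = 100 * 1024 * 1024


def validate_row(row: Dict[str, bytes]) -> List[str]:
    """Return a list of size problems for a row; an empty list means OK."""
    problems = []
    total = 0
    for column, value in row.items():
        size = len(value)
        total += size
        if size > MAX_CELL_BYTES:
            problems.append(f"cell {column!r} is {size} bytes, over the cell limit")
    if total > MAX_ROW_BYTES:
        problems.append(f"row is {total} bytes, over the row limit")
    return problems
```

Running a check like this on the client side keeps oversized values from ever reaching the write path.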
Bigtable is best for analytical data with heavy read and write events, like AdTech, financial, or IoT data.
Memorystore – In-Memory Key-Value Store (Redis & Memcached)
Memorystore is a fully managed in-memory database service supporting Redis and Memcached. It’s ideal for caching and low-latency use cases.
Memorystore also automates complex tasks like enabling high availability, failover, patching, and monitoring. High availability instances are replicated across two zones and provide a 99.9% availability SLA.
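The caching use case is commonly implemented with a cache-aside pattern. The sketch below uses a small in-process class as a stand-in for a Redis or Memcached instance, so it runs without any external service. TTLCache, get_with_cache, and the load_from_db callback are hypothetical names, and the pattern itself is a common choice rather than something Memorystore mandates.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a key-value cache with expiring entries."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop the expired entry
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (value, self._clock() + self._ttl)


def get_with_cache(cache: TTLCache, key: str,
                   load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # slow path, only on a miss
        cache.set(key, value)
    return value
```

Repeated reads of the same key within the TTL hit the cache and skip the database call, which is where the latency benefit comes from.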
BigQuery – Serverless Data Warehouse for Analytics
BigQuery is a serverless data warehouse designed for OLAP (analytical) workloads. It supports SQL-based analytics on petabyte-scale datasets with blazing-fast performance.
The usual reason to store data in BigQuery is so you can use its big data analysis and interactive querying capabilities, but it’s not purely a data storage product.
Decision chart for database services
This chart helps you decide which database service is suitable for your use case:
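As a rough text companion, the guidance from the sections above can be condensed into a small function. This is a simplified sketch and not an official Google decision tree: the suggest_database function, its parameter names, and the order of the checks are assumptions made for illustration.

```python
def suggest_database(
    *,
    relational: bool,
    in_memory_cache: bool = False,
    olap_warehouse: bool = False,
    global_scale: bool = False,
    postgres_htap: bool = False,
    document_model: bool = False,
) -> str:
    """Map coarse workload traits to one of the services in this article."""
    if in_memory_cache:
        return "Memorystore"      # caching and low-latency access
    if olap_warehouse:
        return "BigQuery"         # serverless analytics on large datasets
    if relational:
        if global_scale:
            return "Cloud Spanner"  # horizontal scale, strong consistency
        if postgres_htap:
            return "AlloyDB"        # demanding PostgreSQL-compatible workloads
        return "Cloud SQL"          # managed MySQL, PostgreSQL, SQL Server
    if document_model:
        return "Firestore"        # documents and collections, offline sync
    return "Bigtable"             # wide-column, heavy reads and writes
```

For example, a relational workload that needs strong consistency across regions would be called with relational=True and global_scale=True.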
Conclusion
No single database fits all workloads. The key to successful architecture in GCP lies in choosing the right tool for the job: