Managing mixed workloads in a database can be challenging. Data engineers and analysts need access to up-to-date information to make business decisions, but querying that data whenever they want can impact application and user workloads. Cloud Spanner solves this with Data Boost. Google Cloud Spanner is a scalable, highly available, and globally distributed database service. It is a relational database designed to handle large-scale transactions, and it delivers strong consistency and high availability across regions. It offers features such as ACID transactions, standard SQL support, and automatic scaling of servers and storage.

Cloud Spanner Data Boost is a fully managed, serverless service provided by Google Cloud Platform. With Data Boost, you can run analytical workloads at any time without impacting users and without increasing the instance size. Because Spanner separates compute and storage, queries that enable Data Boost use compute resources entirely separate from those provisioned for the Spanner instance, so they do not compete for CPU or memory with the transactional workload.

BigQuery Federated queries to use Data Boost

Here, we will go through an example with a database that contains a game's data persisted in Spanner. This data contains player information and the history of items bought and sold on a trading post. The data engineer is tasked with analyzing trends in which items are listed and for how much. This analysis helps the company create or modify in-game events based on the actions of the players, tailoring the experience for those currently playing the game. The engineer set up a BigQuery federated query for Spanner, which gives them the freshest data on demand.

However, they noticed bursts of CPU usage on the Spanner instance that were impacting the overall player experience. To mitigate this, they have two choices: increase the size of the Spanner instance or use Data Boost. Increasing the instance size means either scaling up once and keeping the larger instance, or scaling up only for specific periods during which the data engineer can run their queries. Using Data Boost is simpler.

They set up a new federated query connection in BigQuery that uses Data Boost. Updating the query to use the new connection solved the issue. The players are no longer impacted because Data Boost queries do not consume CPU from the Spanner instance. The data engineer can now safely run queries at any time, and they only pay for the compute used while a query runs.
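The setup described above can be sketched in Python. This is a minimal illustration, not the engineer's actual configuration: the project, instance, database, connection ID, and the ItemListings table with its columns are all hypothetical. The connection properties use the useParallelism and useDataBoost keys that BigQuery's Cloud Spanner connections accept; verify them against the current BigQuery documentation before relying on them.

```python
import json


def data_boost_connection_properties(project, instance, database):
    """Build the JSON properties for a BigQuery connection to Spanner with Data Boost on."""
    return json.dumps({
        "database": f"projects/{project}/instances/{instance}/databases/{database}",
        # Data Boost is assumed to require parallel reads to be enabled as well.
        "useParallelism": True,
        "useDataBoost": True,
    })


def federated_query(connection_id, spanner_sql):
    """Wrap a Spanner SQL statement in BigQuery's EXTERNAL_QUERY table function."""
    # The inner statement is passed as a triple-quoted BigQuery string literal,
    # so it must not itself contain three consecutive single quotes.
    return (
        "SELECT *\n"
        f"FROM EXTERNAL_QUERY('{connection_id}', '''{spanner_sql}''')"
    )


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    print(data_boost_connection_properties("my-project", "game-instance", "game-db"))
    print(federated_query(
        "my-project.us.spanner-data-boost",
        "SELECT item_id, list_price FROM ItemListings",
    ))
```

The properties string is what you would pass when creating the connection (for example through the bq command-line tool), and the generated SQL can be submitted to BigQuery like any other query.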

Spanner Client Connector to use Data Boost

BigQuery federated queries are not the only way to use Data Boost. Data Boost can be used with any Spanner client connector, so you can also use it in ETL pipelines that run on Dataflow or Dataproc, or in other custom applications. For example, the data engineer has set up an ETL process that anonymizes player information and exports hourly batches of item sales to share with third-party partners, using Dataproc and GCS.

With Data Boost, this ETL process no longer consumes the Spanner instance CPU, leaving that capacity for operational workloads. Data Boost works for any query that takes advantage of the partition query API to read a large amount of data for analytical workloads.
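As a rough sketch of what a partitioned read with Data Boost looks like from a custom application, the helper below uses the batch snapshot interface of the google-cloud-spanner Python client (batch_snapshot, generate_query_batches with data_boost_enabled=True, and process_query_batch). The function name and the instance, database, and table names in the usage comment are hypothetical. The caller passes in the database object so the helper stays independent of credentials.

```python
def read_with_data_boost(database, sql):
    """Run `sql` as a partitioned query on Data Boost and return all rows.

    `database` is expected to behave like google.cloud.spanner's Database object.
    """
    snapshot = database.batch_snapshot()
    rows = []
    try:
        # Ask Spanner to split the query into partitions and to serve them
        # from Data Boost compute instead of the instance's own nodes.
        batches = snapshot.generate_query_batches(sql=sql, data_boost_enabled=True)
        for batch in batches:
            # Each batch could also be handed to a separate worker; here they
            # are processed one after another for simplicity.
            rows.extend(snapshot.process_query_batch(batch))
    finally:
        # Release the session held by the batch snapshot.
        snapshot.close()
    return rows


# Hypothetical usage with the real client (requires credentials):
#   from google.cloud import spanner
#   database = spanner.Client().instance("game-instance").database("game-db")
#   rows = read_with_data_boost(database, "SELECT item_id, sale_price FROM ItemSales")
```

In a real pipeline the partitions would typically be distributed across workers, which is what connectors such as the ones used by Dataflow and Dataproc do on your behalf.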

Data Boost Security

You control who has access to Data Boost through IAM by granting users the spanner.databases.useDataBoost permission. This permission only grants the right to use Data Boost, so all existing access controls and data governance rules your database owners have set up are still enforced. Queries that use Data Boost can be tracked using cloud metrics and audit logs, down to the query and user granularity. In addition to auditing Data Boost queries, you can set usage limits on Data Boost by user or query for granular cost control.

Conclusion

With Cloud Spanner Data Boost, it is easy to run batch and analytical workloads on your production Spanner database without impacting other database operations. In addition to workload isolation, batch and analytical queries that use Data Boost may complete faster, since they are not competing for compute resources. It also lets you share data with everyone who needs it without your DBAs worrying about the load on the production database.

The team at Metclouds Technologies can help you simplify the execution of analytics workloads.