Batching Jobs in GCP using the Cloud Scheduler and Functions

While designing and implementing solutions, I often need to set up recurring batch jobs for data storage and processing. Recently I have been trying to keep my infrastructure as serverless as possible, so in this article I will show you how Google Cloud Platform can run almost any batch job your project might need for free, as long as you stay within the free tier.

Use Cases

For me, this batch pattern is the most useful when it comes to data processing, reconciliation, and cleanup. Here is an example involving data aggregation…

A bucket can be an effective repository for streaming data, but if your payloads are small and frequent, keeping a file for every payload gets expensive when you have to read them often. I solve this problem by running a batch job that merges individual payloads into hourly or daily files, which makes for a much more cost-effective solution.
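To make the aggregation idea concrete, here is a minimal Python sketch of the grouping logic only. It assumes, purely for illustration, that each payload object is named payloads/<unix-seconds>.json and that merged files are written as merged/<date>T<hour>.jsonl; neither naming scheme comes from a real project. Reading from and writing to the bucket is left out, so the function works on plain dictionaries of name to content.

```python
from collections import defaultdict
from datetime import datetime, timezone


def payload_timestamp(name: str) -> int:
    """Extract the Unix timestamp from a hypothetical 'payloads/<seconds>.json' name."""
    stem = name.rsplit("/", 1)[-1].split(".", 1)[0]
    return int(stem)


def hourly_key(name: str) -> str:
    """Map a payload name to the (hypothetical) name of its hourly merged file."""
    ts = datetime.fromtimestamp(payload_timestamp(name), tz=timezone.utc)
    return ts.strftime("merged/%Y-%m-%dT%H.jsonl")


def merge_by_hour(payloads: dict) -> dict:
    """Group payload bodies by UTC hour and join each group, one payload per line."""
    groups = defaultdict(list)
    # Sort by timestamp so the lines inside each merged file are in time order.
    for name in sorted(payloads, key=payload_timestamp):
        groups[hourly_key(name)].append(payloads[name].strip())
    return {key: "\n".join(lines) + "\n" for key, lines in groups.items()}
```

In a real job, the batch function would list the bucket, pass the contents through something like merge_by_hour, upload the merged files, and then delete the originals.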

Or how about database cleanup…

If you have a SQL database containing large time series data sets, regular purging is critical for performance. You can squeeze a recurring job into a web application or into the ETL system that loads data into your tables. I prefer this serverless batch approach because it decouples the solution and simplifies maintenance.
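As a rough illustration of what such a purge job might do, the sketch below uses Python's built-in sqlite3 module as a stand-in for whatever SQL database you run. The table name readings, the column recorded_at, and the retention window are all hypothetical, and the timestamps are assumed to be stored as ISO 8601 strings in UTC.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_old_rows(conn, retention_days, now=None):
    """Delete rows older than the retention window and return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # ISO 8601 strings with the same UTC offset sort chronologically,
    # so a plain string comparison is enough here.
    cur = conn.execute("DELETE FROM readings WHERE recorded_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

The optional now argument exists only so the cutoff can be pinned in tests; a scheduled job would normally call purge_old_rows(conn, 30) and log the returned count.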

Architecture

We will be using three GCP services to implement our serverless batch solution. Cloud Scheduler will trigger our batch events, and Pub/Sub will transmit those events to a Cloud Function that performs the required batch operation.

Diagram by author

Pricing

GCP offers a very generous free tier. I have made a simplified cost table below for the three services we need to schedule and run a batch job. A batch job running every 5 minutes will use 1 Cloud Scheduler job, ~9,000 Cloud Function executions, and ~9 MB of Pub/Sub throughput per month.
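The usage figures above can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month and assumes each Pub/Sub message is counted as roughly 1 KB; that per-message size is my assumption for the estimate, not a quoted price rule.

```python
MINUTES_PER_DAY = 24 * 60


def monthly_invocations(interval_minutes, days=30):
    """Number of scheduler triggers (and function executions) in one month."""
    return (MINUTES_PER_DAY // interval_minutes) * days


def monthly_pubsub_megabytes(invocations, assumed_kb_per_message=1):
    """Approximate Pub/Sub volume in MB, assuming a fixed size per message."""
    return invocations * assumed_kb_per_message / 1000
```

With a 5-minute interval this gives 8,640 executions and about 8.64 MB, which rounds to the ~9,000 and ~9 MB quoted above.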

Diagram by author

If you need more than 3 jobs across your projects, you will be charged 10 cents USD per month for every additional job.
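The scheduler charge described here is simple enough to express as a one-line formula. This sketch takes the 3 free jobs and the 10 cents per additional job from the paragraph above; check the current pricing page before relying on these numbers.

```python
def scheduler_monthly_cost_usd(job_count, free_jobs=3, price_per_job=0.10):
    """Monthly Cloud Scheduler cost: only jobs beyond the free allowance are billed."""
    billable = max(0, job_count - free_jobs)
    return round(billable * price_per_job, 2)
```

For example, five jobs would cost 2 x 0.10 = 0.20 USD per month, while three or fewer cost nothing.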

Configuration

Pub/Sub

First, let’s configure a Pub/Sub topic, as it will be required when we set up both the scheduler and the serverless function. Topics can be configured in the Pub/Sub section of your GCP console.

Diagram by author
  • As you can see, all we have to configure is the topic name. I am using example-topic
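Outside the console, tools and APIs usually refer to a topic by its fully qualified name rather than the short name typed above. A minimal sketch of how that name is put together follows; the project ID my-project is a placeholder, not something from this walkthrough.

```python
def topic_path(project_id, topic_id):
    """Build the fully qualified Pub/Sub topic name from a project and a short topic name."""
    return f"projects/{project_id}/topics/{topic_id}"


# 'my-project' is a hypothetical project ID used only for illustration.
EXAMPLE_TOPIC = topic_path("my-project", "example-topic")
```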

Cloud Scheduler

Next, we configure Cloud Scheduler as our batch trigger. Head to the Cloud Scheduler page in the console and create your scheduler job.

Diagram by author
  • In my example, I am configuring the scheduler to run every 30 minutes, but you can set any period you like
  • Specify the topic we created in the previous step. I am using example-topic
  • Our Cloud Function needs nothing from the scheduler except the trigger itself, so the payload value is not important and you can put any value. I am using run
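For readers who prefer to see the three settings above as data, here is a sketch of a job definition shaped like the Cloud Scheduler REST Job resource as I understand it (name, schedule, timeZone, and a pubsubTarget with topicName and base64-encoded data). The project ID, location, job ID, and time zone are placeholders, and the field names should be checked against the current API reference before use.

```python
import base64


def scheduler_job_body(project_id, location, job_id, topic_id, schedule, payload):
    """Build a dict resembling a Cloud Scheduler job with a Pub/Sub target."""
    return {
        "name": f"projects/{project_id}/locations/{location}/jobs/{job_id}",
        # Unix-cron format: '*/30 * * * *' fires every 30 minutes.
        "schedule": schedule,
        "timeZone": "Etc/UTC",  # assumed time zone for this sketch
        "pubsubTarget": {
            "topicName": f"projects/{project_id}/topics/{topic_id}",
            # Message data travels as base64 text in JSON.
            "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
        },
    }


# All identifiers below except 'example-topic' and 'run' are hypothetical.
job = scheduler_job_body(
    "my-project", "us-central1", "example-job", "example-topic", "*/30 * * * *", "run"
)
```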

Cloud Function

We are almost done. The last step is to create a Cloud Function that runs whenever the scheduler publishes an event. You can find the Cloud Function configuration in the Cloud Functions section of the console.

Diagram by author
  • Select Cloud Pub/Sub as the trigger type
  • Select the topic created in the first step
  • Proceed to the code configuration. I will be using the out-of-the-box Node.js function, which simply logs the contents of the Pub/Sub payload.

The function might take a minute to fully deploy.

Keep in mind that you can use any of the available programming languages in this step.
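As one example of another language, here is a minimal Python sketch of a function that does the same thing as the default one: decode the Pub/Sub payload and log it. It assumes the event arrives as a dictionary whose data field is base64 encoded, with a second context argument, which is how I understand Pub/Sub-triggered background functions to be called. The function name handle_batch_event is my own.

```python
import base64


def handle_batch_event(event, context=None):
    """Decode and log the Pub/Sub payload, then return it for easy testing."""
    data = event.get("data")
    payload = base64.b64decode(data).decode("utf-8") if data else ""
    print(f"Batch trigger received with payload: {payload!r}")
    # The actual batch work (merging files, purging rows, ...) would start here.
    return payload
```

You can exercise it locally without deploying by passing a hand-built event, for instance handle_batch_event({"data": "cnVu"}), where cnVu is the base64 encoding of run.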

Testing

To test our configuration, we need to head over to the Cloud Scheduler list and manually trigger our job using the RUN NOW option.

Diagram by author

To make sure our function was triggered successfully, we can head over to the Cloud Function list, select the function configured earlier, and check the Logs tab.

Diagram by author

You should see log output indicating that your function ran, along with the payload message you configured for the Cloud Scheduler job.

Success!

Conclusion

There you have it. In 5 minutes we configured a solution that is 100% free within the free tier limits and that you can use to run various types of batch jobs. If you ever need to quickly set up a highly decoupled solution for kicking off or running batch jobs, you now have a quick and easy way to get it done using GCP.

Good luck and happy coding!