Blog

Automate BigQuery reservations and assignments using Dataform

We are excited to share an article that guides using BigQuery’s reservation SQL APIs with Dataform to automate pricing plan adjustments.

The BigQuery editions exposed customers to new pricing models and relatively significant updates to existing models. At DoiT, we noticed that customers struggle with automatically adjusting a project’s reservation based on specific periods, intending to optimize performance and cost.

Problem statement

One of our clients is running a Research & Development project on BigQuery, which is being utilized by both the Development and Data Science teams. The teams have a specific usage pattern with the following characteristics:

    1. Interactive queries occur primarily on weekdays during business hours.
    2. High query concurrency with quick response times.
    3. Particular tasks are scheduled to run during nights or weekends.

Proposed solution

To meet the necessary demands, the project will utilize the ‘On-Demand’ plan during work hours for efficient query latency and high concurrency. During non-working hours, the project will use the ‘Enterprise Edition’ with a maximum of 100 slots.

An automation flow will change the pricing plan accordingly.
 

“r

Automation of reservation and assignment changes using Dataform

Today, Google offers no out-of-the-box solution to automatically switch between BigQuery Pricing. However, they provide an interface to manage reservations via CLI or SQL.

In the following, we will use Dataform and its SQLX language to enhance the efficiency of updating pricing plans. This approach enables us to seamlessly manage any modifications in reservations and assignments within a Dataform workflow, which is also version-controlled in Git.

Another benefit of using Dataform is its simplicity: All the work is completed in SQL within BigQuery without requiring external services. Dataform is free, and there is no additional cost associated.

In the following, we will use Dataform and its SQLX language to enhance the efficiency of updating pricing plans. This approach enables us to seamlessly manage any modifications in reservations and assignments within a Dataform workflow, which is also version-controlled in Git.

Another benefit of using Dataform is its simplicity: All the work is completed in SQL within BigQuery without requiring external services. Dataform is free, and there is no additional cost associated.

BigQuery reservation and assignment management

As a preparation, we’ll (manually) create an ‘Enterprise Edition’ reservation test_rdwith a max slot configuration of 100.

Next, to manage the reservation configuration and assignment changes, we’ll use the BigQuery Reservation API (SQL DDL)
SQL Code example of creating an assignment in the US region using a reservation called ndv-rd, with the assignment_id being df_rd.

SQL Code example of creating an assignment in the US region using a reservation called ndv-rd, with the assignment_id being df_rd.

CREATE ASSIGNMENT `ndv-playground-bq-mgmt.region-us.ndv-rd.df_rd`
OPTIONS( assignee="projects/ndv-playground", job_type="QUERY");

Build Dataform repository and workspace

If you’re unfamiliar with Dataform, I suggest reading this blog, Build SQL pipelines to BigQuery with Dataform and do the ‘Quick start.’

You can access all the code related to the Blog from this GitHub repository

There are six steps to building an automated workflow:

[1] Create a repository and grant roles

To create a Dataform repository in the BigQuery admin project, navigate to the Dataform section and initiate the repository creation process. (Create a Dataform repository)

To ensure successful execution, both the developer principal (user) and the Dataform SA (service account) require the following roles during workflow execution: bigquery.resourceAdmin, bigquery.user

To assign roles to the SA (code below), use the SA ID displayed in the message prompt after creating a repository.
 
“dataform

The SA of Dataform

gcloud projects add-iam-policy-binding <PROJECT-ID> \
--member="<service-account-dataform>" \
--role=roles/bigquery.resourceAdmin --condition=None
gcloud projects add-iam-policy-binding <PROJECT-ID> \
--member="<service-account-dataform>" \
--role=roles/bigquery.user --condition=None

[2] Create a development workspace

To create a new Dataform development workspace, follow these steps: Create a Dataform workspace.

Once initialized, the workspace will include several SQLX example methods and a configuration file named “dataform.json.” These example files (first_view.sqlx, second_view.sqlx) can be removed.

To configure the Dataform methods add the section “vars“ which contains the list of parameters shown below. This section should appear after the last existing parameter and include the below parameters. In this section, you set the default values for each parameter, which can be overridden in the workflow at a later stage.

  1. Make sure to add a comma  "," before the “vars” section.
  2. Replace parameters marked with "<>" (The"<>" should be removed)
"vars": {
"assigned_project":"'<R&D project>'",
"Assignment_id":"`df_rd_assignment`",
"reservation_prefix":"<BQ admin project>.region-<reservation location>",
"reservation_name":"`<R&D reservation name>`",
"region":"<reservation location>"
}

This is an example of our implementation:

"vars": {
"assigned_project":"'ndv-playground'",
"Assignment_id":"`df_rd_assignment`",
"reservation_prefix":"`ndv-playground-bq-mgmt.region-us`",
"reservation_name":"`ndv-rd`",
"region":"region-us"
}

[3] Develop SQLX methods

In this step, we will create the SQLX code to handle reservations and assignments.

  • Create a file named drop_and_create_assignment.sqlx under the 'definitions' folder.
  • Please include the following SQLX code to delete all project assignments and create the necessary assignment.
  • Verify it’s compiled successfully by having a ✅:

This is the SQLX code of drop_and_create_assignment.sqlx

config {type: "operations"}
BEGIN

CREATE TEMP TABLE to_drop (drop_string STRING) AS (
SELECT
CONCAT("DROP ASSIGNMENT IF EXISTS `",project_id,".",SPLIT(ddl,".")[OFFSET(1)],
       ".",reservation_name,".",assignment_id,"`")
FROM
${dataform.projectConfig.vars.region}.INFORMATION_SCHEMA.ASSIGNMENTS_BY_PROJECT
WHERE
assignee_id = ${dataform.projectConfig.vars.assigned_project} 
AND job_type = 'QUERY' );

FOR drop_statement IN (
SELECT drop_string FROM to_drop)
DO EXECUTE IMMEDIATE drop_statement.drop_string;
END FOR;

END;

This is the expected generated code according to our configuration

BEGIN
CREATE TEMP TABLE to_drop (drop_string STRING) AS (
SELECT
CONCAT("DROP ASSIGNMENT IF EXISTS `",project_id,".",SPLIT(ddl,".")[OFFSET(1)],".",reservation_name,".",assignment_id,"`")
FROM
region-us.INFORMATION_SCHEMA.ASSIGNMENTS_BY_PROJECT
WHERE
assignee_id = 'ndv-playground' AND job_type = 'QUERY' );
FOR drop_statement IN (
SELECT drop_string FROM to_drop)
DO EXECUTE IMMEDIATE drop_statement.drop_string;
END FOR;
END;

CREATE ASSIGNMENT
`ndv-playground-bq-mgmt.region-us`.`ndv-rd`.`df_rd_assignment`
OPTIONS(
assignee=CONCAT("projects/",'ndv-playground'),
job_type="QUERY");

[4] Test and verify the SQLX methods

Before you run to test the code, pls note that the assigned projects configuration will be change according to the value in the  dataform.json file. You may want to consider testing it on a ‘test’ project.

To execute and test the code, you have two options. You can use the ‘Run’ button with your credentials or choose ‘Start Execution’ to execute it with Dataform SA, as mentioned in the message below.

Execute all actions, or select a subset of actions. Service account [email protected] will be used.

Verify according to the execution output: This statement created a new assignmentand then verify the changes in the ‘Capacity management’ view.
 
“capacity

Once the SA verification is successful, you can proceed to commit the changes and push them to the default branch. At this point, your development is complete, and you are ready for release.

[5] Create Dataform release configuration

To create a snapshot of the code and execute it with overridden parameters, use the release configuration. We will only configure one parameter to simplify things and set the refresh to Never.
Please, remember to compile after each change.

The R&D project will have two releases, one for the ‘On-Demand’, with reservation_name= `none`and another for the ‘Enterprise edition’, reservation_name=`ndv-rd`(pay attention to the "`").

Test the “release configuration” with the selected action:
ndv-playground-bq-mgmt.dataform.drop_and_create_assignment
 
“release

Release for Enterprise reservation

“Release

Release for On-demand reservation

[6] Create Dataform workflow configuration

The workflow configuration outlines how SQLX actions are automatically executed. A “release configuration” and its parameters are the basis for each workflow. The selected actions are then run based on a specific trigger.

The R&D project has two workflows that utilize different release configurations and are triggered at other times. Time starting is defined using the 'cron'format.
 
“Enterprise

Enterprise edition workflow

Summary

We’re excited to see the benefits that AutoScaling and ‘Editions’ have brought to the table! However, we’ve noticed that some users have encountered a hurdle — the inability to assign it to specific periods. We understand that this can be especially frustrating, given its cost.

This article is part of our effort and commitment to finding solutions and enhancing our customer’s experience with these features.

Dataform’s simplicity, sole operation within BigQuery, and cost-effectiveness make it a valuable tool for implementation.

Subscribe to updates, news and more.

Leave a Reply

Your email address will not be published. Required fields are marked *

Related blogs

Connect With Us