Copying data between your own tables in BigQuery has become much easier in recent years. But there are still many ways to do it, each with different limitations that can get confusing. I’ll describe each, with pluses and minuses.
Copying across data regions used to be complicated.
Copy through Storage
You could copy from BigQuery to Cloud Storage, then back again to BigQuery in the other region. This not only has that additional complicating step, but it can get expensive too. You also had to orchestrate it yourself, for example, with Composer for scheduling, buffering, and parallelization.
Recently, the Dataset Copying feature has made this much easier. The feature works across regions and is free, except for network costs. Scheduling and related features are built-in.
Intra-region copying with
bq, the Job API, or Copy Tables
Copying inside a region has always been easier, and best of all, free. Within the region, you can run
bq cp from the command line, code against the Job API, or “Copy Tables” in the Cloud Console lets you do this as well.
The Scheduled Queries feature is another way to copy inside a region. Though not free, it is quite flexible, allowing you to write any
SELECT statement for insertion to a target table. And as the name suggests, it has scheduling built-in.
Of the approaches I mentioned, only
bq cp and Scheduled Queries support one common use case, daily incremental backup of an ingestion-time partitioned table. Only these can copy just the last day’s partition and preserve this as a partition in the target table.
Here are your choices in table form:
You need to populate your Google Spreadsheet with data.
|Approach||Across Regions?||Preserve ingestion-time partitions with a daily copy?||Scheduling built-in?||Free?|
|Copy Job API (various languages)||No||No||No||Yes|
|Dataset Copying||Yes||No||Yes||Yes (except network costs)|
|Copy Table (Console)||No||No||No||Yes|
|BQ to Storage to BQ||Yes||No||No||No|