In this post, we compare approaches for getting insights into your BigQuery footprint and present a custom Python script for full flexibility.
Introduction
In the rapidly evolving world of data analytics, Google BigQuery stands out as a powerful, serverless data warehouse that enables super-fast SQL queries across vast datasets. Whether you're a data scientist, data engineer, or analytics expert, navigating through BigQuery's extensive capabilities can often feel like uncovering hidden treasures. As your organisation grows, your BigQuery footprint might grow with it, and one of the challenges that many users face is efficiently listing tables and datasets across different scopes.
This blog post aims to guide you through the various methods of listing tables and datasets in BigQuery, with the final aim of listing these assets across your whole organisation, highlighting the inherent limitations of each approach along the way.
As we delve into the different techniques, from basic GUI interactions to more advanced SQL queries and API calls, you'll gain insights into their strengths and constraints. In the end, we will introduce a final solution that offers the greatest flexibility but will require more legwork; that is, if we didn't already do the work for you!
We will run through the following options:
- Google Cloud Console web UI
- BigQuery CLI
- Dataplex (Data Catalog)
- INFORMATION_SCHEMA
- Custom script
And consider a few evaluation dimensions along the way:
- Scope of listing
- Level of detail included
- Flexibility
- Price
Google Cloud Console web UI
A good place to start is the familiar Google Cloud Console web UI. Navigating to BigQuery Studio under the BigQuery product page gives you datasets and tables in your current project (and other projects if you added them) laid out in the Explorer pane. It also gives you the ability to search for resources (dataset, table, or view) by name or label and returns results that span both projects and organisations you might have access to.

Pros:
- It's easily accessible from BigQuery Studio.
- It can search over projects and organisations and thus has no limitations in scope.
- This approach is free.
Cons:
- It doesn't provide an exhaustive list of all the datasets or tables in your organisation or that you have access to.
- It doesn't provide any other metadata about your tables.
BigQuery CLI
If you are not a huge fan of UIs and prefer to spend your time in the command line, you might be more tempted by the BigQuery CLI.
Given a PROJECT_ID, you can list all the datasets in that project by issuing:
bq ls --project_id $PROJECT_ID

Or, given both PROJECT_ID & DATASET_ID, you can list all the tables in that dataset by issuing:
bq ls --project_id $PROJECT_ID --dataset_id=$DATASET_ID

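The JSON output pairs nicely with lightweight scripting. As a sketch, here is how the output of bq ls --format=prettyjson could be parsed in Python; the datasetReference shape is assumed from typical bq CLI output, and a stubbed payload stands in for a live bq call:

```python
import json

def dataset_ids(bq_ls_json: str) -> list[str]:
    """Extract dataset IDs from `bq ls --format=prettyjson` output.

    Assumes each entry carries a `datasetReference` object, as in
    typical bq CLI output; adjust the keys if your version differs.
    """
    return [entry["datasetReference"]["datasetId"] for entry in json.loads(bq_ls_json)]

# Stubbed payload instead of a live `bq ls` call:
sample = '[{"datasetReference": {"projectId": "my-project", "datasetId": "sales"}}]'
print(dataset_ids(sample))  # ['sales']
```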
Pros:
- Easy-to-use CLI.
- Output can be used for lightweight downstream processing with e.g. jq, especially when using the --format=<none|json|prettyjson|csv|sparse|pretty> flag.
- Metadata operations are free.
Cons:
- The scope of your listing is limited: your scope is a project for the dataset level and a dataset for the table level.
- The details returned in the listing are again limited.
Dataplex (Data Catalog)
When thinking about finding all the relevant data in your organisation, obviously Data Catalog comes to mind! Data Catalog used to be a stand-alone product, but has been a feature of Dataplex since mid-2022. Dataplex is Google Cloud's intelligent data management platform that automates the organisation, security, and analysis of data across data lakes, warehouses, and databases, helping your organisation with discoverability, governance and compliance. Data Catalog acts as the central inventory of your organisation's data assets.
It allows you to search across organisations and systems, filter for data types, tags, etc. For our use case specifically, we are able to retrieve all the datasets & tables within a specific organisation simply by using the appropriate filter in the UI:
The nice thing is that, for the people who prefer CLI over UI, we can achieve the same thing using a gcloud command as follows:
For datasets:
gcloud data-catalog search "type=dataset" --include-organization-ids=YOUR_ORG_ID

For tables:
gcloud data-catalog search "type=table" --include-organization-ids=YOUR_ORG_ID

Pros:
- Searches across organisations AND goes beyond only searching BigQuery.
Cons:
- Data Catalog stores different types of metadata (business as well as technical metadata), but misses some details like table size or total bytes stored.
- Neither Dataplex nor Data Catalog is free, although the pricing should be manageable for most use cases. See the pricing page for more details.
INFORMATION_SCHEMA
The BigQuery aficionados among us will definitely be familiar with INFORMATION_SCHEMA. The BigQuery INFORMATION_SCHEMA views are read-only, system-defined views that provide metadata information about your BigQuery objects. There are a large number of different views; check the documentation for a comprehensive overview.
For our use case, we are particularly interested in the SCHEMATA view and the TABLES view:
We can use the SCHEMATA view to list all datasets within a region, e.g., for us-central1:
SELECT * FROM `region-us-central1`.INFORMATION_SCHEMA.SCHEMATA;

Or we can use the TABLES view to list all tables within a region, e.g., for us-central1:
SELECT * FROM `region-us-central1`.INFORMATION_SCHEMA.TABLES;

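Because each INFORMATION_SCHEMA query is pinned to a single region, covering several regions means generating and running one query per region (a single BigQuery query cannot span locations). A minimal Python sketch of such a generator; the region list here is a hypothetical subset, not the full set of BigQuery locations:

```python
# Hypothetical subset of BigQuery locations; the real list is much longer.
REGIONS = ["us-central1", "europe-west1", "asia-east1"]

def tables_query(region: str) -> str:
    """Build a per-region INFORMATION_SCHEMA.TABLES listing query."""
    return f"SELECT * FROM `region-{region}`.INFORMATION_SCHEMA.TABLES"

# Each query must be executed separately, in its own region.
queries = [tables_query(r) for r in REGIONS]
```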
Pros:
- Easy, programmatic access using SQL right in BigQuery.
- Very detailed information; e.g. the TABLES view includes creation_time and ddl, and the TABLE_STORAGE view has information about total_rows as well as the size of the table (both physical and logical storage).
Cons:
- Information is scoped to a region at best (even for the organisation-level view). Given the list of regions is already quite long and might change dynamically, scripting around it might not be ideal.
- Not free, although pricing should be manageable for most use cases.
Achieving full freedom through custom scripting
As they do, our technically savvy clients challenged us with the ask of having visibility across the whole organisation as well as retrieving significant detail about the tables across projects.
We ended up writing a little script that did just that, which you are able to find on our GitHub as well. It involves setting a few environment variables which will provision a service account at the organisation level and then loop through all the projects and datasets in this organisation, retrieving the tables within these datasets as well as the relevant details (in this case: table size). While some basic Python knowledge and familiarity with the BigQuery API is needed, this approach gives us the most flexibility (we can retrieve other table features by checking the API docs), and data can be written in any format you would like for downstream consumption.
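For reference, the core loop can be sketched as follows. This is a simplified illustration, not the full script: it assumes the google-cloud-bigquery package is installed, the caller is authenticated with list/get permissions, and a list of project IDs is passed in (the real script also handles service-account provisioning and project discovery):

```python
def human_size(num_bytes: int) -> str:
    """Render a byte count as GiB with two decimal places."""
    return f"{num_bytes / 2**30:.2f} GiB"

def list_org_tables(project_ids):
    """Yield (project, dataset, table, size) for every table in the given projects.

    Assumes google-cloud-bigquery is installed and the caller is
    authenticated with list/get permissions on each project.
    """
    from google.cloud import bigquery  # assumed dependency

    for project_id in project_ids:
        client = bigquery.Client(project=project_id)
        for dataset in client.list_datasets():
            for item in client.list_tables(dataset.dataset_id):
                # list_tables returns lightweight items; get_table fetches full metadata
                table = client.get_table(item.reference)
                yield project_id, dataset.dataset_id, table.table_id, human_size(table.num_bytes or 0)
```

Because the function yields plain tuples, the output can be written to CSV, JSON, or straight into another BigQuery table for downstream consumption.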
Pros:
- Maximum flexibility:
- Extract all the data you want from the API.
- Output results can be made in any way you want.
- Add filters or extend the script to your heart's content.
- Free, from DoiT to you.
Cons:
- Additional time and effort to write the script, but we did a lot of the legwork for you already!
Conclusion
Navigating BigQuery can be complex, with each method offering unique advantages tailored to different organisational needs. While standard tools provide simplicity and accessibility, custom scripting unlocks unparalleled flexibility and detail, albeit with some upfront effort.
Have you taken another approach in the past? Please share your preferred way, or extend the basic script to fit your needs!




