This article describes a quick recipe for listing BigQuery tables partition from its metadata (instead of querying the table directly).
Let’s assume that we have a big table partitioned by a date column and you’d like to get the list of all partitions available in this table. While we could just query it directy like this:
select distinct partition_column
from `project_name.dataset_name.big_table`
But such query scans whole column and that’s what BigQuery will bill us for.
The same result can be achieved with a cost of just 10 megabytes - we just need to use BigQuery’s INFORMATION_SCHEMA
views. They hold various metadata about your BQ objects and, in our particular case, we’ll focus on INFORMATION_SCHEMA.PARTITIONS
which contains partitions metadata.
select *
from `project_name.dataset_name.INFORMATION_SCHEMA.PARTITIONS`
Above query returns metadata about tables partitions within our dataset - stuff like partition name, total rows, total logical bytes, total billable bytes and timestamp of last modification.
PARTITIONS
is just a subset of information you can get from there and I recommend going through official documentation for more details.
In case of tables partitioned by date, the partition id is a string is in the format of %Y%m%d
, i.e. 20240101
. You might also see partition ids like
__NULL__
- this is the partition which will contains records with partition column NULL values__UNPARTITIONED__
- records that fell ouf of the partitiong range
Now, assuming that our table has no null values or unpartitioned records, we can get an equivalent of earlier direct querying of the table with:
select parse_date('%Y%m%d', partition_id)
from `project_name.dataset_name.INFORMATION_SCHEMA.PARTITIONS`
where table_name = 'big_table'
Note the capital letters in INFORMATION_SCHEMA.PARTITIONS
as it won’t work with small letters.
We got the same results with much smaller costs.
(13/52) This is a 13th post from my blogging challenge (publishing 52 posts in 2024).