DBT: Ensure that environment variable is set during dbt run.
/ 3 min read
This post contains a solution to a problem I’ve come across recently: how to ensure that certain environment variable is set when dbt run
command is executed (but at the same time, it does not raise any error when you don’t set this variable during compilation phase, i.e. during CICD deployment).
Problem description
I was working on a dbt macro function used for anonymizing PII data before loading it into the data warehouse. I wanted to concatenate the original value with a secret key string (which should not be stored as a plain text anywhere in the dbt project).
I started with an initial implementation that would simply use DBT’s env_var
function. I quickly ran into a problem during CICD pipeline execution. The pipeline would compile the project to build the manifest (running command dbt compile
) and it would expect my environment variable to be set.
My initial ideas were:
-
Defining this environment variable as a Gitlab variable. It would not have to contain actual secret key value as you only need it when you materialize rows in data warehouse (when running
dbt run
). But it felt a little wrong and might have introduced some confusion (creating a Gitlab variable just for the sake its existence during deployment process). -
Using default value argument in
env_var
function. This one would fix the CICD pipeline problem but what if someone forgot to define this variable in the execution environment and data would be hashed without secret key value?
After giving it some thought, I decided to go with following approach.
Proposed solution
Final solution turned out to be a combination of env_var
default value + conditional logic for checking flags.WHICH
value.
{% macro hash_column(column) %}
{% set secret_value = env_var("DBT_HASHING_SECRET_KEY", "") %}
{% if flags.WHICH == "run" and salt == "" %}
{{ exceptions.raise_compiler_error("DBT_HASHING_SECRET_KEY environment variable must be set!") }}
{%- endif -%}
-- here goes actual implementation of SQL logic that hashes the column,
-- i.e. with SHA256 algorithm with combination of {{ secret_value }} string
{% endmacro %}
We set the secret key with result of env_var
function with an empty string as a default value.
Then we have a conditional logic for checking if the code execution context is due to invoking dbt run
(flags.WHICH == "run"
) + we check if salt is empty (which basically means that someone forgot to set the DBT_HASHING_SECRET_KEY
environment variable or set it to an empty string value) and raise a compilation error in such case.
Now, when you build the project during CICD pipeline, env_var
won’t raise any errors due to non-existing variable because we provided it with a default value.
If you’re interested about other information exposed in flags
variable, I encourage you to read the docs.
And that’s it :-)!
Happy data modeling!
(8/52) This is a 8th post from my blogging challenge (publishing 52 posts in 2024).