dbt_utils is one of the packages you'll reach for most often in day-to-day dbt work. It runs reliably across warehouses and keeps model code short and readable.
This article focuses on the macros that come up most often in both exams and real work, covering their purpose, common pitfalls, and minimal snippets. It sticks to the stable, generally recommended usage from the official documentation.
Start with the big picture. Exam questions often ask which macro you should choose, so it helps to view them through the lens of use cases.
The macros listed here are widely used on the major DWHs (Snowflake, BigQuery, Redshift, Databricks/Spark) and have stable APIs.
| Macro | Typical use | Production tip | Exam angle |
|---|---|---|---|
| generate_surrogate_key | Generate a unique key from multiple columns | Normalize trimming and NULLs to produce a stable hash | Can you justify it via cross-warehouse consistency and NULL safety? |
| star | Expand SELECT columns (with except, prefix, suffix) | Reduces column maintenance burden when schemas change | Can you describe it as the safer alternative to a wildcard? |
| union_relations | UNION across tables with schema differences | Fills missing columns with NULL and aligns order to prevent failures | Can you identify it as the best fit for unifying different column sets? |
| date_spine | Generate continuous date/time rows | Watch out for off-by-one errors at the start/end boundaries | Can you pick the right way to build a calendar table? |
| safe_divide | Aggregations that avoid divide-by-zero | Handles NULL/0 safely to prevent NaN in dashboards | Can you explain how to build robust derived metrics? |
| unique_combination_of_columns (test) | Verify composite uniqueness | Directly ensures consistency of business keys | Can you judge what grain should be guaranteed unique? |
Add to packages.yml (prerequisite)
packages:
- package: dbt-labs/dbt_utils
version: "+" # Stable version range (match your project's compatibility)dbt_utils shortens repetitive modeling work and helps you write SQL that holds up against schema changes and DWH differences. Its role is especially clear across the Staging, Intermediate, and Mart layers.
Exams often ask which macro improves maintainability at which layer.
dbt model layers mapped to common macros
Basic form: calling a macro with ref and Jinja
select
{{ dbt_utils.star(from=ref('stg_orders'), except=['_ingested_at']) }}
from {{ ref('stg_orders') }}Surrogate keys are essential for joining across columns and identifying dimensions. generate_surrogate_key absorbs trimming and NULLs to return a consistent hash string.
Since the key changes when the base columns change, it's safer to lock in your usage alongside your SCD design for dimensions.
Typical snippet (surrogate key from a dimension's natural key)
with src as (
select * from {{ ref('stg_customers') }}
)
select
{{ dbt_utils.generate_surrogate_key([
'coalesce(country_code, \"\")',
'customer_id'
]) }} as customer_sk,
*
from srcRaw source columns get added, removed, and reordered over time. star expands columns safely with except, prefix, and suffix. And when you need to combine multiple tables, union_relations automatically reconciles column mismatches.
Exams test your understanding of the risks of SELECT *, how star enables controlled column expansion, and how to normalize before a UNION.
UNION across multiple sources (automatic column reconciliation)
with a as (
select {{ dbt_utils.star(from=ref('stg_orders_us'), except=['_ingested_at']) }} from {{ ref('stg_orders_us') }}
), b as (
select {{ dbt_utils.star(from=ref('stg_orders_eu'), except=['_ingested_at']) }} from {{ ref('stg_orders_eu') }}
)
select * from {{ dbt_utils.union_relations(relations=[ref('stg_orders_us'), ref('stg_orders_eu')]) }}When you want date rows in your report even on days with no events, generate a continuous series with date_spine and left-join. This makes gaps on active days visible.
Be explicit about the start/end boundaries and the grain to avoid off-by-one errors. The end date is typically today or the start of tomorrow.
Join a daily spine with sales (zero-fill)
with spine as (
select * from {{ dbt_utils.date_spine(
datepart='day',
start_date="cast('2021-01-01' as date)",
end_date="current_date"
) }}
), sales as (
select order_date::date as d, sum(amount) as revenue
from {{ ref('fct_orders') }}
group by 1
)
select s.date_day as d,
coalesce(sa.revenue, 0) as revenue
from spine s
left join sales sa on sa.d = s.date_dayGuaranteeing model quality matters on both exams and the job. Use dbt_utils test macros to verify business-key uniqueness, and use safe_divide in aggregations to avoid divide-by-zero.
Especially in Looker or BI tools, avoiding divide-by-zero errors and NaN is fundamental to running stable dashboards.
Example of YAML tests and safe division
# models/schema.yml
version: 2
models:
- name: dim_customer
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns: [customer_natural_key, effective_from]
-- models/mart/fct_kpi.sql
select
date_day,
{{ dbt_utils.safe_divide(sum(conversions), nullif(sum(sessions), 0)) }} as cr
from {{ ref('fct_sessions') }}
group by 1Analytics Engineer
問題 1
You want to combine customer_id and country_code from multiple sources into a surrogate key that's consistent across DWHs and resilient to NULLs and leading/trailing whitespace. Which dbt_utils macro should you use?
正解: A
generate_surrogate_key produces a stable hash key from multiple columns and absorbs NULL/whitespace variation. star expands columns, union_relations normalizes UNIONs, and safe_divide is for numeric operations — none of these fit the goal.
Do dbt_utils macros behave differently across data warehouses?
Major macros absorb DWH differences via adapter dispatch. For example, generate_surrogate_key handles per-engine function differences internally. However, places where you pass SQL strings (like start_date/end_date for date_spine) need to be expressions that each DWH can parse.
How is star different from SELECT *?
star lets you specify except, prefix, and suffix, giving you control over column order and naming conflicts. It reduces the risk of queries breaking when unexpected new columns appear, which can happen with SELECT *.
When should I write the unique_combination_of_columns test?
In practice, the safest approach is to write it at the same time you define a dimension's natural key or a fact's business-level unique grain. Running it on every CI build catches duplicates immediately and prevents downstream aggregation issues.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...