dbt_utils Package: Essential Utility Macros (2026)

dbt_utils is one of the packages you'll reach for most often in day-to-day dbt work. It runs reliably across warehouses and keeps model code short and readable.

This article focuses on the macros that come up most often in both exams and real work, covering their purpose, common pitfalls, and minimal snippets. It sticks to the stable, generally recommended usage from the official documentation.

Quick Reference: Most Used Macros

Start with the big picture. Exam questions often ask which macro you should choose, so it helps to view them through the lens of use cases.

The macros listed here are widely used on the major DWHs (Snowflake, BigQuery, Redshift, Databricks/Spark) and have stable APIs.

generate_surrogate_key: Composite key hashing with NULL safety
star: A safer SELECT * (with except, prefix, suffix)
union_relations: UNION with column mismatch normalization
date_spine: Generate a continuous date/time series
safe_divide: Safe division that avoids divide-by-zero
unique_combination_of_columns (test): Verify composite uniqueness

Macro	Typical use	Production tip	Exam angle
generate_surrogate_key	Generate a unique key from multiple columns	Normalize trimming and NULLs to produce a stable hash	Can you justify it via cross-warehouse consistency and NULL safety?
star	Expand SELECT columns (with except, prefix, suffix)	Reduces column maintenance burden when schemas change	Can you describe it as the safer alternative to a wildcard?
union_relations	UNION across tables with schema differences	Fills missing columns with NULL and aligns order to prevent failures	Can you identify it as the best fit for unifying different column sets?
date_spine	Generate continuous date/time rows	Watch out for off-by-one errors at the start/end boundaries	Can you pick the right way to build a calendar table?
safe_divide	Aggregations that avoid divide-by-zero	Handles NULL/0 safely to prevent NaN in dashboards	Can you explain how to build robust derived metrics?
unique_combination_of_columns (test)	Verify composite uniqueness	Directly ensures consistency of business keys	Can you judge what grain should be guaranteed unique?

Add to packages.yml (prerequisite)

packages:
  - package: dbt-labs/dbt_utils
    version: "+"  # Stable version range (match your project's compatibility)

dbt_utils' Place in Model Design

dbt_utils shortens repetitive modeling work and helps you write SQL that holds up against schema changes and DWH differences. Its role is especially clear across the Staging, Intermediate, and Mart layers.

Exams often ask which macro improves maintainability at which layer.

Staging: Use star to expand raw source columns and generate_surrogate_key for stable keys
Intermediate: Use union_relations to normalize schemas and date_spine to fill series gaps
Mart: Use safe_divide for robust ratio metrics and unique_combination_of_columns for testing

dbt model layers mapped to common macros

Basic form: calling a macro with ref and Jinja

select
  {{ dbt_utils.star(from=ref('stg_orders'), except=['_ingested_at']) }}
from {{ ref('stg_orders') }}

Build Stable Composite Keys with generate_surrogate_key

Surrogate keys are essential for joining across columns and identifying dimensions. generate_surrogate_key absorbs trimming and NULLs to return a consistent hash string.

Since the key changes when the base columns change, it's safer to lock in your usage alongside your SCD design for dimensions.

Normalize NULLs, blanks, and surrounding whitespace before hashing
Absorbs function differences across DWHs (dispatched by the package)
Length is implementation-dependent, but allowing VARCHAR(32-64) is generally safe

Typical snippet (surrogate key from a dimension's natural key)

with src as (
  select * from {{ ref('stg_customers') }}
)
select
  {{ dbt_utils.generate_surrogate_key([
    'coalesce(country_code, \"\")',
    'customer_id'
  ]) }} as customer_sk,
  *
from src

Absorb Schema Differences with star and union_relations

Raw source columns get added, removed, and reordered over time. star expands columns safely with except, prefix, and suffix. And when you need to combine multiple tables, union_relations automatically reconciles column mismatches.

Exams test your understanding of the risks of SELECT *, how star enables controlled column expansion, and how to normalize before a UNION.

star: Exclude audit columns like except=['raw_load_ts']
star: Avoid collisions with prefix/suffix (e.g., prefix='src_')
union_relations: Fills missing columns with NULL and projects extra columns in aligned order

UNION across multiple sources (automatic column reconciliation)

with a as (
  select {{ dbt_utils.star(from=ref('stg_orders_us'), except=['_ingested_at']) }} from {{ ref('stg_orders_us') }}
), b as (
  select {{ dbt_utils.star(from=ref('stg_orders_eu'), except=['_ingested_at']) }} from {{ ref('stg_orders_eu') }}
)
select * from {{ dbt_utils.union_relations(relations=[ref('stg_orders_us'), ref('stg_orders_eu')]) }}

Fill Missing Dates with date_spine

When you want date rows in your report even on days with no events, generate a continuous series with date_spine and left-join. This makes gaps on active days visible.

Be explicit about the start/end boundaries and the grain to avoid off-by-one errors. The end date is typically today or the start of tomorrow.

Specify datepart as 'day', 'hour', etc.
Pass start_date / end_date as strings containing expressions the DWH can parse
LEFT JOIN with your aggregation target to zero-fill missing values

Join a daily spine with sales (zero-fill)

with spine as (
  select * from {{ dbt_utils.date_spine(
    datepart='day',
    start_date="cast('2021-01-01' as date)",
    end_date="current_date"
  ) }}
), sales as (
  select order_date::date as d, sum(amount) as revenue
  from {{ ref('fct_orders') }}
  group by 1
)
select s.date_day as d,
       coalesce(sa.revenue, 0) as revenue
from spine s
left join sales sa on sa.d = s.date_day

Test Macros and Safe Derived Metrics (unique_combination_of_columns / safe_divide)

Guaranteeing model quality matters on both exams and the job. Use dbt_utils test macros to verify business-key uniqueness, and use safe_divide in aggregations to avoid divide-by-zero.

Especially in Looker or BI tools, avoiding divide-by-zero errors and NaN is fundamental to running stable dashboards.

unique_combination_of_columns: Declaratively guarantee composite key uniqueness
safe_divide: Implementation that safely returns NULL or 0 when the numerator or denominator is 0/NULL
Run quality tests automatically in CI to catch regressions early

Example of YAML tests and safe division

# models/schema.yml
version: 2
models:
  - name: dim_customer
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [customer_natural_key, effective_from]

-- models/mart/fct_kpi.sql
select
  date_day,
  {{ dbt_utils.safe_divide(sum(conversions), nullif(sum(sessions), 0)) }} as cr
from {{ ref('fct_sessions') }}
group by 1

Check Your Understanding

Analytics Engineer

問題 1

You want to combine customer_id and country_code from multiple sources into a surrogate key that's consistent across DWHs and resilient to NULLs and leading/trailing whitespace. Which dbt_utils macro should you use?

A. generate_surrogate_key
B. star
C. union_relations
D. safe_divide

正解: A

generate_surrogate_key produces a stable hash key from multiple columns and absorbs NULL/whitespace variation. star expands columns, union_relations normalizes UNIONs, and safe_divide is for numeric operations — none of these fit the goal.

Frequently Asked Questions

Do dbt_utils macros behave differently across data warehouses?

Major macros absorb DWH differences via adapter dispatch. For example, generate_surrogate_key handles per-engine function differences internally. However, places where you pass SQL strings (like start_date/end_date for date_spine) need to be expressions that each DWH can parse.

How is star different from SELECT *?

star lets you specify except, prefix, and suffix, giving you control over column order and naming conflicts. It reduces the risk of queries breaking when unexpected new columns appear, which can happen with SELECT *.

When should I write the unique_combination_of_columns test?

In practice, the safest approach is to write it at the same time you define a dimension's natural key or a fact's business-level unique grain. Running it on every CI build catches duplicates immediately and prevents downstream aggregation issues.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

dbt_utils Package: Most Common Macros and When to Use Them