dbt

dbt_utils Package: Most Common Macros and When to Use Them

2026-04-19
NicheeLab Editorial Team

dbt_utils is one of the packages you'll reach for most often in day-to-day dbt work. It runs reliably across warehouses and keeps model code short and readable.

This article focuses on the macros that come up most often in both exams and real work, covering their purpose, common pitfalls, and minimal snippets. It sticks to the stable, generally recommended usage from the official documentation.

Quick Reference: Most Used Macros

Start with the big picture. Exam questions often ask which macro you should choose, so it helps to view them through the lens of use cases.

The macros listed here are widely used on the major DWHs (Snowflake, BigQuery, Redshift, Databricks/Spark) and have stable APIs.

  • generate_surrogate_key: Composite key hashing with NULL safety
  • star: A safer SELECT * (with except, prefix, suffix)
  • union_relations: UNION with column mismatch normalization
  • date_spine: Generate a continuous date/time series
  • safe_divide: Safe division that avoids divide-by-zero
  • unique_combination_of_columns (test): Verify composite uniqueness
MacroTypical useProduction tipExam angle
generate_surrogate_keyGenerate a unique key from multiple columnsNormalize trimming and NULLs to produce a stable hashCan you justify it via cross-warehouse consistency and NULL safety?
starExpand SELECT columns (with except, prefix, suffix)Reduces column maintenance burden when schemas changeCan you describe it as the safer alternative to a wildcard?
union_relationsUNION across tables with schema differencesFills missing columns with NULL and aligns order to prevent failuresCan you identify it as the best fit for unifying different column sets?
date_spineGenerate continuous date/time rowsWatch out for off-by-one errors at the start/end boundariesCan you pick the right way to build a calendar table?
safe_divideAggregations that avoid divide-by-zeroHandles NULL/0 safely to prevent NaN in dashboardsCan you explain how to build robust derived metrics?
unique_combination_of_columns (test)Verify composite uniquenessDirectly ensures consistency of business keysCan you judge what grain should be guaranteed unique?

Add to packages.yml (prerequisite)

packages:
  - package: dbt-labs/dbt_utils
    version: "+"  # Stable version range (match your project's compatibility)

dbt_utils' Place in Model Design

dbt_utils shortens repetitive modeling work and helps you write SQL that holds up against schema changes and DWH differences. Its role is especially clear across the Staging, Intermediate, and Mart layers.

Exams often ask which macro improves maintainability at which layer.

  • Staging: Use star to expand raw source columns and generate_surrogate_key for stable keys
  • Intermediate: Use union_relations to normalize schemas and date_spine to fill series gaps
  • Mart: Use safe_divide for robust ratio metrics and unique_combination_of_columns for testing

dbt model layers mapped to common macros

Stagingstar / generate_surrogate_keyIntermediateunion_relations / date_spineMartsafe_divide / tests (dbt_utils)Sources / rawSchema normalizationAggregations / KPIsdbt model layers mapped to common macros

Basic form: calling a macro with ref and Jinja

select
  {{ dbt_utils.star(from=ref('stg_orders'), except=['_ingested_at']) }}
from {{ ref('stg_orders') }}

Build Stable Composite Keys with generate_surrogate_key

Surrogate keys are essential for joining across columns and identifying dimensions. generate_surrogate_key absorbs trimming and NULLs to return a consistent hash string.

Since the key changes when the base columns change, it's safer to lock in your usage alongside your SCD design for dimensions.

  • Normalize NULLs, blanks, and surrounding whitespace before hashing
  • Absorbs function differences across DWHs (dispatched by the package)
  • Length is implementation-dependent, but allowing VARCHAR(32-64) is generally safe

Typical snippet (surrogate key from a dimension's natural key)

with src as (
  select * from {{ ref('stg_customers') }}
)
select
  {{ dbt_utils.generate_surrogate_key([
    'coalesce(country_code, \"\")',
    'customer_id'
  ]) }} as customer_sk,
  *
from src

Absorb Schema Differences with star and union_relations

Raw source columns get added, removed, and reordered over time. star expands columns safely with except, prefix, and suffix. And when you need to combine multiple tables, union_relations automatically reconciles column mismatches.

Exams test your understanding of the risks of SELECT *, how star enables controlled column expansion, and how to normalize before a UNION.

  • star: Exclude audit columns like except=['raw_load_ts']
  • star: Avoid collisions with prefix/suffix (e.g., prefix='src_')
  • union_relations: Fills missing columns with NULL and projects extra columns in aligned order

UNION across multiple sources (automatic column reconciliation)

with a as (
  select {{ dbt_utils.star(from=ref('stg_orders_us'), except=['_ingested_at']) }} from {{ ref('stg_orders_us') }}
), b as (
  select {{ dbt_utils.star(from=ref('stg_orders_eu'), except=['_ingested_at']) }} from {{ ref('stg_orders_eu') }}
)
select * from {{ dbt_utils.union_relations(relations=[ref('stg_orders_us'), ref('stg_orders_eu')]) }}

Fill Missing Dates with date_spine

When you want date rows in your report even on days with no events, generate a continuous series with date_spine and left-join. This makes gaps on active days visible.

Be explicit about the start/end boundaries and the grain to avoid off-by-one errors. The end date is typically today or the start of tomorrow.

  • Specify datepart as 'day', 'hour', etc.
  • Pass start_date / end_date as strings containing expressions the DWH can parse
  • LEFT JOIN with your aggregation target to zero-fill missing values

Join a daily spine with sales (zero-fill)

with spine as (
  select * from {{ dbt_utils.date_spine(
    datepart='day',
    start_date="cast('2021-01-01' as date)",
    end_date="current_date"
  ) }}
), sales as (
  select order_date::date as d, sum(amount) as revenue
  from {{ ref('fct_orders') }}
  group by 1
)
select s.date_day as d,
       coalesce(sa.revenue, 0) as revenue
from spine s
left join sales sa on sa.d = s.date_day

Test Macros and Safe Derived Metrics (unique_combination_of_columns / safe_divide)

Guaranteeing model quality matters on both exams and the job. Use dbt_utils test macros to verify business-key uniqueness, and use safe_divide in aggregations to avoid divide-by-zero.

Especially in Looker or BI tools, avoiding divide-by-zero errors and NaN is fundamental to running stable dashboards.

  • unique_combination_of_columns: Declaratively guarantee composite key uniqueness
  • safe_divide: Implementation that safely returns NULL or 0 when the numerator or denominator is 0/NULL
  • Run quality tests automatically in CI to catch regressions early

Example of YAML tests and safe division

# models/schema.yml
version: 2
models:
  - name: dim_customer
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [customer_natural_key, effective_from]

-- models/mart/fct_kpi.sql
select
  date_day,
  {{ dbt_utils.safe_divide(sum(conversions), nullif(sum(sessions), 0)) }} as cr
from {{ ref('fct_sessions') }}
group by 1

Check Your Understanding

Analytics Engineer

問題 1

You want to combine customer_id and country_code from multiple sources into a surrogate key that's consistent across DWHs and resilient to NULLs and leading/trailing whitespace. Which dbt_utils macro should you use?

  1. A. generate_surrogate_key
  2. B. star
  3. C. union_relations
  4. D. safe_divide

正解: A

generate_surrogate_key produces a stable hash key from multiple columns and absorbs NULL/whitespace variation. star expands columns, union_relations normalizes UNIONs, and safe_divide is for numeric operations — none of these fit the goal.

Frequently Asked Questions

Do dbt_utils macros behave differently across data warehouses?

Major macros absorb DWH differences via adapter dispatch. For example, generate_surrogate_key handles per-engine function differences internally. However, places where you pass SQL strings (like start_date/end_date for date_spine) need to be expressions that each DWH can parse.

How is star different from SELECT *?

star lets you specify except, prefix, and suffix, giving you control over column order and naming conflicts. It reduces the risk of queries breaking when unexpected new columns appear, which can happen with SELECT *.

When should I write the unique_combination_of_columns test?

In practice, the safest approach is to write it at the same time you define a dimension's natural key or a fact's business-level unique grain. Running it on every CI build catches duplicates immediately and prevents downstream aggregation issues.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.