dbt

dbt Exam Sample Questions: Domain-by-Domain Practice Set (Analytics Engineer)

2026-04-19
NicheeLab Editorial Team

This article is a domain-by-domain mini question set for dbt Analytics Engineer exam candidates. It covers six areas — modeling and the DAG, materializations, testing, snapshots, Jinja/macros and packages, and execution/deployment — letting you quickly check the most common decision points in each.

Questions assume the behavior described in the official documentation and focus on core features that are stable across versions. UI specifics and cloud-specific features can change in detail, so concentrate on core concepts such as commands and YAML configuration.

Domain 1: Modeling and the DAG (using ref and source correctly)

dbt's DAG is generated automatically from the dependencies declared via ref and source. ref expresses dependencies between dbt-managed models, while source declares and maps external tables. In analytics work, the standard pattern is to normalize per naming conventions in the stg layer and aggregate to business grain in the marts layer.

The exam tends to test DAG consistency, naming conventions, dependency direction (upstream → downstream), and understanding of downstream visibility via exposures. Stick to fundamentals such as not building ref dynamically through string concatenation, and keeping source calls aligned with the YAML source definitions.

  • ref is resolved at compile time — delegate schema/database names to the adapter rather than hardcoding them.
  • Declare sources under sources:, and combine them with source freshness to detect upstream delays.
  • exposures put downstream assets such as BI dashboards onto the DAG and document owners and maturity.
  • Layered design fundamentals: sources → staging → intermediate (optional) → marts (dim/fact).
  • Common exam pitfalls: do not create circular dependencies with ref, and avoid naming inconsistencies across model layers.

A typical dbt DAG (sources → staging → marts)

external sourceraw.ordersstg_ordersref('stg_orders')fct_orders_dailyref('fct_orders_daily')BI dashboards / exposures

A basic example of ref and source

with src as (
  select * from {{ source('raw', 'orders') }}
)
select
  order_id,
  cast(order_ts as timestamp) as order_ts,
  customer_id
from src

Domain 2: Materializations and performance choices

Materializations decide the trade-off between performance and cost. view is low-cost with immediate reflection, table provides a stable reference, incremental keeps the update load on large tables in check, and ephemeral leaves room for further optimization as a temporary subquery.

Many exam questions ask you to pick the right materialization based on a mix of data volume, update frequency, and recompute cost. In particular, understanding unique_key and on_schema_change for incremental models is a reliable scoring opportunity.

  • Use view for small datasets with frequent updates, and table for stable delivery.
  • Incremental updates require unique_key. The update strategy depends on the adapter (e.g., MERGE).
  • The default for on_schema_change is ignore. Configure append_new_columns and similar values to match your requirements.
  • ephemeral is inlined at compile time, so watch out for huge, hard-to-manage queries.
MaterializationPersistence / storageBest-fit use casesUpdate cost
viewNo persisted table (metadata only)Small data, exploration that needs immediate reflectionLow (query executes each time)
tablePhysical tableStable delivery, fast dashboard responsesMedium to high (full recompute)
incrementalPhysical table (delta applied)Partial updates on large tablesLow to medium (deltas only)
ephemeralNon-persistent (expanded as a subquery)Reused, small intermediate computationsN/A (delegated upstream)

A basic incremental model pattern

{{ config(
  materialized='incremental',
  unique_key='order_id',
  on_schema_change='append_new_columns'
) }}

with src as (
  select * from {{ ref('stg_orders') }}
  {% if is_incremental() %}
    where updated_at > (select coalesce(max(updated_at), '1900-01-01') from {{ this }})
  {% endif %}
)
select * from src

Domain 3: Testing and validation (generic/custom, source freshness)

dbt tests are broadly split into generic (not_null, unique, relationships, etc.) and custom (SQL-based) tests. Generic tests declared in schema.yml are easy to maintain and make it immediately clear which column and which rule failed when something breaks.

source freshness is effective for detecting upstream delays. Configure thresholds for error_after and warn_after in minutes/hours, then evaluate with dbt source freshness.

  • Combine unique + not_null to guarantee the integrity of a natural key.
  • Use relationships to validate foreign-key integrity across models.
  • accepted_values is useful for managing surrogate codes or status columns.
  • Custom tests are written as SQL that returns the failing rows.

Generic tests and source freshness in schema.yml

version: 2

sources:
  - name: raw
    tables:
      - name: orders
        freshness:
          warn_after: {count: 60, period: minute}
          error_after: {count: 120, period: minute}

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_id
      - name: status
        tests:
          - accepted_values:
              values: ['placed','shipped','delivered','returned']

Domain 4: Snapshots (the basics of SCD management)

Snapshots are used for history management. The check strategy detects value changes in specified columns, while the timestamp strategy detects forward movement of an update-time column. Choose the business key as the primary key and a reliable timestamp column for updated_at.

The exam asks about late-arriving dimensions and reprocessing. The key decision is whether to use snapshots when the source has no history, or whether a simple incremental model suffices when it does.

  • strategy=check compares values — watch out for forgetting to list change-tracked columns.
  • strategy=timestamp assumes the update column increases monotonically.
  • Snapshots are persisted as tables, so you need a plan for managing and rebuilding them.

An example snapshot for a customer dimension

{% snapshot customer_scd %}
{{ config(
  target_schema='snapshots',
  strategy='check',
  unique_key='customer_id',
  check_cols=['name','email','status']
) }}

select * from {{ source('raw','customers') }}

{% endsnapshot %}

Domain 5: Key points on macros, Jinja, and packages

Encapsulate shared logic in macros and reuse them. Varying behavior through arguments helps suppress duplicated models. When environments differ, adapter.dispatch is also useful for switching adapter-dependent implementations.

Declare packages in packages.yml. Use semantic-versioning range specifiers (e.g., >=, <) to manage updates while preserving compatibility.

  • Inject values at compile time with set, default, and env_var.
  • Place macros in the macros/ directory and call them with {{ my_macro() }}.
  • Update packages with dbt deps and pin explicit version ranges.

A simple macro, its call site, and packages.yml

-- macros/safe_divide.sql
{% macro safe_divide(numerator, denominator, default=0) %}
  case when {{ denominator }} = 0 or {{ denominator }} is null then {{ default }}
       else {{ numerator }} / {{ denominator }} end
{% endmacro %}

-- models/fct_orders_rate.sql
select
  date_trunc('day', order_ts) as day,
  {{ safe_divide('sum(revenue)','nullif(count(*),0)', 0) }} as avg_rev_per_order
from {{ ref('stg_orders') }}
group by 1

# packages.yml
packages:
  - package: dbt-labs/codegen
    version: ">=0.12.0,<0.13.0"

Domain 6: Execution, selection, and deployment (run/build/test and selectors)

dbt build comprehensively executes models, seeds, and snapshots along with their tests. Use dbt run for execution alone and dbt test for validation only. Narrow the scope with selectors (--select, --exclude), and leverage tags, paths, and state comparisons.

When you want to reference production datasets and resolve downstream nodes in staging, combine --defer with --state. state:modified+parents is effective for differential development.

  • dbt build unifies execution ordering and test runs — ideal for initial deployments and CI.
  • Use state:modified+parents to re-run modified models together with their parents.
  • --selector can reference a logical name defined in selectors.yml.
  • --defer falls back to existing-environment artifacts for unresolvable references.

selectors.yml and CLI examples

# selectors.yml
selectors:
  - name: ci_changed
    definition:
      method: state
      value: modified+parents
  - name: marts_only
    definition: tag:marts

# 直近変更と親をビルド(既存本番に defer)
# 事前に state を指す manifest.json を用意
# dbt build --selector ci_changed --defer --state path/to/artifacts

# タグ指定で実行
# dbt run --selector marts_only

Check your understanding

Analytics Engineer

問題 1

A large orders table is updated hourly. New rows and updates are mixed, and order_id uniquely identifies each row. Dashboards require reflection within 5 minutes and stable response times. Which model definition is the most appropriate?

  1. Use an incremental materialization with order_id as the unique_key, ingest only the updated_at delta via is_incremental(), and plan periodic full refreshes as needed.
  2. Use a view materialization so the latest data is always referenced.
  3. Use a table materialization and fully rebuild every time.
  4. Define it as an ephemeral model and reference it from the dashboard.

正解: A

The requirement is frequent updates on large data plus stable response. The most realistic choice is incremental with order_id as the unique_key and updated_at-based delta ingestion. view evaluates per query and is slow, table is expensive due to full recomputes every run, and ephemeral is not persisted, making it unsuitable for dashboard reference.

Frequently Asked Questions

Should I learn dbt build or dbt run?

You should know both, but the exam tests whether you understand the use-case differences. build runs models (and seeds/snapshots) together with tests in one shot, which fits initial deployments and CI. run only executes models, making it suitable for targeted re-runs.

How do I choose between source freshness and generic tests?

source freshness detects upstream delays, while generic tests validate data quality on models. Use freshness for staleness, and generic tests (unique, not_null, etc.) for duplicates or nulls. The two roles are complementary.

What is the recommended approach when an incremental model's schema changes?

Pick the on_schema_change setting that matches your requirements. Use append_new_columns to allow only new columns, or fail for strict alignment, and combine it with a full-refresh procedure as part of your operations policy.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.