This article is a domain-by-domain mini question set for dbt Analytics Engineer exam candidates. It covers six areas — modeling and the DAG, materializations, testing, snapshots, Jinja/macros and packages, and execution/deployment — letting you quickly check the most common decision points in each.
Questions assume the behavior described in the official documentation and focus on core features that are stable across versions. UI specifics and cloud-specific features can change in detail, so concentrate on core concepts such as commands and YAML configuration.
dbt's DAG is generated automatically from the dependencies declared via ref and source. ref expresses dependencies between dbt-managed models, while source declares and maps external tables. In analytics work, the standard pattern is to normalize per naming conventions in the stg layer and aggregate to business grain in the marts layer.
The exam tends to test DAG consistency, naming conventions, dependency direction (upstream → downstream), and understanding of downstream visibility via exposures. Stick to fundamentals such as not building ref dynamically through string concatenation, and keeping source calls aligned with the YAML source definitions.
A typical dbt DAG (sources → staging → marts)
A basic example of ref and source
with src as (
select * from {{ source('raw', 'orders') }}
)
select
order_id,
cast(order_ts as timestamp) as order_ts,
customer_id
from srcMaterializations decide the trade-off between performance and cost. view is low-cost with immediate reflection, table provides a stable reference, incremental keeps the update load on large tables in check, and ephemeral leaves room for further optimization as a temporary subquery.
Many exam questions ask you to pick the right materialization based on a mix of data volume, update frequency, and recompute cost. In particular, understanding unique_key and on_schema_change for incremental models is a reliable scoring opportunity.
| Materialization | Persistence / storage | Best-fit use cases | Update cost |
|---|---|---|---|
| view | No persisted table (metadata only) | Small data, exploration that needs immediate reflection | Low (query executes each time) |
| table | Physical table | Stable delivery, fast dashboard responses | Medium to high (full recompute) |
| incremental | Physical table (delta applied) | Partial updates on large tables | Low to medium (deltas only) |
| ephemeral | Non-persistent (expanded as a subquery) | Reused, small intermediate computations | N/A (delegated upstream) |
A basic incremental model pattern
{{ config(
materialized='incremental',
unique_key='order_id',
on_schema_change='append_new_columns'
) }}
with src as (
select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
where updated_at > (select coalesce(max(updated_at), '1900-01-01') from {{ this }})
{% endif %}
)
select * from srcdbt tests are broadly split into generic (not_null, unique, relationships, etc.) and custom (SQL-based) tests. Generic tests declared in schema.yml are easy to maintain and make it immediately clear which column and which rule failed when something breaks.
source freshness is effective for detecting upstream delays. Configure thresholds for error_after and warn_after in minutes/hours, then evaluate with dbt source freshness.
Generic tests and source freshness in schema.yml
version: 2
sources:
- name: raw
tables:
- name: orders
freshness:
warn_after: {count: 60, period: minute}
error_after: {count: 120, period: minute}
models:
- name: stg_orders
columns:
- name: order_id
tests:
- not_null
- unique
- name: customer_id
tests:
- relationships:
to: ref('dim_customer')
field: customer_id
- name: status
tests:
- accepted_values:
values: ['placed','shipped','delivered','returned']Snapshots are used for history management. The check strategy detects value changes in specified columns, while the timestamp strategy detects forward movement of an update-time column. Choose the business key as the primary key and a reliable timestamp column for updated_at.
The exam asks about late-arriving dimensions and reprocessing. The key decision is whether to use snapshots when the source has no history, or whether a simple incremental model suffices when it does.
An example snapshot for a customer dimension
{% snapshot customer_scd %}
{{ config(
target_schema='snapshots',
strategy='check',
unique_key='customer_id',
check_cols=['name','email','status']
) }}
select * from {{ source('raw','customers') }}
{% endsnapshot %}Encapsulate shared logic in macros and reuse them. Varying behavior through arguments helps suppress duplicated models. When environments differ, adapter.dispatch is also useful for switching adapter-dependent implementations.
Declare packages in packages.yml. Use semantic-versioning range specifiers (e.g., >=, <) to manage updates while preserving compatibility.
A simple macro, its call site, and packages.yml
-- macros/safe_divide.sql
{% macro safe_divide(numerator, denominator, default=0) %}
case when {{ denominator }} = 0 or {{ denominator }} is null then {{ default }}
else {{ numerator }} / {{ denominator }} end
{% endmacro %}
-- models/fct_orders_rate.sql
select
date_trunc('day', order_ts) as day,
{{ safe_divide('sum(revenue)','nullif(count(*),0)', 0) }} as avg_rev_per_order
from {{ ref('stg_orders') }}
group by 1
# packages.yml
packages:
- package: dbt-labs/codegen
version: ">=0.12.0,<0.13.0"dbt build comprehensively executes models, seeds, and snapshots along with their tests. Use dbt run for execution alone and dbt test for validation only. Narrow the scope with selectors (--select, --exclude), and leverage tags, paths, and state comparisons.
When you want to reference production datasets and resolve downstream nodes in staging, combine --defer with --state. state:modified+parents is effective for differential development.
selectors.yml and CLI examples
# selectors.yml
selectors:
- name: ci_changed
definition:
method: state
value: modified+parents
- name: marts_only
definition: tag:marts
# 直近変更と親をビルド(既存本番に defer)
# 事前に state を指す manifest.json を用意
# dbt build --selector ci_changed --defer --state path/to/artifacts
# タグ指定で実行
# dbt run --selector marts_onlyAnalytics Engineer
問題 1
A large orders table is updated hourly. New rows and updates are mixed, and order_id uniquely identifies each row. Dashboards require reflection within 5 minutes and stable response times. Which model definition is the most appropriate?
正解: A
The requirement is frequent updates on large data plus stable response. The most realistic choice is incremental with order_id as the unique_key and updated_at-based delta ingestion. view evaluates per query and is slow, table is expensive due to full recomputes every run, and ephemeral is not persisted, making it unsuitable for dashboard reference.
Should I learn dbt build or dbt run?
You should know both, but the exam tests whether you understand the use-case differences. build runs models (and seeds/snapshots) together with tests in one shot, which fits initial deployments and CI. run only executes models, making it suitable for targeted re-runs.
How do I choose between source freshness and generic tests?
source freshness detects upstream delays, while generic tests validate data quality on models. Use freshness for staleness, and generic tests (unique, not_null, etc.) for duplicates or nulls. The two roles are complementary.
What is the recommended approach when an incremental model's schema changes?
Pick the on_schema_change setting that matches your requirements. Use append_new_columns to allow only new columns, or fail for strict alignment, and combine it with a full-refresh procedure as part of your operations policy.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...