Assuming dbt on Snowflake, this article organizes the settings that are easy to overlook in day-to-day ELT. Following the behavior described in the official docs, it condenses the points most often tested on the dbt Analytics Engineer exam into a practical reference.
The key points are: choosing the right materialization, designing an incremental update strategy, knowing when to use clustering, handling privileges and object lifetime, and controlling runs and monitoring cost.
On the Snowflake side, set the warehouse size, enable a short auto-suspend (60-300 seconds), enable auto-resume, and separate warehouses between development and production. Throughput-heavy jobs are more stable when run on multi-cluster or larger sizes and finished quickly.
On the dbt side, explicitly set role, warehouse, database, schema, threads, and query_tag in profiles.yml. query_tag is invaluable for query-history traceability and cost attribution.
Logical flow of a dbt run against Snowflake
Developer/CI
|
v
dbt CLI/Cloud --(Jinja compile)--> Compiled SQL
| |
| via dbt-snowflake adapter |
v v
Snowflake Auth ----> Virtual Warehouse ----> Query Execution
| |
v v
Database -> Schema -> Tables/Views (Materializations)
^
|
Query Tag/Grantsprofiles.yml example for a Snowflake connection (official basic keys only)
my_snowflake_profile:
target: prod
outputs:
prod:
type: snowflake
account: "<account>"
user: "<user>"
password: "<password>"
role: "TRANSFORMER"
database: "ANALYTICS"
warehouse: "WH_ETL_L"
schema: "DBT_PROD"
threads: 8
client_session_keep_alive: false
query_tag: "dbt {{ target.name }} {{ invocation_id }}"On Snowflake, most of the performance and cost outcome is decided by your materialization choice. Use table or incremental for models that are referenced often and expensive to compute; prefer view for lightweight, occasionally-used models. Resolve upstream temporary intermediates with ephemeral so they become CTEs at compile time.
The dbt materialized setting can be flipped per model. The Analytics Engineer exam likes to ask why you would choose a given materialization, framed in terms of reference frequency, update frequency, and reuse cost.
| Materialization | Characteristics / Cost | Typical Use Case |
|---|---|---|
| view | Zero storage; consumes credits on each query. Grants are placed on the view | Light reshaping of dimensions, exploratory use |
| table | Heavy on first build but fast to read. Updates fully rebuild the table | Heavy aggregations, base data for dashboards |
| incremental | Writes only the delta. Upsert via MERGE | Tens of millions of fact rows refreshed daily |
| ephemeral | Resolves dependencies as a CTE; nothing is persisted | Small intermediate transformations between steps |
Per-model materialization (at the top of the SQL file)
{{ config(materialized='view') }}
with src as (
select * from {{ source('app', 'users') }}
)
select * from srcOn Snowflake the default incremental_strategy is merge. Always set unique_key and narrow the load window with is_incremental(). A change-detection column such as updated_at makes the pipeline more robust.
INSERT OVERWRITE replaces the entire table; on Snowflake it is not a partition-level overwrite. Use it only when you really need to rewrite the whole table, and only after weighing the compute cost and lock impact.
Example of an incremental model (MERGE) with a time window
{{ config(
materialized='incremental',
unique_key='id',
incremental_strategy='merge',
on_schema_change='sync_all_columns'
) }}
with src as (
select *
from {{ source('app', 'events') }}
{% if is_incremental() %}
where updated_at >= dateadd('hour', -6, (select max(updated_at) from {{ this }}))
{% endif %}
)
select
id,
user_id,
event_type,
updated_at
from srcSnowflake optimizes storage with automatic micro-partitions, but on very large tables with skewed filter columns, specifying cluster by can cut scan volume. Reclustering cost and benefit are a trade-off.
Pick the clustering key to match the typical filter predicates of your queries (date ranges, tenant IDs, and so on). You can observe the effect via system$clustering_information. It is unnecessary for small tables or aggregations that always full-scan.
Example of declaring clustering and evaluating it
{{ config(
materialized='table',
cluster_by=['event_date', 'tenant_id']
) }}
select * from {{ ref('fct_events_clean') }}
-- Evaluation: run manually to inspect clustering quality
-- select system$clustering_information('ANALYTICS','DBT_PROD','FCT_EVENTS');Use copy_grants for production privilege inheritance. Existing grants survive a dbt replace (ownership is the one exception).
transient tables are useful for cost optimization. You give up Fail-safe in exchange for lower storage cost. For sensitive data, use secure views and minimize data exposure through indirection.
Privilege inheritance, short-lived objects, and secure views
-- Recreate a table while inheriting grants
{{ config(materialized='table', copy_grants=true, transient=true) }}
select * from {{ ref('dim_products_clean') }}
-- Secure view
{{ config(materialized='view', secure=true) }}
select col1, col2 from {{ ref('fct_orders_masked') }}Tune threads to match warehouse size and query weight. A DAG design that avoids running too much staging and fact work in parallel matters too. Apply query_tag consistently so you can break down cost by warehouse, role, and environment.
Session parameter controls help prevent runaway queries. Set STATEMENT_TIMEOUT_IN_SECONDS in a dbt on-run-start hook. Enforce cost ceilings via Snowflake Resource Monitors, and on the dbt side, keep the failure-retry granularity small.
Control timeouts and similar settings via project hooks
on-run-start:
- "alter session set STATEMENT_TIMEOUT_IN_SECONDS = 1800"
- "alter session set QUERY_TAG = 'dbt {{ target.name }} {{ invocation_id }}'"Analytics Engineer
問題 1
A 500-million-row fact table on Snowflake is updated daily. Old and new rows are uniquely identified by id, and updated_at is present. You want the smallest-cost update that preserves existing grants and minimizes scanned bytes. Which dbt configuration is the most appropriate?
正解: A
On Snowflake, MERGE is the standard incremental strategy, and a unique_key plus an is_incremental() load window keeps scanned bytes low. copy_grants=true inherits grants across the replace. A table full rebuild is too expensive, view recomputes every query, and Snowflake's INSERT OVERWRITE replaces the whole table rather than performing a partial update.
Is INSERT OVERWRITE on Snowflake well suited to incremental loads?
On Snowflake, INSERT OVERWRITE replaces the entire table rather than overwriting individual partitions. Consider it only when you have to regenerate the full result set. For upserts keyed by id or updated_at, the MERGE strategy is the right choice.
What happens if I change cluster_by after the fact?
After the change, Snowflake's automatic clustering gradually reorganizes the table. You can expect fewer scanned bytes on reads, but reclustering itself can consume credits. Verify the impact by checking query history and system$clustering_information.
Which privileges does copy_grants carry over?
It preserves privileges granted on the target object (SELECT and similar) when the object is recreated. OWNERSHIP is not included, so the owning role still has to be designed separately. It can also be combined with secure views.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...