The dbt Databricks adapter leverages Delta Lake's ACID transactions, time travel, and MERGE to back the entire workflow from data modeling to production operations under a single developer experience. This article assumes the behavior documented in the official docs and focuses on the stable concepts that are most likely to appear on the exam.
We avoid version-specific details and walk through connection settings, materializations, incremental strategies, Unity Catalog permissions, and operational tuning in order, presenting criteria that work for both exam prep and the field.
dbt-databricks issues SQL against a Databricks SQL Warehouse or cluster and produces view, table, and incremental materializations on Delta Lake tables. Delta Lake's ACID transactions and MERGE are the foundation for reliable upserts.
From an Analytics Engineer perspective, the high-frequency topics are incremental strategies, the Unity Catalog three-level namespace, and aligning profile settings (host / http_path / token).
Conceptual execution flow from dbt to Delta Lake
Minimal install example (local development)
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\\Scripts\\activate
pip install --upgrade pip
pip install dbt-databricksTo connect to Databricks, define host, http_path, and token in profiles.yml. Using a SQL Warehouse gives you stable ANSI SQL execution and autoscaling. With Unity Catalog, set catalog explicitly. For the legacy Hive Metastore, set catalog to hive_metastore.
Both on the exam and in practice, the gotchas are that http_path differs between SQL Warehouse and cluster, and that catalog vs. schema need to be used distinctly.
Example profiles.yml (SQL Warehouse and cluster)
databricks_profile:
target: dev
outputs:
dev:
type: databricks
method: http
host: https://<your-workspace-host>
http_path: /sql/1.0/warehouses/<warehouse-id>
token: <databricks-personal-access-token>
catalog: main # use hive_metastore if not on Unity Catalog
schema: analytics_dev # recommend a per-user development schema
threads: 4
cluster_dev:
type: databricks
method: http
host: https://<your-workspace-host>
http_path: /sql/protocolv1/o/<org-id>/<cluster-id> # all-purpose cluster
token: <databricks-personal-access-token>
catalog: hive_metastore
schema: analytics_dev
threads: 4The Databricks adapter supports table, view, incremental, seed, and snapshot. To make Delta Lake explicit, specify file_format=delta. Delta is the default in some environments, but explicitly stating it is recommended for exam answers and long-term maintainability.
You can enable auto-optimize and compaction via table properties. Property names depend on workspace settings, so if you prioritize stability, calling OPTIMIZE from a post-hook is the safer option.
Basic Delta table model example
{{ config(
materialized='table',
file_format='delta',
tags=['core'],
grants={'select': ['analyst_group']}
) }}
with src as (
select * from {{ source('app', 'orders_raw') }}
)
select
order_id,
customer_id,
cast(order_ts as timestamp) as order_ts,
amount
from srcIncremental materializations cut down on recompute cost at scale. In the Databricks adapter, specifying unique_key selects MERGE and enables upserts. Without a unique_key, append is the safe default. When you partition, consider insert_overwrite.
To handle CDC late arrivals and re-ingestion, the standard approach is to branch the WHERE clause inside an is_incremental condition. Use an updated timestamp or an incremental ID as the watermark for stable behavior.
| Strategy | Update / Delete Support | Requirements | Typical Use Case |
|---|---|---|---|
| append | No update or delete (append only) | None (no key required) | Event logs, audit trails |
| merge | Handles updates (upsert); deletes require separate design | unique_key should be specified | Incremental updates for dimensions and facts |
| insert_overwrite | Replaces target partitions | partition_by must be defined | Optimized rebuilds of daily / hourly partitions |
Incremental model using MERGE (handles late arrivals)
{{ config(
materialized='incremental',
file_format='delta',
unique_key='order_id',
incremental_strategy='merge',
on_schema_change='append_new_columns' # auto-append new columns (stable behavior)
) }}
with staged as (
select
order_id,
customer_id,
cast(order_ts as timestamp) as order_ts,
amount,
ingestion_ts
from {{ ref('stg_orders') }}
{% if is_incremental() %}
where order_ts >= date_sub(current_timestamp(), 7) -- buffer for late-arriving data
{% endif %}
)
select * from stagedUnity Catalog manages objects with a three-level catalog.schema.table namespace. The dbt database concept maps to catalog on Databricks. Cleanly separate catalog / schema between development and production, and use target switching to prevent collisions.
You can grant permissions via the dbt grants config. CREATE SCHEMA in a pre-hook and statistics optimization in a post-hook is a safe pattern.
Hook example for permissions and schema creation
{{ config(
materialized='table',
file_format='delta',
grants={'select': ['analyst_group'], 'usage': ['bi_group']},
pre_hook=["create schema if not exists {{ target.catalog }}.{{ target.schema }}"],
post_hook=["analyze table {{ this }} compute statistics"]
) }}
select 1 as okKeeping Delta Lake performant requires file-size optimization and statistics updates. Run OPTIMIZE on large tables and ZORDER from a post-hook when access patterns are clear. Operate VACUUM carefully, aligned with retention windows and governance rules.
In job design, isolate metadata operations via run-operation and ramp up parallelism gradually, watching metastore load. Prioritize not_null, unique, and relationships tests, and use Delta time travel for rollback during incidents.
post-hook example for OPTIMIZE / ZORDER (apply carefully)
{{ config(
materialized='incremental',
file_format='delta',
unique_key='id',
incremental_strategy='merge',
post_hook=[
"optimize {{ this }} zorder by (customer_id, order_ts)",
"analyze table {{ this }} compute statistics"
]
) }}
select * from {{ ref('stg_fact') }}Analytics Engineer
問題 1
You want to build an incremental model in dbt that performs upserts on a Delta table on Databricks. Which configuration most appropriately reflects updates to the same record while minimizing implementation and operational cost?
正解: A
To take advantage of Delta Lake MERGE, the standard approach is to specify unique_key in dbt and use incremental_strategy=merge. Narrowing the re-extraction window with is_incremental to handle late arrivals is effective in practice as well. The other choices fall short on either update completeness or cost.
How do you use Delta time travel from dbt?
You can write SQL with SELECT ... VERSION AS OF or TIMESTAMP AS OF directly inside dbt models. Rollbacks rely on DDL, so keep them as a runbook outside of dbt and limit dbt itself to reads and validation for safety.
How do you specify Unity Catalog's three-level namespace in dbt?
In profiles.yml, catalog maps to the Unity Catalog catalog, schema maps to the schema, and the model name (identifier) becomes the table name. The this relation expands as catalog.schema.identifier in that order.
How should you prepare for schema evolution?
Set on_schema_change='append_new_columns' on incremental models so new columns can be appended to the table when they appear. Isolate large type changes into a separate migration and run dbt tests alongside them for safety.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...