dbt uses Jinja2 to templatize SQL and produces the final SQL at compile time. In other words, Jinja is a set of rules expanded before execution. Understanding it correctly keeps model definitions concise and reusable.
This article organizes the minimum Jinja syntax and the dbt-specific context (ref, source, var, env_var, config, macros) that you need for both the Analytics Engineer exam and day-to-day operations, along with the points that often go wrong.
dbt evaluates Jinja before running a model and produces the final SQL. Dependency resolution (ref/source) and macro expansion happen at this stage. Jinja itself does not run in the database; it is processed by dbt (locally or in the runtime environment).
Evaluation falls roughly into two phases: parse/compile time, and runtime macro invocation. run_query, which needs a database connection, can only be called safely at runtime. ref, source, var, env_var, and config are routinely used in model and macro templates.
| Phase | What you can do | Representative APIs/syntax | Caveats |
|---|---|---|---|
| Parse / Compile | Template expansion and dependency resolution | ref, source, var, env_var, config | No database queries allowed |
| Runtime (inside macros) | Auxiliary query execution | run_query | Limit its use; don't mix it into the model's main SQL generation |
| DB execution | Execution of the final SQL | Pure SQL | Jinja has already been expanded by this point |
How Jinja is evaluated and executed in dbt
Minimal example: ref and Jinja expansion
-- models/orders_enriched.sql
select
o.id,
o.customer_id,
c.segment
from {{ ref('stg_orders') }} as o
left join {{ ref('dim_customers') }} as c
on o.customer_id = c.customer_id;You only need to remember three Jinja delimiters. {{ }} outputs a value (expression/variable), {% %} is for control structures (if, for, set, etc.), and {# #} is for comments. In dbt these mix together inside SQL.
Filters transform values with pipe syntax. default, lower, upper, join, and replace come up most often. Pay attention to quoting when handling numbers and strings, and especially keep safety and readability in mind when generating identifiers such as column names.
| Delimiter | Use | Example |
|---|---|---|
| {{ ... }} | Output a value | {{ var('days', 7) }} |
| {% ... %} | Control structure | {% if target.name == 'prod' %} ... {% endif %} |
| {# ... #} | Comment | {# build only for daily partition #} |
Mini diagram: roles of the delimiters
Filter and set example
{% set cols = ['id', 'customer_id', 'order_ts'] %}
select {{ cols | map('lower') | join(', ') }}
from {{ ref('stg_orders') }};if is useful when you want to toggle columns or conditions based on environment or target. for, which iterates over arrays and maps, is great for generating select columns or CASE expressions in bulk. Keep templating within the bounds of readability and extract long logic into macros.
Watch out for comma separators and trailing commas inside loops. Either separate safely with the join filter, or control them using loop.last.
| Syntax | Use | What to watch for on the exam |
|---|---|---|
| if/elif/else | Branch on environment or flags | Switching via target/var |
| for | Bulk-generate columns or conditions | Avoid trailing commas with join or loop.last |
| set | Hold an intermediate value | Consider extracting into a macro before it gets complex |
How a column list is built with for
Example using for and loop.last
{% set dims = ['country', 'city', 'zip'] %}
select
customer_id,
{% for d in dims %}
{{ d }}{% if not loop.last %},{% endif %}
{% endfor %}
from {{ ref('dim_customers') }};ref wires up model-to-model dependencies while resolving the proper schema and identifier. source is for referencing external tables and must match the declaration in catalog.yml. var retrieves variables passed via dbt_project.yml or the command line, while env_var reads OS environment variables. config is used for per-model settings such as materialization and tags.
On the Analytics Engineer exam, the ref-vs-source distinction, the precedence of var/env_var, and the scope of config (model-level vs project-level) come up frequently.
| Function / syntax | Primary use | Where it can be used | Exam notes |
|---|---|---|---|
| ref('model') | Reference a model and resolve dependencies | Model SQL / macros | Automatically wires up inter-model dependencies |
| source('src','table') | Reference an external table | Model SQL / macros | Must match the definition in sources.yml |
| var('key', default) | Retrieve a variable | Model SQL / macros | Make it safe with a default for the undefined case |
| env_var('NAME') | Retrieve an environment variable | Model SQL / macros | Depends on the runtime environment |
| config(...) | Model configuration | Model SQL (at the top) / macros | In-model config applies only to that model |
How ref and source relate
Examples of ref, source, var, env_var, and config
-- models/fct_orders.sql
{{ config(materialized='table', tags=['finance']) }}
with src as (
select * from {{ source('raw', 'orders') }}
), stg as (
select * from {{ ref('stg_orders') }} where order_date >= {{ var('from_date', "'1970-01-01'") }}
)
select * from stg
where {{ 'true' if env_var('ALLOW_ALL', 'false') == 'true' else 'total_amount > 0' }};Extract recurring SQL fragments into macros under the macros directory and pass arguments from the caller. Macros are Jinja templates that are expanded at call time. To share them across multiple projects, package them and install via packages.yml.
Use run_query only when you need an auxiliary query from inside a macro. Making the main SQL generation of a model depend on a run_query result can produce surprising behavior with compilation, caching, and parallel execution, so avoid it as a design pattern.
| Target | Location / definition | Call / application |
|---|---|---|
| Macro | macros/*.sql | {{ my_macro(arg='x') }} |
| Package | Referenced via packages.yml | {{ package_name.macro_name(...) }} |
| Model config | config(materialized='...') | {{ config(...) }} or dbt_project.yml |
Minimal project structure
Macro definition and call
-- macros/utils.sql
{% macro coalesce_zero(expr) %}
coalesce({{ expr }}, 0)
{% endmacro %}
-- models/fct_orders.sql(一部)
select {{ coalesce_zero('total_amount') }} as total_amount
from {{ ref('stg_orders') }};Schema and table names hardcoded without ref/source break across environments and renames. Always go through ref/source so the adapter quotes identifiers properly. When generating identifiers or literals dynamically, watch the consistency of string quoting.
To validate Jinja logic with tests, combine dbt test with unit tests (when available). The fastest way to confirm whether a template expanded as intended is to inspect the generated SQL with dbt compile.
| Pitfall | Symptom | Fix |
|---|---|---|
| Hardcoded references | Breaks when switching environments | Use ref/source |
| Trailing commas | SQL syntax error | Control with join or loop.last |
| Unsafe dependency on run_query | Non-deterministic and slow | Move metadata to vars or source control |
Safe vs. unsafe references
良い: {{ ref('stg_orders') }} -> コンパイル時に解決
悪い: raw_schema.stg_orders -> 環境で破綻Limit run_query to auxiliary use inside macros
-- macros/get_max_order_date.sql
{% macro get_max_order_date() %}
{% set res = run_query("select max(order_date) as d from " ~ ref('stg_orders')) %}
{% if res is not none and res.columns and res.rows %}
{{ return(res.rows[0][0]) }}
{% else %}
{{ return("'1970-01-01'") }}
{% endif %}
{% endmacro %}
-- 悪手: モデル本体の where に直接 run_query 結果を依存させないAnalytics Engineer
問題 1
As an Analytics Engineer, you need to reference the right table name per environment while correctly wiring up dependencies between models. Which is the best approach?
正解: A
ref builds the dependency graph and resolves identifiers per adapter at the same time, and is robust to environment differences. Hardcoding schemas/tables or stitching things together with hand-rolled variables is inferior in portability and dependency resolution.
When should I use var versus env_var?
var refers to template variables managed in the dbt project or on the command line, while env_var reads values from the runtime (OS) environment. For reproducibility, manage values that affect application behavior with var, and pass credentials or secrets through environment variables (env_var).
Which takes precedence: config in the model or in dbt_project.yml?
The config defined inside the model wins for that specific model. Project-level settings are defaults, and per-model config overrides them.
What is the safe way to build a column list with a loop?
Use the join filter, or control the trailing comma with loop.last. For example: select {{ cols | join(', ') }}. This prevents SQL syntax errors.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...