MetricFlow starts from the measures, dimensions, and entities defined in semantic models, aggregates the requested metrics at the optimal granularity and join path, and emits the final SQL. You can analyze data without writing SQL directly through consistent definitions, but ambiguous model design or query specification can lead to unintended aggregations.
This article explains the semantic model building blocks that exams frequently test, along with MetricFlow's query generation logic from a practical standpoint. It also covers real-world pitfalls including definition placement and naming, explicit join paths, and handling of time granularity and the time spine.
A semantic model is a layer that expresses business meaning independently from downstream reports and applications. MetricFlow reads these definitions and plans queries by combining measures (what to aggregate), dimensions (slicing axes), and entities (join keys). The default time axis for aggregation is set by agg_time_dimension.
A common source of confusion both on the exam and in practice is the distinction between measures and metrics. A measure is an aggregation definition on a table (such as sum(amount)); a metric is a business KPI built on top of measures (including period, filter, ratio, and so on). Think of measures as reusable minimal units and metrics as decision-making units.
| Element | Where defined | Main attributes | Grain / Key |
|---|---|---|---|
| Semantic model | semantic_models: | model, entities, dimensions, measures | Based on row-level definition |
| Measure | measures: | name, agg, expr, filter | Grain of defining table |
| Dimension (time) | dimensions: type: time | time_granularity, validity | Depends on time_granularity |
| Dimension (categorical) | dimensions: type: categorical | allowed_values, etc. | Depends on entity grain |
| Entity | entities: | name, type(primary/foreign) | Join key |
| Metric | metrics: | type, type_params, filter, window | Aligned with requested group_by |
Minimal semantic model and metric definition example
semantic_models:
- name: orders
model: ref('stg_orders')
defaults:
agg_time_dimension: order_date
entities:
- name: order
type: primary
- name: customer
type: foreign
dimensions:
- name: order_date
type: time
type_params:
time_granularity: day
- name: status
type: categorical
measures:
- name: order_total
agg: sum
expr: amount
- name: order_count
agg: count
metrics:
- name: total_revenue
type: simple
type_params:
measure: order_total
filter: status = 'completed'MetricFlow builds a plan from the requested metrics, group_by, where, and time_range, then locates the semantic models it needs. It picks join paths based on entities, aligns time granularity, and fills gaps with a time spine if needed. Finally, it decides the aggregation order and emits SQL targeted at the warehouse.
The crucial points in this process are the order of granularity changes and the handling of non-additive measures. First aggregate measures at their lowest grain, then roll them up to match group_by. Ratios and distinct measures are vulnerable to double counting, so make the intent explicit at the measure definition or metric type level and avoid unnecessary re-aggregation.
MetricFlow's planning and SQL generation flow
User Request
|
v
[Parse metrics/group_by/where]
|
v
[Semantic graph resolve]
| \
| [Choose join path by entities]
v
[Align time grain]
|
v
[Optional time spine (fill gaps)]
|
v
[Compute measures at base grain]
|
v
[Roll up to requested group_by]
|
v
[Apply metric-level filters/order/limit]
|
v
[Emit warehouse-specific SQL]Request example and key parameters
# Conceptual request representation (actual invocation varies by environment)
metrics: [total_revenue]
group_by: [order_date__month, customer__country]
where: "status = 'completed'"
time_range: '2024-01-01 ~ 2024-03-31'
order: [order_date__month]
limit: 100A time dimension is interpreted via the combination of its defined time_granularity and defaults.agg_time_dimension. If the request's group_by specifies monthly grain, day-grain measures roll up to month. Specifying a period on the metric side restricts the aggregation window according to time_range.
When you want missing periods to appear as zero rows, MetricFlow references a time spine (a continuous baseline date series) to fill the gaps. For example, even if there are zero orders in February, the report can still show February as 0.
Time dimension and cumulative metric definition example
semantic_models:
- name: orders
model: ref('stg_orders')
defaults:
agg_time_dimension: order_date
dimensions:
- name: order_date
type: time
type_params:
time_granularity: day
measures:
- name: order_total
agg: sum
expr: amount
metrics:
- name: revenue_7d
type: cumulative
type_params:
measure: order_total
window: 7 days
filter: status = 'completed'MetricFlow picks the shortest join path that reaches the requested dimension, based on each entity's primary/foreign designation. Getting from orders to customers and then regions requires a chain like orders.customer → customers.customer → customers.region → regions.region. Even when multiple paths exist, designing entities so they can be traversed uniquely prevents incorrect joins.
Mixing pre-aggregated or summary tables with detail tables tends to introduce entity grain mismatches. Effective countermeasures include routing through a bridge table to uniquify many-to-many relationships, or defining derived semantic models to align grain.
Definition example when multiple join paths exist
semantic_models:
- name: customers
model: ref('dim_customers')
entities:
- name: customer
type: primary
- name: region
type: foreign
dimensions:
- name: country
type: categorical
- name: regions
model: ref('dim_regions')
entities:
- name: region
type: primary
dimensions:
- name: region_name
type: categorical
- name: orders
model: ref('fct_orders')
defaults:
agg_time_dimension: order_date
entities:
- name: order
type: primary
- name: customer
type: foreign
measures:
- name: order_total
agg: sum
expr: amount
# With this setup, group_by: [order_date__month, customers__country, regions__region_name]
# can be resolved via a unique customer → region pathMetricFlow applies the measure-definition filter before measurement, then applies metric-level filters at the appropriate aggregation stages. For instance, adding status='completed' to the order_total measure limits every downstream metric to completed orders. In contrast, a filter on the metric side applies only to that specific metric.
Ratio metrics require evaluating the numerator and denominator measures at their respective correct grains before dividing. Adding superfluous dimensions to group_by unintentionally decomposes the ratio, so on the exam, learn to spot the option that maintains "ratio grain alignment".
Example contrasting measure-level and metric-level filters
semantic_models:
- name: orders
model: ref('stg_orders')
measures:
- name: order_total_completed
agg: sum
expr: amount
filter: status = 'completed'
metrics:
- name: total_revenue_completed
type: simple
type_params:
measure: order_total_completed
- name: total_revenue_high_value
type: simple
type_params:
measure: order_total_completed
filter: amount >= 100The Analytics Engineer exam tests precise understanding of terminology and the ability to spot grain, join, and time alignment issues. The fastest path is to study with the same lens you use in production reviews. Bake the following checks into development and review and you will hesitate less on the exam too.
Generated SQL reading-points snippet
-- Overview: base aggregation → join → rollup → final shaping
with base as (
select
order_id,
customer_id,
order_date::date as order_date_day,
amount
from fct_orders
where status = 'completed'
),
measures as (
select
customer_id,
date_trunc('day', order_date_day) as d_day,
sum(amount) as order_total
from base
group by 1,2
),
rolled as (
select
date_trunc('month', d_day) as d_month,
sum(order_total) as total_revenue
from measures
group by 1
)
select * from rolled
order by d_month
limit 100;Analytics Engineer
問題 1
In semantic models where orders can reach customers and regions, you want to compute the metric repeat_purchase_rate as "customers who purchased 2+ times in the last 30 days / customers who purchased in the last 30 days", with group_by set to only order_date__month. Which premise best avoids join-path ambiguity and grain mismatch?
正解: A
It is essential that the numerator and denominator are evaluated at the same entity (customer) grain, that the orders → customers join path is unique, and that time grain is stabilized via agg_time_dimension. B introduces an unnecessary dimension and breaks the grain. C cannot correct grain with where. D loses empty periods and makes period comparisons unstable.
Where should I typically place semantic model YAML files?
Place them as .yml files under the dbt project's models directory, close to the corresponding models. Prioritize lineage clarity and reviewability by splitting folders by staging or mart units, and logically organize the semantic_models and metrics sections.
Join paths are ambiguous and resolution is unstable. What should I do?
Clarify primary and foreign entities and design intermediate (bridge) models so the path is unique. Do not join many-to-many directly; guarantee uniqueness of entity keys. If needed, define derived semantic models to align granularity so ratios and distinct counts can be calculated safely.
Can I inspect and debug the generated SQL?
Yes. MetricFlow ultimately emits warehouse-specific SQL. Inspect logs and compile output to verify each stage (base aggregation, joins, rollup), and confirm where clauses are applied at the intended position and group_by expands as expected. Granularity mismatches are easiest to find by checking CTE-boundary columns against group by columns.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...