In dbt's Semantic Layer, you define measures (aggregation targets) inside a semantic_model so that downstream metrics and BI tools see consistent aggregations.
This article focuses on stable concepts from the official documentation (name, agg, expr, the common aggregation types) and on design principles that are largely unaffected by version drift.
A measure is an aggregation rule defined over a specific column (or expression). Against the rows of the base model referenced by a semantic_model, it defines aggregations such as sum, avg, min, max, count, and count_distinct. Metrics often combine these measures, but the measure itself is the minimal unit that captures "what is aggregated, and how".
Measures are defined in YAML and typically specify name (a unique identifier), agg (aggregation type), and expr (the target column or expression). For count, you can omit expr to count rows; for sum and avg, you name the column explicitly via expr. These have been stable concepts in the official dbt spec for a long time.
A common failure is placing measures on a base model with unstable grain. Define measures on a model where 1 record = 1 fact (e.g. one order, one line item, one daily snapshot) to prevent double counting from fan-out joins.
NULL and duplicate handling come up often on the exam and in practice. SUM ignores NULLs; AVG denominators exclude NULLs as well. COUNT(*) and COUNT(col) differ in meaning - COUNT(col) ignores NULLs. COUNT DISTINCT is non-additive, so rolling up to a higher grain can produce values that do not match.
Conceptual flow of how measures are consumed
Additivity and NULL handling for each aggregation type are the most important points both on the exam and in practice. Use the table below to lock in the behavior and pick the right measure for each use case.
Averages and deduplicated counts (COUNT DISTINCT) in particular are non-additive, and tend to produce counterintuitive results when rolled up to a higher grain or combined with additional dimensions.
| Aggregation type | Additivity (time / dimension) | NULL handling | Typical pitfall |
|---|---|---|---|
| sum | Additive (safe across both time and category) | NULLs are ignored (excluded from the aggregate in most SQL dialects) | Floating-point rounding errors; currency requires consistent scale and units |
| count | Additive (only on non-overlapping sets) | COUNT(*) counts all rows; COUNT(col) excludes NULLs | Joins that fan out inflate the count; mistakenly using COUNT(*) when COUNT(col) was intended |
| count_distinct | Non-additive (set union deduplicates overlap) | NULLs are excluded from the count in most dialects | Summing daily uniques does not match the monthly unique count |
| avg | Non-additive (the value changes on re-aggregation) | NULLs are excluded from the denominator (AVG(col)) | Re-averaging at a higher grain introduces error; consider computing from numerator/denominator measures |
| min/max | Non-additive (selection functions) | NULLs are usually ignored | Comparisons across multiple sets do not reflect the underlying distribution |
A measure is directly tied to "the grain at which it can be summed". The base model the semantic_model points to should have clear entities (e.g. order_id) and a time dimension (e.g. order_date at day grain), and ideally 1 row equals 1 event or 1 snapshot.
Set primary and foreign keys on entities appropriately, and assign time or categorical types to dimensions. Doing so makes it clear at what unit a non-additive measure like count_distinct should be interpreted.
Below is an example that defines typical measures on a semantic_model referencing an orders fact table. It shows three kinds: sum, count, and count_distinct. For non-additive aggregations like avg, it is safer to keep the numerator (sum) and denominator (count) as separate measures.
Definitions at this level are based on the stable spec, so they are largely unaffected by dbt version drift and are heavily tested on the exam.
Defining measures inside a semantic_model (YAML)
semantic_models:
- name: orders
model: ref('fct_orders')
entities:
- name: order_id
type: primary
- name: customer_id
type: foreign
dimensions:
- name: order_date
type: time
type_params:
time_granularity: day
- name: status
type: categorical
measures:
- name: revenue
agg: sum
expr: total_amount
- name: order_count
agg: count
- name: customer_count
agg: count_distinct
expr: customer_id
Measure quality depends on the quality of the underlying base model. Use dbt tests to verify primary key uniqueness, foreign key referential integrity, the type and scale of monetary columns, and gaps in the time dimension. In addition, use lightweight validation SQL to reconcile expected values for a sample period and check for join-driven inflation.
For the exam, you should be able to explain additivity, NULL handling, the difference between COUNT and COUNT DISTINCT, why averages cannot be re-aggregated, and the grain requirement for measures (1 row = 1 fact).
Analytics Engineer
問題 1
You want to correctly define monthly total revenue and unique purchasers in the dbt Semantic Layer. Which combination of measure definitions is most appropriate?
正解: A
Total revenue should use sum with the amount column named explicitly, and unique purchasers should use count_distinct with the target ID column named explicitly. avg is an average and is unsuitable for a total; count counts rows and does not give a unique customer count.
What does count measure with no expr actually count?
In most implementations it counts rows. If you want to exclude NULLs from a specific column (equivalent to COUNT(col)), name the column explicitly via expr.
Should averages and ratios be defined directly as a measure?
To avoid distortion when re-aggregating, it is safer to define the numerator (e.g. sum_revenue) and denominator (e.g. order_count) as separate measures and compute the ratio on the consumer side.
Why is count_distinct non-additive?
Because it deduplicates the set, simply summing values computed over different periods or dimensions does not account for overlap. As a result, rolling up to a higher grain often does not match the original value.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...