dbt

dbt Semantic Layer Measures Guide: Designing Aggregations and Exam Tips

2026-04-19
NicheeLab Editorial Team

In dbt's Semantic Layer, you define measures (aggregation targets) inside a semantic_model so that downstream metrics and BI tools see consistent aggregations.

This article focuses on stable concepts from the official documentation (name, agg, expr, the common aggregation types) and on design principles that are largely unaffected by version drift.

Measure Basics: Role and Minimum Properties

A measure is an aggregation rule defined over a specific column (or expression). Against the rows of the base model referenced by a semantic_model, it defines aggregations such as sum, avg, min, max, count, and count_distinct. Metrics often combine these measures, but the measure itself is the minimal unit that captures "what is aggregated, and how".

Measures are defined in YAML and typically specify name (a unique identifier), agg (aggregation type), and expr (the target column or expression). For count, you can omit expr to count rows; for sum and avg, you name the column explicitly via expr. These have been stable concepts in the official dbt spec for a long time.

  • name: reused across models, so prioritize consistent naming (e.g. revenue, order_count)
  • agg: sum / avg / min / max / count / count_distinct
  • expr: the column or expression to aggregate (omitting it under count means row count)
  • Define measures alongside dimensions and entities (primary keys) to avoid grain collisions

Avoiding Aggregation Design Pitfalls

A common failure is placing measures on a base model with unstable grain. Define measures on a model where 1 record = 1 fact (e.g. one order, one line item, one daily snapshot) to prevent double counting from fan-out joins.

NULL and duplicate handling come up often on the exam and in practice. SUM ignores NULLs; AVG denominators exclude NULLs as well. COUNT(*) and COUNT(col) differ in meaning - COUNT(col) ignores NULLs. COUNT DISTINCT is non-additive, so rolling up to a higher grain can produce values that do not match.

  • Multi-table joins that break the measure's grain cause double counting
  • Watch out for COUNT vs COUNT DISTINCT confusion (especially for unique users)
  • Averages are non-additive: re-aggregating changes the value. Consider a weighted average if needed
  • When combining with time dimensions, verify missing dates and time zones

Conceptual flow of how measures are consumed

SourcesStaging ModelsFCT/Dim ModelsSemantic Modelentities / dimensions / measures (sum, count, ...)Metrics / QueriesBI, Notebooks, APIsSources → Staging → FCT/Dim → Semantic Model → Metrics / Queries

Comparison of the Main Aggregation Types

Additivity and NULL handling for each aggregation type are the most important points both on the exam and in practice. Use the table below to lock in the behavior and pick the right measure for each use case.

Averages and deduplicated counts (COUNT DISTINCT) in particular are non-additive, and tend to produce counterintuitive results when rolled up to a higher grain or combined with additional dimensions.

  • SUM ignores NULLs (in most SQL dialects SUM(NULL) is not NULL; the aggregate simply excludes NULL rows)
  • COUNT(*) is row count; COUNT(col) is the count excluding NULLs
  • COUNT DISTINCT is non-additive. Values do not line up across periods or dimensions
  • AVG changes value on re-aggregation. When needed, compute it from numerator (sum) and denominator (count) measures
Aggregation typeAdditivity (time / dimension)NULL handlingTypical pitfall
sumAdditive (safe across both time and category)NULLs are ignored (excluded from the aggregate in most SQL dialects)Floating-point rounding errors; currency requires consistent scale and units
countAdditive (only on non-overlapping sets)COUNT(*) counts all rows; COUNT(col) excludes NULLsJoins that fan out inflate the count; mistakenly using COUNT(*) when COUNT(col) was intended
count_distinctNon-additive (set union deduplicates overlap)NULLs are excluded from the count in most dialectsSumming daily uniques does not match the monthly unique count
avgNon-additive (the value changes on re-aggregation)NULLs are excluded from the denominator (AVG(col))Re-averaging at a higher grain introduces error; consider computing from numerator/denominator measures
min/maxNon-additive (selection functions)NULLs are usually ignoredComparisons across multiple sets do not reflect the underlying distribution

Grain and Entity Design vs. Measure Compatibility

A measure is directly tied to "the grain at which it can be summed". The base model the semantic_model points to should have clear entities (e.g. order_id) and a time dimension (e.g. order_date at day grain), and ideally 1 row equals 1 event or 1 snapshot.

Set primary and foreign keys on entities appropriately, and assign time or categorical types to dimensions. Doing so makes it clear at what unit a non-additive measure like count_distinct should be interpreted.

  • Define sum-style measures on a 1-row-per-fact model (e.g. revenue at line-item grain)
  • Use count_distinct for unique users, but make it explicit that re-aggregation across periods or segments is non-additive
  • Define the time dimension at the finest grain (such as day) and roll up to higher grains
  • Keep averages and ratios as separate numerator/denominator measures and compute on demand

Defining Measures in YAML: A Minimal Practical Example

Below is an example that defines typical measures on a semantic_model referencing an orders fact table. It shows three kinds: sum, count, and count_distinct. For non-additive aggregations like avg, it is safer to keep the numerator (sum) and denominator (count) as separate measures.

Definitions at this level are based on the stable spec, so they are largely unaffected by dbt version drift and are heavily tested on the exam.

  • For sum, name the target column in expr (numeric types recommended)
  • For count, expr can be omitted (row count)
  • For count_distinct, name the target column explicitly via expr
  • Set the finest grain for the time dimension (such as day) up front

Defining measures inside a semantic_model (YAML)

semantic_models:
  - name: orders
    model: ref('fct_orders')
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
      - name: status
        type: categorical
    measures:
      - name: revenue
        agg: sum
        expr: total_amount
      - name: order_count
        agg: count
      - name: customer_count
        agg: count_distinct
        expr: customer_id

Validation, Testing, and Exam Tips

Measure quality depends on the quality of the underlying base model. Use dbt tests to verify primary key uniqueness, foreign key referential integrity, the type and scale of monetary columns, and gaps in the time dimension. In addition, use lightweight validation SQL to reconcile expected values for a sample period and check for join-driven inflation.

For the exam, you should be able to explain additivity, NULL handling, the difference between COUNT and COUNT DISTINCT, why averages cannot be re-aggregated, and the grain requirement for measures (1 row = 1 fact).

  • Use dbt test's not_null/unique on primary keys and accepted_values on status columns
  • Standardize monetary columns to a numeric type and a single currency unit, and make the rounding point explicit in the model
  • Reconcile expected SUM and COUNT values on a small sample with hand calculations via validation SQL
  • Get into the habit of deriving averages and ratios from numerator/denominator measures

Check Your Understanding

Analytics Engineer

問題 1

You want to correctly define monthly total revenue and unique purchasers in the dbt Semantic Layer. Which combination of measure definitions is most appropriate?

  1. Define revenue with agg: sum and the amount column in expr, and purchasers with agg: count_distinct and customer_id in expr
  2. Define revenue with agg: avg and the amount column in expr, and purchasers with agg: count without expr
  3. Define revenue with agg: sum without expr, and purchasers with agg: count_distinct without expr
  4. Define revenue with agg: count to count rows, and purchasers with agg: avg and customer_id in expr

正解: A

Total revenue should use sum with the amount column named explicitly, and unique purchasers should use count_distinct with the target ID column named explicitly. avg is an average and is unsuitable for a total; count counts rows and does not give a unique customer count.

Frequently Asked Questions

What does count measure with no expr actually count?

In most implementations it counts rows. If you want to exclude NULLs from a specific column (equivalent to COUNT(col)), name the column explicitly via expr.

Should averages and ratios be defined directly as a measure?

To avoid distortion when re-aggregating, it is safer to define the numerator (e.g. sum_revenue) and denominator (e.g. order_count) as separate measures and compute the ratio on the consumer side.

Why is count_distinct non-additive?

Because it deduplicates the set, simply summing values computed over different periods or dimensions does not account for overlap. As a result, rolling up to a higher grain often does not match the original value.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.