dbt

Rich Great Expectations-style Validation with dbt-expectations

2026-04-19
NicheeLab Editorial Team

dbt lets you declare tests in YAML, then compiles them into adapter-specific SQL (Snowflake, BigQuery, Redshift, Spark, etc.) for execution (official: https://docs.getdbt.com/). dbt-expectations is a community package built on top of that mechanism, providing a rich set of validation macros similar to Great Expectations.

This article covers the angles the exam tends to probe — choosing, configuring, and interpreting tests — alongside ready-to-use YAML examples and operational patterns. Because the package's specific API may change, the explanations are grounded in dbt's stable concepts (tests, selectors, severity, store_failures, contracts, etc.).

Where dbt-expectations Fits and What the Exam Tends to Ask

dbt tests come in two flavors: generic tests (declared in schema.yml) and singular data tests (arbitrary SQL). dbt-expectations extends the former in a Great Expectations style, letting you invoke many validation macros concisely from YAML. The exam tends to ask whether you can choose between built-in tests (not_null, unique, relationships, accepted_values, etc.) and extended tests appropriately, how to use severity, and how to interpret results (error vs. warn, store_failures) (official: Tests basics https://docs.getdbt.com/docs/build/tests).

The key insight: tests compile to SQL and run against the target model or source. The declaration is declarative, but the underlying execution is set-based computation on the adapter. Evaluate test load, index usage, and date filters from the perspective of whether the resulting SQL is realistic.

  • Prefer generic tests; write raw SQL only as a last resort (for readability and reuse)
  • Use severity=error for critical quality checks, severity=warn for non-functional monitors
  • On large tables, control load with date partitioning or conditional clauses
  • In CI, use selectors to run only the critical tests in a short time window (official: Selection https://docs.getdbt.com/reference/node-selection/syntax)
Aspectdbt built-in testsdbt-expectationsGreat Expectations (reference)
How to defineConcise entries in schema.yml's tests blockExtended macros referenced from schema.ymlExpectations written in Python/YAML
StrengthsSolid coverage of basic constraintsDiverse statistical, distribution, and pattern checksExtremely expressive
Operational overheadMinimalSmall to moderate (package dependency)Moderate (separate runtime/orchestration)

Conceptual flow of dbt test compilation and execution

schema.yml testsdbt test macrosAdapter-specific SQL generationExecute on the DWHResults / failing rowsConceptual flow of dbt test compilation and execution

Add dbt-expectations to packages.yml

packages:
  - package: calogica/dbt-expectations
    version: ">=0.10.0,<1.0.0"  # Pin the version to match your project's compatibility

# Fetch dependencies after adding
# dbt deps

Common Validation Patterns and YAML Examples

With dbt-expectations, you can declare checks for nulls, duplicates, ranges, patterns, referential integrity, and distributions directly in YAML. Things the built-ins don't cover well — composite uniqueness, non-negative values within a range, regex patterns, and row-level conditional checks — become easy to express.

Below is an example for the orders model. From an exam perspective, you need to quickly read the syntax of the tests section, the severity setting, and whether where/row_condition filters are used.

  • Composite key uniqueness: unique_combination_of_columns
  • Value ranges / non-negative: expect_column_values_to_be_between, expect_column_values_to_be_increasing
  • Category coverage: expect_column_values_to_be_in_set / not_in_set
  • Regex patterns: expect_column_values_to_match_regex
  • Missing values / FK integrity: expect_column_values_to_not_be_null, extended relationships
CheckTypical useUnderlying SQL concept
unique_combination_of_columnsDetect natural-key duplicatesGROUP BY + HAVING COUNT(*)>1
expect_column_values_to_be_betweenThreshold checkBETWEEN / comparison operators
expect_column_values_to_match_regexFormat check for IDs / emailsREGEXP-family operators

Model → test macro → SQL (simplified)

orders (table)expectation macroSELECT ... FROM orders WHERE ...Model → test macro → SQL (simplified)

Example test definitions in schema.yml (dbt-expectations)

version: 2
models:
  - name: orders
    description: "Normalized order fact"
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: customer_email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}
NicheeLab を読み込み中…
quot; severity: warn - name: order_amount tests: - dbt_expectations.expect_column_values_to_be_between: min_value: 0 strictly: true severity: error tests: - dbt_expectations.unique_combination_of_columns: combination_of_columns: [order_id, line_number] severity: error - dbt_expectations.expect_table_row_count_to_be_between: min_value: 1 row_condition: "order_date >= dateadd(day, -7, current_date)" # Match your adapter's SQL dialect severity: warn

Execution and Results: Selectors, Severity, and Storing Failing Rows

dbt test supports selection syntax to narrow down what runs. Combine tags, model names, and paths to keep local runs light while making sure CI reliably executes the critical tests (official: Selection syntax). severity=warn does not mark the job as failed, but the result is still recorded. store_failures persists failing rows in a table so you can investigate root causes easily (official: Test configs https://docs.getdbt.com/docs/build/tests#test-configs).

In practice, split P0 quality checks (key uniqueness, not-null, referential integrity) at severity=error from health monitors (distributions, trends, minor format checks) at severity=warn, and make both visible in CI.

  • Local: dbt test -s orders+
  • CI critical: dbt test -s tag:p0
  • Comprehensive run: dbt test -m state:modified+ # only the diff
  • Store failing rows: tests: +store_failures: true (project- or resource-scoped)
Setting / executionEffectCaveats
severity: errorFailure fails the jobReserve for P0 checks
severity: warnJob continues even on failureVisibility is mandatory
store_failures: trueFailing rows become investigatableDecide schema cleanup and retention period

Selective test-execution pattern

localtag:p0 only (CI)all modified (PR)Selective test-execution pattern

Configure store_failures, tags, and selectors in dbt_project.yml

name: your_project
version: 1.0.0
config-version: 2

tests:
  +store_failures: true
  +severity: error  # Default, overridden per test

selectors:
  - name: critical_tests
    definition:
      method: fqn
      value: tag:p0

# Example execution
# dbt test -s @critical_tests

Adapter Differences and Performance Considerations

dbt tests ultimately become adapter-specific SQL. Regex, date functions, and approximate aggregations have dialect differences. Many dbt-expectations macros abstract those differences, but anywhere you pass raw SQL — row_condition, regex character classes, etc. — you need to match your environment's dialect.

On large tables, push partition or date conditions to the front and confirm the query shape lets indexes and clustering do their job.

  • Snowflake: watch function names like ILIKE/REGEXP and DATEADD/TO_DATE (official: https://docs.snowflake.com/)
  • BigQuery: watch REGEXP_CONTAINS/EXTRACT and certain type casts (official: https://docs.getdbt.com/docs/core/connect-data-platform/bigquery)
  • Spark/Databricks: verify regex and window-function compatibility (official: https://docs.databricks.com/)
  • row_condition and regex are dialect-dependent. Catch dialect-induced failures early in CI
PlatformCommon differencesMitigation
SnowflakeRegex dialect / case sensitivityFollow REGEXP-family semantics; validate with small tests
BigQueryDate arithmetic and typesSpell out DATE/TIMESTAMP functions; consider SAFE_CAST
Spark/DatabricksBehavioral differences in some functionsRefer to SQL ANSI mode and function-level docs

Condition pushdown for load reduction

SELECT ... FROM fact
WHERE event_date >= current_date - 7  -- Narrow the time window first
  AND partition_col = '2026-04-01'    -- Hit a partition

Conditional checks (when to use row_condition vs. where)

models:
  - name: fact_sessions
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          column: session_duration_sec
          min_value: 0
          row_condition: "event_date >= dateadd(day, -1, current_date)"  # Match your dialect

  - name: dim_user
    columns:
      - name: email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[^@\s]+@[^@\s]+\.[^@\s]+
NicheeLab を読み込み中…
quot; where: "is_test_user = false" # Some macros use row_condition instead of where

Synergy Between Model Contracts and Documentation

dbt model contracts can enforce column names, types, and nullability at build time (official: Contracts https://docs.getdbt.com/docs/build/schemas#enforcing-model-contracts). Tests guarantee data content health; contracts guarantee schema shape. Used together, they cover both API-style compatibility and data quality.

The exam tends to ask which requirements belong to contracts and which belong to tests. Map types, required columns, and naming to contracts; map distributions, statistics, and relationships to tests.

  • Contracts produce build-time errors; tests run after the build
  • API compatibility → contracts; business quality → tests
  • Generate documentation automatically from description/properties (docs generate)
MechanismStrengthBest-fit requirements
contractsImmediate schema enforcementTypes / required columns / naming / public interfaces
Built-in testsComprehensive baseline qualityNULL / uniqueness / referential integrity / accepted values
dbt-expectationsAdvanced content validationDistributions / ranges / formats / composite checks

When contracts and tests apply

dbt run (build)
  |-- contracts enforce (schema)
  `-- models built
      `-- dbt test (content validation)

Apply contracts to a model while also using tests

models:
  - name: fct_orders
    config:
      contract: {enforced: true}
    columns:
      - name: order_id
        data_type: integer
        constraints:
          nullable: false
        tests:
          - unique
      - name: order_amount
        data_type: numeric
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              strictly: true

Team Operations: Rules, CI, and Tag Design

Encode quality rules into tags and selectors, then run a three-stage cadence: diff-only on PR, full run on main, and extended checks on schedule. This balances speed and quality. Agree up front on the store_failures schema, retention period, and notification path (CI artifact / BI dashboard).

Adopt dbt-expectations gradually. Lock down P0 with built-in tests first, then add extended tests to critical domains, and finally roll out across the codebase. This minimizes migration cost.

  • Tags: p0 (mandatory) / p1 (recommended) / monitor (health)
  • Define CI presets in selectors.yml
  • Run full on main; use state:modified+ on PR to shorten the diff
  • Clean up failing-row tables daily (manage retention)
EnvironmentRecommended command / selectorPurpose
Development (local)dbt test -s model_name+Targeted, fast validation
PR(CI)dbt test -s state:modified+ tag:p0Catch failures fast on diff + P0 only
Production scheduledbt test -s tag:p0,tag:monitorQuality assurance plus health monitoring

Staged test-execution strategy

PRmodified+ & p0mainall p0dailyp0+monitorStaged test-execution strategy

Example selectors.yml (for CI)

selectors:
  - name: pr_critical
    definition:
      union:
        - method: state
          value: modified+
        - method: tag
          value: p0
  - name: daily_quality
    definition:
      union:
        - method: tag
          value: p0
        - method: tag
          value: monitor

# Example execution
# dbt test -s @pr_critical
# dbt test -s @daily_quality

Check with a Practice Question

Analytics Engineer

問題 1

On a sales fact model, you want to strictly enforce composite uniqueness on (order_id, line_number) as an error, while monitoring that amount is non-negative as a warning. Without writing custom SQL, which YAML approach is best?

  1. Define dbt-expectations' unique_combination_of_columns with severity=error and expect_column_values_to_be_between(min=0) with severity=warn
  2. Write two custom singular SQL data tests and fail the job on both
  3. Set only model contracts and skip content validation
  4. Run Great Expectations in a separate job and skip dbt tests

正解: A

The requirement can be expressed declaratively in YAML; composite uniqueness is a hallmark dbt-expectations macro, and severity captures the difference in importance. Hand-written SQL or a separate tool would be overkill here.

Frequently Asked Questions

Is dbt-expectations an official feature? Is it required for the exam?

It is a community package. The exam focuses on official testing concepts (generic/singular tests, severity, selectors, etc.) and does not require memorizing a specific package's API. That said, knowing when to reach for extended tests is useful (official: https://docs.getdbt.com/, certification: https://www.getdbt.com/certifications/analytics-engineer-certification-exam).

What are practical tactics when tests are heavy on large tables?

Practical options include using row_condition/where to narrow the time range or rows, filtering on clustering/partition keys first, splitting low-frequency checks into a monitor tag, using store_failures to localize root causes, and using a state:modified+ selector to keep dev and CI load down.

Should I prioritize contracts or tests?

They serve different purposes. Use contracts to enforce schema compatibility (types, required columns) at build time, and use tests to continuously validate data content quality (uniqueness, distribution, ranges, formats, foreign-key integrity). A two-layered approach is recommended.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.