dbt_expectations Package: Great Expectations

dbt lets you declare tests in YAML, then compiles them into adapter-specific SQL (Snowflake, BigQuery, Redshift, Spark, etc.) for execution (official: https://docs.getdbt.com/). dbt-expectations is a community package built on top of that mechanism, providing a rich set of validation macros similar to Great Expectations.

This article covers the angles the exam tends to probe — choosing, configuring, and interpreting tests — alongside ready-to-use YAML examples and operational patterns. Because the package's specific API may change, the explanations are grounded in dbt's stable concepts (tests, selectors, severity, store_failures, contracts, etc.).

Where dbt-expectations Fits and What the Exam Tends to Ask

dbt tests come in two flavors: generic tests (declared in schema.yml) and singular data tests (arbitrary SQL). dbt-expectations extends the former in a Great Expectations style, letting you invoke many validation macros concisely from YAML. The exam tends to ask whether you can choose between built-in tests (not_null, unique, relationships, accepted_values, etc.) and extended tests appropriately, how to use severity, and how to interpret results (error vs. warn, store_failures) (official: Tests basics https://docs.getdbt.com/docs/build/tests).

The key insight: tests compile to SQL and run against the target model or source. The declaration is declarative, but the underlying execution is set-based computation on the adapter. Evaluate test load, index usage, and date filters from the perspective of whether the resulting SQL is realistic.

Prefer generic tests; write raw SQL only as a last resort (for readability and reuse)
Use severity=error for critical quality checks, severity=warn for non-functional monitors
On large tables, control load with date partitioning or conditional clauses
In CI, use selectors to run only the critical tests in a short time window (official: Selection https://docs.getdbt.com/reference/node-selection/syntax)

Aspect	dbt built-in tests	dbt-expectations	Great Expectations (reference)
How to define	Concise entries in schema.yml's tests block	Extended macros referenced from schema.yml	Expectations written in Python/YAML
Strengths	Solid coverage of basic constraints	Diverse statistical, distribution, and pattern checks	Extremely expressive
Operational overhead	Minimal	Small to moderate (package dependency)	Moderate (separate runtime/orchestration)

Conceptual flow of dbt test compilation and execution

Add dbt-expectations to packages.yml

packages:
  - package: calogica/dbt-expectations
    version: ">=0.10.0,<1.0.0"  # Pin the version to match your project's compatibility

# Fetch dependencies after adding
# dbt deps

Common Validation Patterns and YAML Examples

With dbt-expectations, you can declare checks for nulls, duplicates, ranges, patterns, referential integrity, and distributions directly in YAML. Things the built-ins don't cover well — composite uniqueness, non-negative values within a range, regex patterns, and row-level conditional checks — become easy to express.

Below is an example for the orders model. From an exam perspective, you need to quickly read the syntax of the tests section, the severity setting, and whether where/row_condition filters are used.

Composite key uniqueness: unique_combination_of_columns
Value ranges / non-negative: expect_column_values_to_be_between, expect_column_values_to_be_increasing
Category coverage: expect_column_values_to_be_in_set / not_in_set
Regex patterns: expect_column_values_to_match_regex
Missing values / FK integrity: expect_column_values_to_not_be_null, extended relationships

Check	Typical use	Underlying SQL concept
unique_combination_of_columns	Detect natural-key duplicates	GROUP BY + HAVING COUNT(*)>1
expect_column_values_to_be_between	Threshold check	BETWEEN / comparison operators
expect_column_values_to_match_regex	Format check for IDs / emails	REGEXP-family operators

Model → test macro → SQL (simplified)

Example test definitions in schema.yml (dbt-expectations)

version: 2
models:
  - name: orders
    description: "Normalized order fact"
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: customer_email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}
      
        
        NicheeLab を読み込み中…
      
    
  quot;
              severity: warn
      - name: order_amount
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              strictly: true
              severity: error
    tests:
      - dbt_expectations.unique_combination_of_columns:
          combination_of_columns: [order_id, line_number]
          severity: error
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
          row_condition: "order_date >= dateadd(day, -7, current_date)"  # Match your adapter's SQL dialect
          severity: warn

Execution and Results: Selectors, Severity, and Storing Failing Rows

dbt test supports selection syntax to narrow down what runs. Combine tags, model names, and paths to keep local runs light while making sure CI reliably executes the critical tests (official: Selection syntax). severity=warn does not mark the job as failed, but the result is still recorded. store_failures persists failing rows in a table so you can investigate root causes easily (official: Test configs https://docs.getdbt.com/docs/build/tests#test-configs).

In practice, split P0 quality checks (key uniqueness, not-null, referential integrity) at severity=error from health monitors (distributions, trends, minor format checks) at severity=warn, and make both visible in CI.

Local: dbt test -s orders+
CI critical: dbt test -s tag:p0
Comprehensive run: dbt test -m state:modified+ # only the diff
Store failing rows: tests: +store_failures: true (project- or resource-scoped)

Setting / execution	Effect	Caveats
severity: error	Failure fails the job	Reserve for P0 checks
severity: warn	Job continues even on failure	Visibility is mandatory
store_failures: true	Failing rows become investigatable	Decide schema cleanup and retention period

Selective test-execution pattern

Configure store_failures, tags, and selectors in dbt_project.yml

name: your_project
version: 1.0.0
config-version: 2

tests:
  +store_failures: true
  +severity: error  # Default, overridden per test

selectors:
  - name: critical_tests
    definition:
      method: fqn
      value: tag:p0

# Example execution
# dbt test -s @critical_tests

Adapter Differences and Performance Considerations

dbt tests ultimately become adapter-specific SQL. Regex, date functions, and approximate aggregations have dialect differences. Many dbt-expectations macros abstract those differences, but anywhere you pass raw SQL — row_condition, regex character classes, etc. — you need to match your environment's dialect.

On large tables, push partition or date conditions to the front and confirm the query shape lets indexes and clustering do their job.

Snowflake: watch function names like ILIKE/REGEXP and DATEADD/TO_DATE (official: https://docs.snowflake.com/)
BigQuery: watch REGEXP_CONTAINS/EXTRACT and certain type casts (official: https://docs.getdbt.com/docs/core/connect-data-platform/bigquery)
Spark/Databricks: verify regex and window-function compatibility (official: https://docs.databricks.com/)
row_condition and regex are dialect-dependent. Catch dialect-induced failures early in CI

Platform	Common differences	Mitigation
Snowflake	Regex dialect / case sensitivity	Follow REGEXP-family semantics; validate with small tests
BigQuery	Date arithmetic and types	Spell out DATE/TIMESTAMP functions; consider SAFE_CAST
Spark/Databricks	Behavioral differences in some functions	Refer to SQL ANSI mode and function-level docs

Condition pushdown for load reduction

SELECT ... FROM fact
WHERE event_date >= current_date - 7  -- Narrow the time window first
  AND partition_col = '2026-04-01'    -- Hit a partition

Conditional checks (when to use row_condition vs. where)

models:
  - name: fact_sessions
    tests:
      - dbt_expectations.expect_column_values_to_be_between:
          column: session_duration_sec
          min_value: 0
          row_condition: "event_date >= dateadd(day, -1, current_date)"  # Match your dialect

  - name: dim_user
    columns:
      - name: email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[^@\s]+@[^@\s]+\.[^@\s]+
      
        
        NicheeLab を読み込み中…
      
    
  quot;
              where: "is_test_user = false"  # Some macros use row_condition instead of where

Synergy Between Model Contracts and Documentation

dbt model contracts can enforce column names, types, and nullability at build time (official: Contracts https://docs.getdbt.com/docs/build/schemas#enforcing-model-contracts). Tests guarantee data content health; contracts guarantee schema shape. Used together, they cover both API-style compatibility and data quality.

The exam tends to ask which requirements belong to contracts and which belong to tests. Map types, required columns, and naming to contracts; map distributions, statistics, and relationships to tests.

Contracts produce build-time errors; tests run after the build
API compatibility → contracts; business quality → tests
Generate documentation automatically from description/properties (docs generate)

Mechanism	Strength	Best-fit requirements
contracts	Immediate schema enforcement	Types / required columns / naming / public interfaces
Built-in tests	Comprehensive baseline quality	NULL / uniqueness / referential integrity / accepted values
dbt-expectations	Advanced content validation	Distributions / ranges / formats / composite checks

When contracts and tests apply

dbt run (build)
  |-- contracts enforce (schema)
  `-- models built
      `-- dbt test (content validation)

Apply contracts to a model while also using tests

models:
  - name: fct_orders
    config:
      contract: {enforced: true}
    columns:
      - name: order_id
        data_type: integer
        constraints:
          nullable: false
        tests:
          - unique
      - name: order_amount
        data_type: numeric
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              strictly: true

Team Operations: Rules, CI, and Tag Design

Encode quality rules into tags and selectors, then run a three-stage cadence: diff-only on PR, full run on main, and extended checks on schedule. This balances speed and quality. Agree up front on the store_failures schema, retention period, and notification path (CI artifact / BI dashboard).

Adopt dbt-expectations gradually. Lock down P0 with built-in tests first, then add extended tests to critical domains, and finally roll out across the codebase. This minimizes migration cost.

Tags: p0 (mandatory) / p1 (recommended) / monitor (health)
Define CI presets in selectors.yml
Run full on main; use state:modified+ on PR to shorten the diff
Clean up failing-row tables daily (manage retention)

Environment	Recommended command / selector	Purpose
Development (local)	dbt test -s model_name+	Targeted, fast validation
PR(CI)	dbt test -s state:modified+ tag:p0	Catch failures fast on diff + P0 only
Production schedule	dbt test -s tag:p0,tag:monitor	Quality assurance plus health monitoring

Staged test-execution strategy

Example selectors.yml (for CI)

selectors:
  - name: pr_critical
    definition:
      union:
        - method: state
          value: modified+
        - method: tag
          value: p0
  - name: daily_quality
    definition:
      union:
        - method: tag
          value: p0
        - method: tag
          value: monitor

# Example execution
# dbt test -s @pr_critical
# dbt test -s @daily_quality

Check with a Practice Question

Analytics Engineer

問題 1

On a sales fact model, you want to strictly enforce composite uniqueness on (order_id, line_number) as an error, while monitoring that amount is non-negative as a warning. Without writing custom SQL, which YAML approach is best?

Define dbt-expectations' unique_combination_of_columns with severity=error and expect_column_values_to_be_between(min=0) with severity=warn
Write two custom singular SQL data tests and fail the job on both
Set only model contracts and skip content validation
Run Great Expectations in a separate job and skip dbt tests

正解: A

The requirement can be expressed declaratively in YAML; composite uniqueness is a hallmark dbt-expectations macro, and severity captures the difference in importance. Hand-written SQL or a separate tool would be overkill here.

Frequently Asked Questions

Is dbt-expectations an official feature? Is it required for the exam?

It is a community package. The exam focuses on official testing concepts (generic/singular tests, severity, selectors, etc.) and does not require memorizing a specific package's API. That said, knowing when to reach for extended tests is useful (official: https://docs.getdbt.com/, certification: https://www.getdbt.com/certifications/analytics-engineer-certification-exam).

What are practical tactics when tests are heavy on large tables?

Practical options include using row_condition/where to narrow the time range or rows, filtering on clustering/partition keys first, splitting low-frequency checks into a monitor tag, using store_failures to localize root causes, and using a state:modified+ selector to keep dev and CI load down.

Should I prioritize contracts or tests?

They serve different purposes. Use contracts to enforce schema compatibility (types, required columns) at build time, and use tests to continuously validate data content quality (uniqueness, distribution, ranges, formats, foreign-key integrity). A two-layered approach is recommended.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Rich Great Expectations-style Validation with dbt-expectations