dbt lets you declare tests in YAML, then compiles them into adapter-specific SQL (Snowflake, BigQuery, Redshift, Spark, etc.) for execution (official: https://docs.getdbt.com/). dbt-expectations is a community package built on top of that mechanism, providing a rich set of validation macros similar to Great Expectations.
This article covers the angles the exam tends to probe — choosing, configuring, and interpreting tests — alongside ready-to-use YAML examples and operational patterns. Because the package's specific API may change, the explanations are grounded in dbt's stable concepts (tests, selectors, severity, store_failures, contracts, etc.).
dbt tests come in two flavors: generic tests (declared in schema.yml) and singular data tests (arbitrary SQL). dbt-expectations extends the former in a Great Expectations style, letting you invoke many validation macros concisely from YAML. The exam tends to ask whether you can choose between built-in tests (not_null, unique, relationships, accepted_values, etc.) and extended tests appropriately, how to use severity, and how to interpret results (error vs. warn, store_failures) (official: Tests basics https://docs.getdbt.com/docs/build/tests).
The key insight: tests compile to SQL and run against the target model or source. The declaration is declarative, but the underlying execution is set-based computation on the adapter. Evaluate test load, index usage, and date filters from the perspective of whether the resulting SQL is realistic.
| Aspect | dbt built-in tests | dbt-expectations | Great Expectations (reference) |
|---|---|---|---|
| How to define | Concise entries in schema.yml's tests block | Extended macros referenced from schema.yml | Expectations written in Python/YAML |
| Strengths | Solid coverage of basic constraints | Diverse statistical, distribution, and pattern checks | Extremely expressive |
| Operational overhead | Minimal | Small to moderate (package dependency) | Moderate (separate runtime/orchestration) |
Conceptual flow of dbt test compilation and execution
Add dbt-expectations to packages.yml
packages:
- package: calogica/dbt-expectations
version: ">=0.10.0,<1.0.0" # Pin the version to match your project's compatibility
# Fetch dependencies after adding
# dbt depsWith dbt-expectations, you can declare checks for nulls, duplicates, ranges, patterns, referential integrity, and distributions directly in YAML. Things the built-ins don't cover well — composite uniqueness, non-negative values within a range, regex patterns, and row-level conditional checks — become easy to express.
Below is an example for the orders model. From an exam perspective, you need to quickly read the syntax of the tests section, the severity setting, and whether where/row_condition filters are used.
| Check | Typical use | Underlying SQL concept |
|---|---|---|
| unique_combination_of_columns | Detect natural-key duplicates | GROUP BY + HAVING COUNT(*)>1 |
| expect_column_values_to_be_between | Threshold check | BETWEEN / comparison operators |
| expect_column_values_to_match_regex | Format check for IDs / emails | REGEXP-family operators |
Model → test macro → SQL (simplified)
Example test definitions in schema.yml (dbt-expectations)
version: 2
models:
- name: orders
description: "Normalized order fact"
columns:
- name: order_id
tests:
- not_null
- unique
- name: customer_email
tests:
- dbt_expectations.expect_column_values_to_match_regex:
regex: "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}
NicheeLab を読み込み中…
quot; severity: warn - name: order_amount tests: - dbt_expectations.expect_column_values_to_be_between: min_value: 0 strictly: true severity: error tests: - dbt_expectations.unique_combination_of_columns: combination_of_columns: [order_id, line_number] severity: error - dbt_expectations.expect_table_row_count_to_be_between: min_value: 1 row_condition: "order_date >= dateadd(day, -7, current_date)" # Match your adapter's SQL dialect severity: warn
dbt test supports selection syntax to narrow down what runs. Combine tags, model names, and paths to keep local runs light while making sure CI reliably executes the critical tests (official: Selection syntax). severity=warn does not mark the job as failed, but the result is still recorded. store_failures persists failing rows in a table so you can investigate root causes easily (official: Test configs https://docs.getdbt.com/docs/build/tests#test-configs).
In practice, split P0 quality checks (key uniqueness, not-null, referential integrity) at severity=error from health monitors (distributions, trends, minor format checks) at severity=warn, and make both visible in CI.
| Setting / execution | Effect | Caveats |
|---|---|---|
| severity: error | Failure fails the job | Reserve for P0 checks |
| severity: warn | Job continues even on failure | Visibility is mandatory |
| store_failures: true | Failing rows become investigatable | Decide schema cleanup and retention period |
Selective test-execution pattern
Configure store_failures, tags, and selectors in dbt_project.yml
name: your_project
version: 1.0.0
config-version: 2
tests:
+store_failures: true
+severity: error # Default, overridden per test
selectors:
- name: critical_tests
definition:
method: fqn
value: tag:p0
# Example execution
# dbt test -s @critical_testsdbt tests ultimately become adapter-specific SQL. Regex, date functions, and approximate aggregations have dialect differences. Many dbt-expectations macros abstract those differences, but anywhere you pass raw SQL — row_condition, regex character classes, etc. — you need to match your environment's dialect.
On large tables, push partition or date conditions to the front and confirm the query shape lets indexes and clustering do their job.
| Platform | Common differences | Mitigation |
|---|---|---|
| Snowflake | Regex dialect / case sensitivity | Follow REGEXP-family semantics; validate with small tests |
| BigQuery | Date arithmetic and types | Spell out DATE/TIMESTAMP functions; consider SAFE_CAST |
| Spark/Databricks | Behavioral differences in some functions | Refer to SQL ANSI mode and function-level docs |
Condition pushdown for load reduction
SELECT ... FROM fact
WHERE event_date >= current_date - 7 -- Narrow the time window first
AND partition_col = '2026-04-01' -- Hit a partitionConditional checks (when to use row_condition vs. where)
models:
- name: fact_sessions
tests:
- dbt_expectations.expect_column_values_to_be_between:
column: session_duration_sec
min_value: 0
row_condition: "event_date >= dateadd(day, -1, current_date)" # Match your dialect
- name: dim_user
columns:
- name: email
tests:
- dbt_expectations.expect_column_values_to_match_regex:
regex: "^[^@\s]+@[^@\s]+\.[^@\s]+
NicheeLab を読み込み中…
quot;
where: "is_test_user = false" # Some macros use row_condition instead of wheredbt model contracts can enforce column names, types, and nullability at build time (official: Contracts https://docs.getdbt.com/docs/build/schemas#enforcing-model-contracts). Tests guarantee data content health; contracts guarantee schema shape. Used together, they cover both API-style compatibility and data quality.
The exam tends to ask which requirements belong to contracts and which belong to tests. Map types, required columns, and naming to contracts; map distributions, statistics, and relationships to tests.
| Mechanism | Strength | Best-fit requirements |
|---|---|---|
| contracts | Immediate schema enforcement | Types / required columns / naming / public interfaces |
| Built-in tests | Comprehensive baseline quality | NULL / uniqueness / referential integrity / accepted values |
| dbt-expectations | Advanced content validation | Distributions / ranges / formats / composite checks |
When contracts and tests apply
dbt run (build)
|-- contracts enforce (schema)
`-- models built
`-- dbt test (content validation)Apply contracts to a model while also using tests
models:
- name: fct_orders
config:
contract: {enforced: true}
columns:
- name: order_id
data_type: integer
constraints:
nullable: false
tests:
- unique
- name: order_amount
data_type: numeric
tests:
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
strictly: trueEncode quality rules into tags and selectors, then run a three-stage cadence: diff-only on PR, full run on main, and extended checks on schedule. This balances speed and quality. Agree up front on the store_failures schema, retention period, and notification path (CI artifact / BI dashboard).
Adopt dbt-expectations gradually. Lock down P0 with built-in tests first, then add extended tests to critical domains, and finally roll out across the codebase. This minimizes migration cost.
| Environment | Recommended command / selector | Purpose |
|---|---|---|
| Development (local) | dbt test -s model_name+ | Targeted, fast validation |
| PR(CI) | dbt test -s state:modified+ tag:p0 | Catch failures fast on diff + P0 only |
| Production schedule | dbt test -s tag:p0,tag:monitor | Quality assurance plus health monitoring |
Staged test-execution strategy
Example selectors.yml (for CI)
selectors:
- name: pr_critical
definition:
union:
- method: state
value: modified+
- method: tag
value: p0
- name: daily_quality
definition:
union:
- method: tag
value: p0
- method: tag
value: monitor
# Example execution
# dbt test -s @pr_critical
# dbt test -s @daily_qualityAnalytics Engineer
問題 1
On a sales fact model, you want to strictly enforce composite uniqueness on (order_id, line_number) as an error, while monitoring that amount is non-negative as a warning. Without writing custom SQL, which YAML approach is best?
正解: A
The requirement can be expressed declaratively in YAML; composite uniqueness is a hallmark dbt-expectations macro, and severity captures the difference in importance. Hand-written SQL or a separate tool would be overkill here.
Is dbt-expectations an official feature? Is it required for the exam?
It is a community package. The exam focuses on official testing concepts (generic/singular tests, severity, selectors, etc.) and does not require memorizing a specific package's API. That said, knowing when to reach for extended tests is useful (official: https://docs.getdbt.com/, certification: https://www.getdbt.com/certifications/analytics-engineer-certification-exam).
What are practical tactics when tests are heavy on large tables?
Practical options include using row_condition/where to narrow the time range or rows, filtering on clustering/partition keys first, splitting low-frequency checks into a monitor tag, using store_failures to localize root causes, and using a state:modified+ selector to keep dev and CI load down.
Should I prioritize contracts or tests?
They serve different purposes. Use contracts to enforce schema compatibility (types, required columns) at build time, and use tests to continuously validate data content quality (uniqueness, distribution, ranges, formats, foreign-key integrity). A two-layered approach is recommended.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...