dbt

dbt Custom Generic Tests: Building Reusable Data Quality Checks

2026-04-19
NicheeLab Editorial Team

A generic test is a reusable "validation macro" that you call from YAML. Build one and you can apply it across columns and models, which is a direct path to standardizing your data quality rules.

This article walks through the full picture: a minimal custom test, multi-relation patterns, the execution flow and storage of results, and best practices. We highlight the terminology distinctions and configuration nuances that exam questions love to target.

Custom Generic Test Basics and How They Differ from Built-in Tests

A dbt generic test is defined as a Jinja macro that produces SQL returning "only the failing rows" via SELECT. The defining trait is that you call it from YAML (schema.yml) with arguments, using the same mechanism as not_null or unique — you can build your own with that exact pattern.

By contrast, a singular test is a one-off SQL file under tests/. It's suited for one-shot checks rather than reuse. Exams frequently test the distinction between these two and how generic tests handle arguments and config.

  • Defined as a macro (Jinja), but executed via dbt test's test materialization
  • Generic tests are applied to models/columns from YAML; singular tests run as standalone SQL files
  • The return value is always "failing rows only". 0 rows passes; 1+ rows fails (warn/error depending on severity)
AspectGeneric testSingular test
Typical locationMacros under tests/generic/ or macros/SQL files under tests/
How it's invokedFrom schema.yml's tests section, with argumentsdbt test executes each file
ReusabilityHigh (generalized via arguments)Low (tends to be case-specific)
Built-in examplesunique, not_null, relationships, etc.None (you author them)
GranularityEither model-level or column-levelPer test file

Built-in not_null / unique tests called from schema.yml

models:\n  - name: fct_orders\n    columns:\n      - name: order_id\n        tests:\n          - not_null\n          - unique

Minimal Implementation: Build a Range Check Called accepted_range

Let's start by building accepted_range to verify "the value falls between a min and max". The key insight for a generic test macro is that it only needs to generate a SELECT that returns the failing rows.

Match your project's conventions for file placement. Putting it at tests/generic/accepted_range.sql makes it easy to spot, and keeps things tidy when you package the test for distribution.

  • Return value is failing rows only (0 rows when passing)
  • Arguments accept model (Relation), column_name (string), and any extra parameters
  • Reference the column with {{ column_name }} and the relation with {{ model }}

Minimal example: tests/generic/accepted_range.sql and the schema.yml call

{% test accepted_range(model, column_name, min_value, max_value, inclusive=True) %}\nselect\n  *\nfrom {{ model }}\nwhere\n  {% if inclusive %}\n    ({{ column_name }} < {{ min_value }} or {{ column_name }} > {{ max_value }})\n  {% else %}\n    ({{ column_name }} <= {{ min_value }} or {{ column_name }} >= {{ max_value }})\n  {% endif %}\n{% endtest %}\n\n# Schema.yml invocation example\nmodels:\n  - name: fct_orders\n    columns:\n      - name: amount\n        tests:\n          - accepted_range:\n              min_value: 0\n              max_value: 100000\n              inclusive: true

Splitting Arguments and Config Correctly

Pass the "variable logic" of a test via arguments, and control "runtime behavior" (warning level, storage, row filtering, etc.) via config. The YAML side can override config on a per-test-instance basis.

The configs you reach for most often are severity (warn/error), where (pre-filter target rows), limit (cap stored or logged volume), and store_failures (persist failing rows as a table). Rather than baking where conditions into the test SQL, specifying them via config keeps the test more reusable.

  • Arguments change the macro's logic (e.g., min_value, max_value)
  • Config controls peripheral evaluation/runtime behavior (e.g., severity, where, limit, store_failures)
  • Config can be set independently per call. Shared defaults can also be overridden from dbt_project.yml

YAML test config example: warn-level severity, target filtering, and persisting failed rows

models:\n  - name: fct_orders\n    columns:\n      - name: amount\n        tests:\n          - accepted_range:\n              min_value: 0\n              max_value: 100000\n              config:\n                severity: warn\n                where: "order_status != 'CANCELLED'"\n                limit: 100\n                store_failures: true

Writing Tests That Receive Multiple Relations

Like relationships, you can implement generic tests that take an additional referenced table. YAML passes ref() or source() into to:, and the test macro joins on it to detect orphaned keys.

On exams, expect questions on the fact that model is passed as a Relation and can be referenced as {{ model }} directly, and that to is likewise received as a Relation. They are not strings, so there's no need to call ref() again inside the macro.

  • Both model and to are Relations. Expand them directly in SQL as {{ model }} and {{ to }}
  • For foreign-key validation, mind NULL handling (NULLs are typically skipped)
  • Absorb dialect differences (case sensitivity, quoting) at the column-name/identifier layer

An exists_in test that takes 2 relations, plus the YAML

{% test exists_in(model, column_name, to, field) %}\nselect\n  src.{{ column_name }} as fk_value\nfrom {{ model }} as src\nleft join {{ to }} as tgt\n  on src.{{ column_name }} = tgt.{{ field }}\nwhere src.{{ column_name }} is not null\n  and tgt.{{ field }} is null\n{% endtest %}\n\n# Schema.yml invocation (column-level test)\nmodels:\n  - name: fct_orders\n    columns:\n      - name: customer_id\n        tests:\n          - exists_in:\n              to: ref('dim_customers')\n              field: id\n              config:\n                severity: error

Execution Flow, Performance, and Storing Failed Rows

dbt test compiles each generic test macro and ships a SELECT that returns failing rows to the target warehouse. 0 rows passes; 1+ rows fails. With severity=warn it logs a warning; with error it fails the job.

Setting store_failures: true persists failing rows as a table in a dedicated schema (by default a tests schema under the target schema). Combining where with limit is effective for both narrowing scope and capping how much you store.

  • Column tests over huge datasets are expensive. Narrow with where, or stage runs with scheduling
  • Use selective dbt test -s model_name: to limit scope and speed up the dev iteration loop
  • Tune concurrency against warehouse quotas and lock contention; dependency order is handled automatically

Generic test execution flow

schema.ymlGeneric test macroCompile to SELECT returning failing rows onlyTest materializationPass: 0 rows / Fail: 1+ rows triggers warn or error (store_failures: true creates a failed-rows table)

Commands for running and persisting failures

# Test by selecting a model\ndbt test -s fct_orders\n\n# Select by test name (you can include package names)\ndbt test -s test_type:accepted_range\n\n# Run with failed-row storage\ndbt test -s fct_orders --store-failures

Best Practices and Pitfalls

Names should be noun phrases that convey "what is being validated" rather than verbs, and arguments should be kept to a minimum. When you need dialect-specific functions inside a test, absorb the differences with adapter.dispatch.

Column-name quoting and case sensitivity are common pain points across dialects. Use adapter.quote where appropriate, or stick to lowercase ASCII identifiers with underscores. While NULL semantics rarely differ between RDBMSs, FK validation typically excludes NULL by design.

  • Keep tests deterministic (avoid non-deterministic functions and dependencies on current time)
  • Use where in YAML for filtering, keeping the macro itself generic
  • For external distribution, adopt package namespaces and semantic versioning

adapter.dispatch template for handling dialect differences (used inside a test)

{% macro cast_to_numeric(expr) %}\n  {{ adapter.dispatch('cast_to_numeric', 'my_project')(expr) }}\n{% endmacro %}\n\n{% macro my_project__cast_to_numeric(expr) %}\n  cast({{ expr }} as numeric)\n{% endmacro %}\n\n{% macro snowflake__cast_to_numeric(expr) %}\n  try_to_number({{ expr }})\n{% endmacro %}\n\n# Usage inside a test\n# where {{ cast_to_numeric(column_name) }} >= 0

Check Your Understanding

Analytics Engineer

問題 1

What does the model argument passed to a dbt custom generic test macro represent, and how do you reference it inside SQL?

  1. A. A runtime Relation object; reference it directly in SQL as {{ model }}
  2. B. A model-name string; you must call {{ ref(model) }} inside the SQL
  3. C. A dictionary that contains the column-name array; cannot be referenced directly from SQL
  4. D. An in-memory DataFrame; it runs without materialization

正解: A

The first argument of a generic test, model, is a Relation. In the test SQL you can expand the relation directly with {{ model }} — there's no need to call ref() again.

Frequently Asked Questions

Where should generic test macros be placed to be recognized?

Placing them under macros/ guarantees recognition. By convention some projects use tests/generic/, but as long as the file sits on the macro search path it will work. If you plan to ship the test as a package, keep the directory layout consistent.

How do I call generic tests from external packages (e.g., dbt-utils)?

After installing the package via packages.yml, reference the test from YAML's tests block as usual. Namespaces resolve automatically, so you can use it just like a local custom test (e.g., tests: - dbt_utils.expression_is_true: {expression: "amount > 0"}).

Can I control where failed rows are stored, including the schema?

Use store_failures: true to persist failed rows. The destination schema is configurable via project settings such as the tests schema setting in dbt_project.yml. Pair it with where and limit to keep storage volume in check.

Check what you learned with practice questions

Practice with certification-focused question sets

Try free practice questions
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.