dbt

dbt Data Contracts in Practice: Enforce Model Contracts to Protect Schemas

2026-04-19
NicheeLab Editorial Team

Have you ever discovered that the column names, order, or types in a data model silently drifted? dbt Data Contracts (model contracts) prevent exactly that. Enable a contract on a model, and the compiled SQL explicitly pins down the columns and types, catching unexpected columns or type drift early.

This article walks through how to enable contracts, the concrete behavior after compilation, how they combine with incremental models, how they differ from tests and DB constraints, and the points most often asked on the exam — all based on stable behavior aligned with the official docs.

What Data Contracts Are and Why You Enforce Them

Data Contracts let you declare a dbt model's output schema (column names, order, data types) and enforce it at runtime. When enforced, the compiled SQL selects only the declared columns and applies explicit type casts as needed. This stops upstream-driven surprises like new columns appearing or types changing at the model boundary.

From the Analytics Engineer exam perspective, frequent topics include how to enable a contract, the requirement to declare columns, the adapter-dependent nature of type specification, and how to handle schema changes on incremental models (on_schema_change and full refresh).

  • Clear boundaries: codify the model's output 'promises' so they cannot be broken
  • Fail fast: detect unexpected columns, missing columns, and type mismatches at runtime
  • Localized impact: even if columns are added upstream, the downstream contract stays stable

Enabling Contracts and Declaration Requirements

Contracts are enabled per model. Set contract.enforced to true in the YAML or model SQL config, and declare every column under columns. When enforcing the contract, you must specify each column's data_type using the right type for your adapter (e.g. BigQuery, Snowflake, Postgres/Redshift).

Column renames or aliases require that the SQL output column name matches columns.name in YAML. Undeclared columns will not appear in the output, and declared columns that are not present in the SQL will cause a runtime error.

  • Declare every column under columns in YAML and specify a data_type for each
  • Make the SQL output column names match columns.name in YAML
  • select * is allowed, but understand that new upstream columns not in the contract are silently dropped
  • Missing columns cause a runtime error — useful for early detection

Minimal contract implementation (YAML + model SQL)

# schema.yml
models:
  - name: orders_mart
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: INT64        # e.g. BigQuery
      - name: customer_id
        data_type: INT64
      - name: order_total
        data_type: NUMERIC
      - name: order_ts
        data_type: TIMESTAMP

---
-- models/orders_mart.sql
{{ config(materialized='table', contract={'enforced': true}) }}
with src as (
  select * from {{ ref('stg_orders') }}
)
select
  order_id,
  customer_id,
  order_total,
  order_ts
from src

Understanding the Actual Behavior After Compilation

When the contract is enforced, dbt wraps the original query in an internal CTE and the outer query selects only the declared columns in the declared order and type. The outer SELECT applies implicit/explicit casts and drops undeclared columns. Declared columns that are missing from the inner query fail at the database as undefined column references.

Adapters differ in detail, but the common theme is: the outer SELECT pins columns, order, and types. Even for views, column types depend on the SELECT list, so the cast fixes the types.

  • Extra columns are dropped by the outer SELECT
  • Missing columns fail at runtime as 'column not found'
  • Type mismatches are corrected by the cast in the outer SELECT (failing if the cast is impossible)

What enforcement looks like at compile time (simplified SQL)

WITH model__dbt_internal AS (
  -- Original query (may be affected by upstream changes)
  SELECT * FROM stg_orders
)
SELECT
  CAST(order_id    AS INT64)     AS order_id,
  CAST(customer_id AS INT64)     AS customer_id,
  CAST(order_total AS NUMERIC)   AS order_total,
  CAST(order_ts    AS TIMESTAMP) AS order_ts
FROM model__dbt_internal

Incremental Models and Handling Schema Changes

Contracts also work on incremental materializations. However, physical schema changes to existing tables are a separate problem. Contracts logically pin output columns and types, but actual column additions or type changes to the underlying table must be combined with on_schema_change or a full refresh.

Here are the typical operational patterns. When the change is large or involves type changes, a full refresh is the safest path.

  • Adding a new column: add it to YAML with a data_type. For incremental, consider on_schema_change=append_new_columns (supported adapters only). If unsupported or types are strict, do a full refresh.
  • Dropping/renaming columns: dbt generally does not drop automatically. Sync the SQL and YAML, then full refresh if necessary.
  • Type changes: existing data is likely to become inconsistent. Add a new column, migrate, retire the old one, or do a full refresh.
  • dbt build order: contracts are enforced during run/model creation; tests run afterwards.

Contracts vs Tests vs Database Constraints

Even though all three are about 'quality', contracts, tests, and DB constraints serve different roles. Contracts pin the schema boundary, tests verify data properties, and DB constraints enforce storage-level consistency. Knowing when to use each makes design intent clear.

  • Contracts keep the 'output column promise' intact
  • Tests verify content (nullability, uniqueness, referential integrity)
  • DB constraints are the final storage-level line of defense, and their effectiveness varies by adapter
ItemPurpose / TargetFailure TimingTypical Configuration / Example
Data ContractsPin output schema (columns, order, type)At model execution (detected by the compiled SELECT)config: contract.enforced=true plus data_type on YAML columns
dbt testsData properties like NOT NULL, UNIQUE, relationshipsdbt test (or the test phase of dbt build)tests: not_null, unique, relationships, etc.
DB constraintsStorage consistency such as NOT NULL, PK, FK, CHECKAt DML/DDL time (depends on adapter implementation)Constraint support is adapter-dependent (e.g. NOT NULL, etc.)

Pitfalls and Checklist

To avoid common implementation pitfalls, run through this pre-shipping checklist.

  • Are there any undeclared columns in columns? Does every column have a data_type?
  • Do the SQL output column names match columns.name in YAML?
  • Is an on_schema_change policy in place for adding new columns to incremental models?
  • If a type change is needed, is there a full-refresh plan?
  • Can dbt compile and dbt run (or dbt build) succeed in a production-like environment?

Check Your Understanding

Analytics Engineer

問題 1

You want to pin the column names, order, and types of a model delivered downstream so that new upstream columns do not break the downstream schema, and you want missing columns or type mismatches to fail at model execution. Which approach is most appropriate?

  1. Set contract.enforced=true on the model and declare all columns and data_types in YAML
  2. Only add tests like not_null and unique
  3. Add more database indexes to stabilize the execution plan
  4. Always use select * in the model SQL to absorb upstream column additions

正解: A

Enforcing the contract makes the compiled outer SELECT emit only the declared columns and types. Undeclared columns are dropped, and missing columns or type mismatches fail at runtime. Tests verify content; they do not pin the schema.

Frequently Asked Questions

What can contracts be applied to? Do they work on sources or snapshots?

Contracts apply to dbt models. Sources are declarations of external relations and are not subject to contracts. Snapshots are a history-tracking mechanism and are not the primary target of contracts either.

How do you specify data_type in a multi-adapter environment?

data_type is adapter-specific. When environments differ, use env vars, Jinja conditionals, or macros for type aliases to emit the right type name per adapter (e.g. INT64/STRING, NUMBER/VARCHAR).

What happens when a new column is added upstream?

If the contract is enforced, the new column will not appear in the output (the outer SELECT drops it). To bring it in, update the contract by adding the column to YAML with a data_type, and consider on_schema_change for incremental models or a full refresh if needed.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.