Terraform Drift Detection: Continuous Monitoring (2026)

This article is a practical guide to operationalizing Terraform drift detection through scheduled plan runs. By detecting diffs without making changes, and by automating failure criteria and notifications, you can catch unexpected changes to production early instead of discovering them later.

It covers features that are stable and documented in HashiCorp's official documentation: the plan refresh phase, -detailed-exitcode, -refresh-only, remote backend locking, and Terraform Cloud workspace execution and notifications. Along the way, it highlights points that come up frequently on certification exams.

Drift-Detection Basics and the Goal of Scheduled plan Runs

Drift is when the configuration intended by your Terraform code and state no longer matches the real infrastructure (the actual entities behind cloud APIs). Common causes are manual changes, updates by other tools, and side effects of auto-scaling or rotation.

The goal of scheduled plan runs is to check for diffs without changing infrastructure and to notify based on a threshold (for example: any diff = pipeline failure). Daily or hourly checks let you act on early warning signs.

Make non-mutating detection the rule (plan only, never apply)
Track both state-vs-reality and code-vs-reality diffs
Treat detection outcomes as machine-readable exit codes wired directly into notifications
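To make the two kinds of diff concrete, here is a toy Python model, not Terraform's actual algorithm. Each view is a hypothetical map of resource addresses to attributes, and a diff is simply the set of attributes whose values differ. The state-vs-reality diff corresponds to what -refresh-only reports; the code-vs-reality diff corresponds to what a normal plan would propose to change.

```python
from typing import Dict, Tuple

# Hypothetical shapes: resource address -> {attribute name: value}
Attrs = Dict[str, object]
Resources = Dict[str, Attrs]


def attribute_diff(before: Attrs, after: Attrs) -> Dict[str, Tuple[object, object]]:
    """Return {attribute: (before, after)} for every attribute whose value differs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


def diff_resources(left: Resources, right: Resources) -> Dict[str, Dict[str, Tuple[object, object]]]:
    """Per-resource attribute diffs between two views; a missing resource counts as empty."""
    result = {}
    for address in sorted(set(left) | set(right)):
        changes = attribute_diff(left.get(address, {}), right.get(address, {}))
        if changes:
            result[address] = changes
    return result


# Hypothetical scenario: someone resized an instance in the cloud console.
config = {"aws_instance.web": {"instance_type": "t3.small"}}   # what the code declares
state = {"aws_instance.web": {"instance_type": "t3.small"}}    # what Terraform last recorded
reality = {"aws_instance.web": {"instance_type": "t3.large"}}  # what the cloud API returns now

# State vs reality: the drift a refresh-only plan reports.
print(diff_resources(state, reality))
# {'aws_instance.web': {'instance_type': ('t3.small', 't3.large')}}

# Reality vs code: the change a normal plan would propose to restore the declared value.
print(diff_resources(reality, config))
# {'aws_instance.web': {'instance_type': ('t3.large', 't3.small')}}
```

Real Terraform also normalizes provider values, handles nested blocks, and tracks unknown values, so treat this only as a mental model of the two comparisons.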

How plan Works and Key Flags: -detailed-exitcode and -refresh-only

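terraform plan first runs a refresh phase that reads the current state of the real infrastructure and updates an in-memory copy of the state (plan does not write the refreshed state back), and then evaluates diffs against the configuration. This is how external updates such as manual changes get surfaced.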

-detailed-exitcode is the essential flag for returning diff status as an exit code. It enables pipeline branching (0 = no diff, 2 = diff). -refresh-only is a dedicated mode that proposes only state updates without changing real infrastructure — a great fit for non-mutating drift detection.

Exit codes for terraform plan -detailed-exitcode: 0 = no diff, 1 = error, 2 = diff present
-refresh-only shows only “proposals to align state with reality” — it never proposes resource changes
If you are doing drift detection only, the safe combination is -refresh-only -detailed-exitcode
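Wrapper scripts often get the 0/1/2 contract wrong, for example by treating every non-zero code as drift. A minimal Python sketch of the mapping, assuming the policy used in this article that any diff fails the run; the enum and function names are hypothetical:

```python
from enum import Enum


class DriftResult(Enum):
    NO_DRIFT = "no_drift"  # exit 0: plan succeeded, no changes
    ERROR = "error"        # exit 1: plan failed (credentials, syntax, API error, ...)
    DRIFT = "drift"        # exit 2: plan succeeded and found changes


def classify_exit_code(code: int) -> DriftResult:
    """Map a `terraform plan -detailed-exitcode` exit code to a result."""
    if code == 0:
        return DriftResult.NO_DRIFT
    if code == 2:
        return DriftResult.DRIFT
    # 1 and any unexpected code are errors, never "no drift",
    # so a broken job cannot pass as a clean one.
    return DriftResult.ERROR


def should_fail_pipeline(result: DriftResult) -> bool:
    """Threshold from this article: any diff, or any error, fails the run."""
    return result is not DriftResult.NO_DRIFT


if __name__ == "__main__":
    for code in (0, 1, 2):
        result = classify_exit_code(code)
        print(code, result.value, should_fail_pipeline(result))
    # 0 no_drift False
    # 1 error True
    # 2 drift True
```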

Implementing “Daily plan” with OSS + CI/Cron

The minimal setup: a scheduler (for example, a GitHub Actions schedule trigger or a Jenkins cron job) runs terraform plan -refresh-only -detailed-exitcode on a schedule, and when it sees exit code 2 it sends a notification and opens a ticket. Give the job read-only cloud credentials, plus the least-privilege backend access it needs to read the state and acquire the state lock.

Use a remote backend that supports locking (such as Terraform Cloud/Enterprise, or S3 with a DynamoDB lock table) so that concurrent runs cannot race on the state. When checking many workspaces, limit how many plans run in parallel to stay within cloud API rate limits and your CI budget.
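For the multi-workspace case, one way to cap concurrency is a small Python driver. This sketch assumes each workspace is a separate directory that has already been initialized with terraform init. Here max_concurrent limits how many plans run at once across workspaces, while -parallelism (a real plan flag, default 10) limits concurrent operations inside each plan. The directory layout, the value 5, and the function names are assumptions:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A runner takes a workspace directory and returns the plan's exit code.
Runner = Callable[[str], int]


def run_refresh_only_plan(workdir: str) -> int:
    """Run a detection-only plan in one (already initialized) directory."""
    completed = subprocess.run(
        [
            "terraform", "plan",
            "-refresh-only", "-detailed-exitcode",
            "-input=false", "-no-color",
            "-parallelism=5",  # assumption: below the default of 10 to ease API pressure
        ],
        cwd=workdir,
        check=False,  # exit code 2 is an expected outcome, not an exception
    )
    return completed.returncode


def scan_workspaces(
    workdirs: List[str],
    runner: Runner = run_refresh_only_plan,
    max_concurrent: int = 2,
) -> Dict[str, int]:
    """Run one plan per workspace, with at most max_concurrent plans at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() preserves input order, so results line up with workdirs.
        return dict(zip(workdirs, pool.map(runner, workdirs)))
```

A call such as scan_workspaces(["envs/prod", "envs/stg"]) (hypothetical paths) returns a {workspace: exit_code} map. Injecting a different runner makes the driver testable without invoking Terraform.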

Prefer short-lived credentials (OIDC, STS, etc.)
Split jobs per workspace and surface failures right at the source
Treat exit code 2 (diff present) as a successful detection: mark the job as failed and notify
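A sketch of the "notify only on drift or error" step in Python, assuming a Slack incoming webhook (which accepts a JSON body with a text field) whose URL is stored as a CI secret. The results argument is the hypothetical {workspace: exit_code} map a driver script would produce, and the function names are illustrative:

```python
import json
import urllib.request
from typing import Dict, Optional


def build_drift_message(results: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Build a Slack incoming-webhook payload from {workspace: exit_code}.

    Returns None when every workspace is clean (exit 0), so no message is sent.
    """
    drifted = sorted(ws for ws, code in results.items() if code == 2)
    failed = sorted(ws for ws, code in results.items() if code not in (0, 2))
    lines = []
    if drifted:
        lines.append("Drift detected in: " + ", ".join(drifted))
    if failed:
        lines.append("Plan errors (not drift) in: " + ", ".join(failed))
    if not lines:
        return None  # nothing to report: skip the notification entirely
    return {"text": "\n".join(lines)}


def post_webhook(url: str, payload: Dict[str, str]) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A driver would call post_webhook only when build_drift_message returns a payload. Listing errors separately keeps a broken job from being mistaken for drift.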

Approach	What it detects	Side effects (infra changes)	Notification & integration
CLI: plan -refresh-only -detailed-exitcode	State vs reality diff (external changes)	None (only proposes state updates)	Fail via CI exit code → Slack / issue integration
CLI: normal plan (-detailed-exitcode only)	Code vs reality diff (changes that should be applied)	None (proposals only)	Detect diff and route to review
Terraform Cloud drift detection / scheduled runs	Workspace-level diff detection	None (detection-only evaluation run)	Shown in the UI and sent via notification channels (availability depends on your plan or edition)
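The table's two diff kinds can also be read from a saved plan. Assuming you save a plan with -out and convert it with terraform show -json, the JSON plan representation is documented to list changes detected outside Terraform under resource_drift and proposed changes under resource_changes. A minimal Python sketch that separates them; the example plan dictionary is hand-written and heavily trimmed, and the function name is hypothetical:

```python
from typing import Dict, List

# Action lists that mean "nothing to change" in the JSON plan representation.
NOOP_ACTIONS = (["no-op"], ["read"])


def summarize_plan_json(plan: Dict) -> Dict[str, List[str]]:
    """Split a parsed `terraform show -json` plan into the two diff kinds.

    drift:   resources changed outside Terraform (resource_drift)
    pending: resources the plan would change to match the code (resource_changes)
    """
    drift = [rc["address"] for rc in plan.get("resource_drift", [])]
    pending = [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] not in NOOP_ACTIONS
    ]
    return {"drift": sorted(drift), "pending": sorted(pending)}


# Hand-written, heavily trimmed example; real output has many more fields.
example = {
    "resource_drift": [
        {"address": "aws_instance.web", "change": {"actions": ["update"]}},
    ],
    "resource_changes": [
        {"address": "aws_instance.web", "change": {"actions": ["no-op"]}},
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
    ],
}
print(summarize_plan_json(example))
# {'drift': ['aws_instance.web'], 'pending': ['aws_s3_bucket.logs']}
```

In a real job the dictionary would come from json.load on the file written by terraform show -json.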

Drift detection via scheduled plan (conceptual diagram)

Example: -refresh-only drift detection with GitHub Actions (schedule)

name: Drift Detection (Nightly)

on:
  schedule:
    - cron: '0 2 * * *'  # UTC 02:00
  workflow_dispatch:

jobs:
  plan-refresh-only:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # Federate to cloud via OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.x
          terraform_wrapper: false   # call the CLI directly, not through the action's wrapper script
      - name: Cloud credentials (example)
        run: |
          # Placeholder: obtain short-lived credentials here,
          # e.g. AWS STS via an OIDC role or GCP Workload Identity Federation.
          echo "Obtain short-lived credentials here"
      - name: Terraform init (remote backend)
        run: terraform init -input=false
      - name: Drift detection (refresh-only)
        id: plan
        run: |
          rc=0
          # "|| rc=$?" captures a non-zero exit code without tripping bash -e
          terraform plan -refresh-only -detailed-exitcode -input=false -no-color || rc=$?
          if [ "$rc" -eq 2 ]; then
            echo "drift=true" >> "$GITHUB_OUTPUT"
            exit 2   # drift: fail the job on purpose so it is visible
          elif [ "$rc" -ne 0 ]; then
            echo "Terraform error (exit code $rc)" >&2
            exit "$rc"
          fi
      - name: Notify on drift
        if: failure() && steps.plan.outputs.drift == 'true'
        run: |
          echo "Drift detected. Post to Slack / create an issue here."

Detection and Notification with Terraform Cloud/Enterprise

Workspace execution in Terraform Cloud/Enterprise gives you remote state management, locking, permission separation, and notification-channel integration in one place. Its drift detection (health assessments, which run refresh-only evaluations) together with scheduling and policy integrations lets the platform itself handle detection and visualization (check the official docs for availability and details).

Key design points: separate detection runs from change-applying runs; define clear alert handling for notification channels (email, webhooks, etc.); and, where needed, add policy guardrails (Sentinel or OPA), for example, “block apply when manual changes are detected.”

Manage permissions, variables, and notification channels per workspace
Default to detection-only runs (no apply) to prevent operational mis-applies
In large environments, use organization-, project-, and tag-level views for cross-cutting visibility
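Detection-only runs can also be triggered through the Terraform Cloud/Enterprise API. The sketch below assumes the Runs API (POST /api/v2/runs) with its refresh-only and plan-only attributes, so the run can refresh and plan but never apply. The workspace ID, token handling, and function names are placeholders, and the current API documentation should be checked before relying on it:

```python
import json
import urllib.request
from typing import Dict

# Terraform Enterprise installations use their own hostname instead.
RUNS_URL = "https://app.terraform.io/api/v2/runs"


def build_detection_run(workspace_id: str, message: str = "Scheduled drift check") -> Dict:
    """JSON:API body for a refresh-only, plan-only run that cannot be applied."""
    return {
        "data": {
            "type": "runs",
            "attributes": {
                "message": message,
                "refresh-only": True,  # only reconcile state with reality
                "plan-only": True,     # speculative run: no apply step
            },
            "relationships": {
                "workspace": {"data": {"type": "workspaces", "id": workspace_id}},
            },
        }
    }


def create_run(token: str, body: Dict) -> Dict:
    """POST the run body and return the decoded JSON:API response."""
    request = urllib.request.Request(
        RUNS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Using a team or service token scoped only to the detection workspaces keeps this in line with the least-privilege point above.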

Operational Pitfalls and Guardrails

Enforce least privilege and short lifetimes for credentials. Detection only needs to read cloud resources, but state locking still needs write access to the lock store (for example, the DynamoDB lock table used with the S3 backend), so design backend permissions to match what your backend type requires.

To manage API rate limits and execution cost, lower per-run concurrency (-parallelism, default 10), split workspaces into batches, and prioritize stacks that change often. To reduce noise, use lifecycle ignore_changes where it is justified, and keep volatile values (for example, timestamp() or uuid() in resource arguments) out of your configuration so they do not appear as perpetual diffs.

Use a remote backend with locking enabled (to guarantee consistency)
Reduce output noise (-no-color, log summaries, notify diffs only)
Operate with a clear distinction between error (exit code 1) and drift (exit code 2)
Isolate frequently-changing resources via ignore_changes or a separate workspace
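Keeping error (1) and drift (2) apart also matters for retries. A minimal Python sketch, assuming most plan errors in scheduled jobs are transient (rate limiting, network timeouts) and worth a few retries with exponential backoff, while a drift result is final; run_plan is a hypothetical callable that runs one plan and returns its exit code:

```python
import time
from typing import Callable


def plan_with_retries(
    run_plan: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry only on errors; return drift (2) or no drift (0) immediately.

    A drift result is final, so retrying it would only spend more API calls.
    """
    code = 1
    for attempt in range(max_attempts):
        code = run_plan()
        if code in (0, 2):
            return code
        if attempt < max_attempts - 1:
            # Exponential backoff: 5s, then 10s, ... with the defaults.
            sleep(base_delay * (2 ** attempt))
    return code  # still an error after the last attempt
```

A real job would likely inspect the error output and stop retrying on configuration or authentication errors, which no amount of waiting will fix.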

Exam Prep: Frequently Tested Points (Pro Level)

Foundational questions cover the refresh that plan performs before computing diffs, the meaning of -detailed-exitcode, and when to use -refresh-only versus a normal plan. Backend locking, workspace design, Sentinel/policy integration, and least privilege for service accounts also come up often.

Scenario questions present requirements like “regularly detect diffs only, never modify infrastructure” and “notify on diff and route to manual review.” The model answer is to choose plan -refresh-only -detailed-exitcode and trigger notifications on exit code 2.

Memorize the meaning of exit codes 0/1/2
-refresh-only can be used safely as a “detection-only” mode
A remote backend with locking is essential for shared automation; manage parallelism and rate limits
Separate workspaces / variables / notifications and apply least privilege
Noise control (ignore_changes, diff-only notifications)

Check Your Understanding

Pro

Question 1

In production, you want to detect “drift caused by external changes” every night without modifying any resources, and fail and notify the pipeline if a diff exists. Which CLI invocation is most appropriate?

A. terraform plan -refresh-only -detailed-exitcode
B. terraform plan -detailed-exitcode -refresh=false
C. terraform apply -refresh-only -auto-approve
D. Manually compare the output of terraform state pull

Correct answer: A

-refresh-only fits the requirement of detecting drift without modifying infrastructure, and -detailed-exitcode returns exit code 2 when a diff exists, so the pipeline can treat it as a failure. -refresh=false skips reading the real infrastructure, so external changes are missed (false negatives). apply -refresh-only -auto-approve writes the refreshed values into the state without review, which violates the requirement. terraform state pull only prints the stored state, so comparing it does not reveal real-world changes, and manual comparison does not fit automation.

Frequently Asked Questions

Drift showed up in refresh-only. Should I apply right away?

Identify the root cause first. If the manual change is legitimate, reflect it in code (or reconsider ignore_changes), then reconcile with a normal plan/apply. If you only want to align the state with reality, terraform apply -refresh-only updates the state without touching infrastructure; even then, run reviews and approvals according to your operational rules.

Are changes to data sources (data blocks) detected as drift?

data blocks are not managed resources — they are re-read on each evaluation. Drift detection primarily targets diffs in managed resources, so changes to data values themselves are not treated as drift. If a data change affects a managed resource, that diff will surface in the plan.

How do you reduce drift-detection noise in large environments?

Combine techniques: apply ignore_changes, isolate resources that change often, split schedules (high frequency for critical stacks, lower for others), tune parallelism and retries, and notify only with diff summaries (details go to logs). For spots that perpetually drift, revisit design, permissions, and automation.
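For the "diff summaries only" part, a notification-side filter can drop resources whose only changes are known-volatile attributes. This Python sketch assumes drift entries shaped like the resource_drift items of terraform show -json output (an address plus change.before and change.after). The ignored attribute names are hypothetical, and the filter complements ignore_changes in the configuration rather than replacing it:

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

# Hypothetical attribute names that change on their own and are not worth an alert.
DEFAULT_IGNORED: FrozenSet[str] = frozenset({"etag", "last_modified"})


def changed_attributes(before: Optional[Dict], after: Optional[Dict]) -> Set[str]:
    """Top-level attribute names whose values differ (None counts as an empty object)."""
    before, after = before or {}, after or {}
    return {k for k in set(before) | set(after) if before.get(k) != after.get(k)}


def drift_summary(
    drift_entries: Iterable[Dict], ignored: FrozenSet[str] = DEFAULT_IGNORED
) -> List[str]:
    """One line per drifted resource, skipping resources where only ignored attributes changed."""
    lines = []
    for entry in drift_entries:
        change = entry.get("change", {})
        attrs = changed_attributes(change.get("before"), change.get("after")) - ignored
        if attrs:
            lines.append(f"{entry['address']}: {', '.join(sorted(attrs))}")
    return lines
```

The resulting lines fit a short chat message, while the full plan output stays in the job logs.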


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
