This article is a practical guide to operationalizing Terraform drift detection through scheduled plan runs. By detecting diffs without making changes, and automating failure criteria and notifications, you can prevent unexpected production changes.
It covers features that are stable in HashiCorp's official documentation — the plan refresh phase, -detailed-exitcode, -refresh-only, remote backend locking, and Terraform Cloud workspace execution and notifications — including points that come up frequently on certification exams.
Drift is when the configuration intended by your Terraform code and state no longer matches the real infrastructure (the actual entities behind cloud APIs). Common causes are manual changes, updates by other tools, and side effects of auto-scaling or rotation.
The goal of scheduled plan runs is to check for diffs without changing infrastructure and to notify based on a threshold (for example: any diff = pipeline failure). Daily or hourly checks let you act on early warning signs.
terraform plan first runs a refresh phase: it reads the real infrastructure, updates its in-memory view of state (the stored state is not changed by plan itself), and then evaluates diffs against the configuration. This is how external updates such as manual changes get surfaced.
-detailed-exitcode is the essential flag for returning diff status as an exit code. It enables pipeline branching (0 = no diff, 1 = error, 2 = diff). -refresh-only is a dedicated mode that proposes only state updates without changing real infrastructure, which makes it a great fit for non-mutating drift detection.
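A minimal sketch of the branching these exit codes enable, assuming a POSIX shell. The helper names `classify_plan_exit` and `capture_exit` are hypothetical and not part of Terraform; any status other than 0 or 2 is treated as an error.

```sh
# Map a `terraform plan -detailed-exitcode` exit status to a pipeline verdict.
# 0 = no changes, 2 = changes present, anything else (including 1) = error.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Run a command and print its exit status on stdout instead of aborting,
# so the caller can branch even when `set -e` is active.
# The command's own stdout is sent to stderr to keep the status easy to capture.
capture_exit() (
  rc=0
  "$@" >&2 || rc=$?
  echo "$rc"
)
```

A pipeline step could then call `classify_plan_exit "$(capture_exit terraform plan -refresh-only -detailed-exitcode -input=false -no-color)"` and fail only on `drift` or `error`.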
The minimal setup: a scheduler (for example GitHub Actions schedule or Jenkins cron) runs terraform plan -refresh-only -detailed-exitcode on a schedule, and when exit code 2 is detected, it notifies and opens a ticket. Give the execution environment least-privilege credentials: read access to the cloud APIs, plus state and lock access only where the backend requires it.
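The notification half of that setup might look like the sketch below. It assumes the receiving endpoint accepts a JSON body of the form `{"text": "..."}`, as Slack incoming webhooks do; `DRIFT_WEBHOOK_URL`, `build_payload`, and `notify_drift` are hypothetical names, and the URL would come from the CI secret store.

```sh
# Build a minimal JSON body: {"text":"Drift detected in <stack>: <summary>"}.
# Only backslashes and double quotes are escaped; multi-line summaries
# would need a real JSON encoder (for example jq).
build_payload() {
  printf '{"text":"%s"}' \
    "$(printf 'Drift detected in %s: %s' "$1" "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

# Post the payload to the webhook. DRIFT_WEBHOOK_URL is an assumed
# environment variable, not something Terraform or the CI system defines.
notify_drift() {
  build_payload "$1" "$2" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data @- "$DRIFT_WEBHOOK_URL"
}
```

Calling `notify_drift prod "nightly refresh-only plan returned exit code 2"` only when the plan step reports exit code 2 keeps error cases (exit code 1) on a separate alerting path.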
Use a remote backend (such as Terraform Cloud/Enterprise or S3 + DynamoDB lock) to guarantee locking and consistency. For multiple workspaces, tune parallelism to stay within API rate limits and budget.
| Approach | What it detects | Side effects (infra changes) | Notification & integration |
|---|---|---|---|
| CLI: plan -refresh-only -detailed-exitcode | State vs reality diff (external changes) | None (only proposes state updates) | Fail via CI exit code → Slack / issue integration |
| CLI: normal plan (-detailed-exitcode only) | Code vs reality diff (changes that should be applied) | None (proposals only) | Detect diff and route to review |
| Terraform Cloud drift detection / scheduled runs | Workspace-level diff detection | None (detection-only evaluation run) | Visualized via notification channels and UI (check availability terms) |
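To make the difference between the two CLI rows concrete, the sketch below combines the exit codes of a refresh-only plan and a normal plan into a triage label. The function name `triage` and the labels are illustrative assumptions, and the reading of each combination is a rule of thumb rather than documented Terraform behavior.

```sh
# $1 = exit status of `terraform plan -refresh-only -detailed-exitcode`
# $2 = exit status of `terraform plan -detailed-exitcode`
triage() {
  case "$1:$2" in
    0:0) echo "in-sync" ;;                 # state, code, and reality agree
    2:0) echo "state-only-drift" ;;        # reality changed, but code needs no changes
    0:2) echo "config-diff-only" ;;        # code changed and has not been applied yet
    2:2) echo "drift-and-config-diff" ;;   # reality changed and a normal plan proposes changes
    *)   echo "error" ;;                   # at least one run failed
  esac
}
```

For example, `triage 2 0` suggests that running `terraform apply -refresh-only` after review may be enough, while `triage 2 2` calls for a decision on whether code or reality should win.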
Drift detection via scheduled plan (conceptual diagram)
Example: -refresh-only drift detection with GitHub Actions (schedule)
    name: Drift Detection (Nightly)
    on:
      schedule:
        - cron: '0 2 * * *' # UTC 02:00
      workflow_dispatch:
    jobs:
      plan-refresh-only:
        runs-on: ubuntu-latest
        permissions:
          id-token: write # Federate to cloud via OIDC
          contents: read
        steps:
          - uses: actions/checkout@v4
          - name: Set up Terraform
            uses: hashicorp/setup-terraform@v3
            with:
              terraform_version: 1.x
          - name: Cloud credentials (example)
            run: |
              echo "Obtain short-lived credentials here (e.g. AWS STS / GCP Workload Identity)"
          - name: Terraform init (remote backend)
            run: terraform init -input=false
          - name: Drift detection (refresh-only)
            id: plan
            run: |
              set -e
              RC=0 # Stays 0 when the plan reports no diff
              terraform plan -refresh-only -detailed-exitcode -input=false -no-color || RC=$?
              if [ "${RC}" = "2" ]; then
                # Exit code 2: drift found; record it for the notify step, then fail
                echo "DRIFT=true" >> "$GITHUB_OUTPUT"
                exit 2
              elif [ "${RC}" != "0" ]; then
                # Any other non-zero code is a Terraform error, not drift
                echo "Terraform error (RC=${RC})" >&2
                exit "${RC}"
              fi
          - name: Notify on drift
            if: failure() && steps.plan.outputs.DRIFT == 'true'
            run: |
              echo "Drift detected. Post to Slack / create issue here."

Workspace execution in Terraform Cloud/Enterprise gives you remote state management, locking, permission separation, and notification-channel integration in one place. Drift detection (refresh-based evaluation runs) plus schedule/policy integrations let you push detection and visualization onto the platform itself (check the official docs for availability and details).
Key design points: separate detection runs from change-applying runs, define clear alert handling for each notification channel (email, webhook, etc.), and when needed, install guardrails via policy (Sentinel and similar), for example “block apply when manual changes are detected.”
Enforce least privilege and short lifetimes for credentials. Detection alone is mostly read-only, but some backend types require state writes and lock acquisition — so design permissions to match the backend's requirements.
To manage API rate limits and execution cost, control parallelism (-parallelism), batch-split workspaces, and prioritize stacks that change dynamically. To reduce noise, use lifecycle's ignore_changes appropriately and keep volatile elements (timestamps, random values) out of the plan.
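A sketch of how batching and parallelism control could be combined, assuming each stack lives in its own directory and has already been initialized with `terraform init`. The function name `scan_stacks`, the `TF` override variable, and the directory names in the usage note are hypothetical; `-chdir` and `-parallelism` are standard Terraform CLI options.

```sh
# TF is the Terraform binary; it can be overridden, for example with a stub.
TF="${TF:-terraform}"

# Check stacks one at a time and print the directories that report drift.
# $1 is the -parallelism value (Terraform's default is 10);
# the remaining arguments are stack directories.
# Returns non-zero if any plan failed with an error.
scan_stacks() (
  parallelism="$1"; shift
  failed=0
  for dir in "$@"; do
    rc=0
    "$TF" -chdir="$dir" plan -refresh-only -detailed-exitcode \
      -input=false -no-color -parallelism="$parallelism" >&2 || rc=$?
    case "$rc" in
      0) ;;                                   # no drift in this stack
      2) echo "$dir" ;;                       # drift: report the stack
      *) echo "plan failed in $dir (exit $rc)" >&2; failed=1 ;;
    esac
  done
  exit "$failed"
)
```

Running `scan_stacks 5 stacks/network stacks/app` hourly for critical stacks and a larger batch nightly is one way to split schedules; running stacks sequentially with a lowered `-parallelism` trades speed for fewer concurrent API calls.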
Foundational questions cover plan's prior refresh, the meaning of -detailed-exitcode, and when to use -refresh-only vs normal plan. Backend locking, workspace design, Sentinel/policy integration, and least privilege for service accounts also come up often.
Scenario questions present requirements like “regularly detect diffs only, never modify infrastructure” and “notify on diff and route to manual review.” The model answer is to choose plan -refresh-only -detailed-exitcode and trigger notifications on exit code 2.
Pro
Question 1
In production, you want to detect “drift caused by external changes” every night without modifying any resources, and fail and notify the pipeline if a diff exists. Which CLI invocation is most appropriate?
Correct answer: A
-refresh-only fits the requirement of detecting drift without modifying infrastructure. -detailed-exitcode returns exit code 2 when a diff exists, so the pipeline can treat it as failure. -refresh=false does not bring in external changes and causes false negatives — inappropriate. apply may make changes, which violates the requirement. Manual comparison of state pull output does not align with automation.
Drift showed up in refresh-only. Should I apply right away?
Identify the root cause first. If the manual change is legitimate, reflect it in code (or reconsider ignore_changes), then reconcile with a normal plan/apply. If you only need to align state with reality, terraform apply -refresh-only updates state without touching resources, but still run reviews and approvals per your operational rules.
Are changes to data sources (data blocks) detected as drift?
data blocks are not managed resources — they are re-read on each evaluation. Drift detection primarily targets diffs in managed resources, so changes to data values themselves are not treated as drift. If a data change affects a managed resource, that diff will surface in the plan.
How do you reduce drift-detection noise in large environments?
Combine techniques: apply ignore_changes, isolate resources that change often, split schedules (high frequency for critical stacks, lower for others), tune parallelism and retries, and notify only with diff summaries (details go to logs). For spots that perpetually drift, revisit design, permissions, and automation.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.