Terraform Authoring & Operations Pro: Complete Guide (2026)

HashiCorp's publicly available certification is centered on Associate, but this article assumes an advanced (Pro-equivalent) level and organizes study perspectives around the hard parts of real-world operations.

Beyond exam prep, we explain things in terms of standard production patterns such as Terraform Cloud/Enterprise and the S3 + DynamoDB backend.

Core Terraform Domains Tested at the Advanced Level

At the advanced level, you are expected to do more than write HCL. You are tested on module publication quality, State operations (locking, isolation, migration), change safety (create_before_destroy, replace_triggered_by, moved), policy enforcement, and CI-driven automation and auditability. You should also understand the differences between Terraform Open Source and Terraform Cloud/Enterprise (TFC/E).

Questions tend to focus on decisions that assume scale and change — multi-environment, multi-workspace, remote State, team permissions, enforcement of tags and conventions, module versioning, provider upgrades — rather than simple single-workspace setups.

Module design and release management (variables/outputs, validation, pre/post conditions, semantic versioning)
Backend selection and State operations (locking, isolation, moved/state mv/import, handling of sensitive data)
Diff control (lifecycle, ignore_changes, replace triggers, zero-downtime strategies)
Governance (Sentinel/OPA, workspace permissions, variable sets, provider constraints)
CI/testing/audit (JSON output of validate/plan, static analysis, approval flows, artifact retention)

Domain	Associate depth	Advanced-level depth	Production perspective
Module design	Basic variables/outputs	Publication quality, conditions, versioning, and breaking-change handling	Ship design changes while maintaining backward compatibility and docs
State/backend	Understanding of init/refresh	Locking, isolation, and migration strategy for S3+DynamoDB / TFC	Recovery procedures, drift response, and key design
Change management	Basics of plan/apply	Combining create_before_destroy, replace_triggered_by, and moved	Aim for zero downtime while balancing cost and speed
Governance	Basic understanding of workspaces	Sentinel/OPA, permission design, variable sets, audit logs	Enforcement of org standards with an exception approval process
Automation/audit	Running validate/plan	JSON plan retention, signing, approval gates, reproducibility	Evidence trail and ease of rollback

Advanced-level Terraform responsibility map

Minimal root configuration skeleton (version / provider constraints)

terraform {
  required_version = ">= 1.4, < 2.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

module "network" {
  source     = "appcorp/network/vpc"
  version    = "~> 2.4"
  cidr_block = var.vpc_cidr
}

Practical Patterns for Module Design

At the advanced level, modules need to be more than just usable — they must be safe to upgrade. The key points are variable validation, pre/post conditions, sensible defaults, backward compatibility of names and outputs, and clear versioning conventions. Breaking changes go in major releases, combined with moved blocks to migrate addresses and minimize impact.

Provide a README, examples, and a version compatibility matrix. If you publish to a registry, keep the usage examples runnable. Even for internal-only modules, always ship a CHANGELOG and an upgrade guide.

Clarify input boundaries (nullable, type, validation)
Machine-verify assumptions and output health with precondition/postcondition
Do not casually change output structure. Add new fields; deprecate before removing existing ones
Declaratively migrate address changes with moved; apply in stages aligned with versions
Centralize default tags and common labels in locals to avoid duplication

Module snippet with conditions and validation

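# Inputs referenced by the locals and resources below
variable "vpc_id" {
  type = string
}

variable "default_tags" {
  type    = map(string)
  default = {}
}

variable "subnet_cidrs" {
  type = list(string)
  validation {
    condition     = length(var.subnet_cidrs) >= 2
    error_message = "At least two subnet CIDRs are required."
  }
}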

locals {
  default_tags = merge(var.default_tags, {
    managed-by = "terraform"
    module     = "appcorp/network"
  })
}

resource "aws_subnet" "this" {
  for_each   = toset(var.subnet_cidrs)
  vpc_id     = var.vpc_id
  cidr_block = each.value
  tags       = local.default_tags

  lifecycle {
    precondition {
      condition     = can(cidrnetmask(each.value))
      error_message = "Invalid CIDR format."
    }
  }
}

moved {
  from = aws_subnet.old
  to   = aws_subnet.this
}

output "subnet_ids" {
  description = "List of created subnet IDs"
  value       = values(aws_subnet.this)[*].id
}
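
One way to follow the guidance on output compatibility is to add a new output next to the existing one instead of reshaping it. The sketch below is illustrative only: the output name subnet_ids_by_cidr is hypothetical, and it assumes the aws_subnet.this resource and the subnet_cidrs input from the snippet above.

```hcl
# Hypothetical additive output: subnet_ids stays unchanged for existing callers.
output "subnet_ids_by_cidr" {
  description = "Map of subnet CIDR to subnet ID"
  value       = { for cidr, subnet in aws_subnet.this : cidr => subnet.id }

  # toset() silently drops duplicate CIDRs, so fail loudly if the number of
  # subnets no longer matches the number of CIDRs the caller passed in.
  precondition {
    condition     = length(aws_subnet.this) == length(var.subnet_cidrs)
    error_message = "subnet_cidrs contains duplicate entries."
  }
}
```

Callers can move to the map output at their own pace, and the list output can be marked deprecated in the CHANGELOG before a later major release removes it.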

State Operations and Backend Design

State is the source of truth. At the advanced level, you design locking, permissions, and a key strategy (separation by environment, region, and system) on the remote backend, and you standardize recovery procedures for incidents. S3 + DynamoDB is the classic choice for self-managed and multi-account AWS setups, while Terraform Cloud/Enterprise integrates locking, run management, policy, and audit.

For migration and maintenance, choose appropriately between terraform state mv/import and module-level moved. For drift, do not jump to terraform state rm or manual fixes. Identify the root cause first, and if needed adjust with ignore_changes or by understanding how the provider computes diffs.

Make keys unique with a combination of workspace/env/region/system
Always enable S3 versioning and MFA delete, combined with DynamoDB locking
On TFC/E, suppress contention with the Run Queue, locking, and plan approval flows
Do not rely solely on storage versioning for backups — take planned exports/snapshots
Assume sensitive data may remain in State, and minimize how outputs and attributes are exposed
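
The storage side of these points could be bootstrapped roughly as follows. This is a minimal sketch, not a hardened baseline: it assumes the bucket and table names used in the backend example in this article, and it assumes the resources live in a separate bootstrap configuration. MFA delete is left out because it typically has to be enabled by the bucket owner's root account outside of Terraform.

```hcl
# Bootstrap sketch, kept in a separate configuration from the workloads that use it.
resource "aws_s3_bucket" "tfstate" {
  bucket = "tfstate-prod"
}

# Versioning lets you recover an earlier State object after a bad write.
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt State at rest, since it may contain sensitive attribute values.
resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend expects a string partition key named LockID.
resource "aws_dynamodb_table" "tfstate_lock" {
  name         = "tfstate-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```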

Representative backend configuration and operational commands

# S3 + DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "tfstate-prod"
    key            = "network/prod/ap-northeast-1/terraform.tfstate"
    region         = "ap-northeast-1"
    dynamodb_table = "tfstate-lock"
    encrypt        = true
  }
}

# Terraform Cloud via the cloud block (an alternative to the S3 backend above; use one or the other)
terraform {
  cloud {
    organization = "appcorp"
    workspaces {
      name = "network-prod"
    }
  }
}

# Operational snippets
# Bind an existing resource to its declaration (import)
# Quote the address so the shell does not interpret the brackets and inner quotes
terraform import 'aws_subnet.this["10.0.1.0/24"]' subnet-abc123

# Address migration (state mv)
terraform state mv 'aws_subnet.old["10.0.1.0/24"]' 'aws_subnet.this["10.0.1.0/24"]'

# Export the saved plan as JSON (for audit)
terraform show -json plan.out > plan.json

Diff Control and Basic Zero-Downtime Strategy

Even when a change forces a replacement, you can reduce downtime and risk. Use lifecycle's create_before_destroy, suppress diffs on externally controlled values with ignore_changes, and induce replacement at the intended moment with replace_triggered_by. Provider-specific blue-green and rolling updates are safest when wrapped behind a module abstraction.

Keep -target reserved for debugging and staged rollouts — do not make it part of normal operations. Split risk across multi-step changes by versioning modules and leveraging moved blocks and output compatibility.

Control replacement order with create_before_destroy (watch dependencies)
Use ignore_changes to skip externally controlled items (manual tags, transient attributes)
Trigger deliberate re-creation explicitly with replace_triggered_by
Split multi-step plans into small steps and keep plan evidence at each step

HCL example for replacement order and diff suppression

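# replace_triggered_by accepts only references to managed resources, so the
# token variable is wrapped in terraform_data (Terraform 1.4+).
resource "terraform_data" "force_recreate" {
  input = var.force_recreate_token
}

resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = var.ami
  instance_type = var.instance_type
  user_data     = var.user_data # launch templates expect base64-encoded user data

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      user_data,        # Suppress drift when it is swapped externally each time
      tags["temporary"] # Ignore a manually managed tag key
    ]
    replace_triggered_by = [
      terraform_data.force_recreate # Replace whenever the token value changes
    ]
  }
}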
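
To show how a provider-specific rolling update can sit next to the launch template, here is a minimal sketch of an Auto Scaling group that consumes it. The app_subnet_ids input and the sizing values are assumptions made for illustration. The provider starts an instance refresh when the group's launch_template settings change.

```hcl
# Hypothetical input: subnet IDs the group launches instances into.
variable "app_subnet_ids" {
  type = list(string)
}

resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 2
  max_size            = 4
  desired_capacity    = 2
  vpc_zone_identifier = var.app_subnet_ids

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  # Replace instances gradually instead of all at once when the
  # launch template ID or version referenced above changes.
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
    }
  }
}
```

Wrapping both resources in one module keeps the replacement order and the rollout strategy in a single place that callers do not need to reimplement.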

Policy and Governance (Sentinel/OPA and Permission Design)

In Terraform Cloud/Enterprise, Sentinel lets you enforce organization-wide policies on plans. Codify rules such as required tags, cost ceilings, and allow-listed regions or providers, and control scope via workspaces and policy sets. In Open Source environments, integrate OPA/Rego and Conftest into CI to run equivalent checks.

Apply least-privilege as a principle, and when integrating with VCS carefully separate trigger permissions from State-read permissions. Use variable sets and Sensitive variables to inject credentials outside of code, and ensure operational traceability through audit logs.

Allow-list providers/versions (enforce required_providers)
Mandate tag/label standardization through policy
Reuse Sensitive variables and per-environment variable sets to prevent misplacement
Roll out policies in stages (advisory → soft-mandatory → hard-mandatory)
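
Variable sets can themselves be managed as code. The following is a rough sketch using the hashicorp/tfe provider, assuming the hypothetical appcorp organization and network-prod workspace used elsewhere in this article. The variable name DEPLOY_TOKEN and the prod_deploy_token input are placeholders; check the tfe provider documentation for the arguments your provider version supports.

```hcl
# Hypothetical secret, supplied from outside the repository.
variable "prod_deploy_token" {
  type      = string
  sensitive = true
}

data "tfe_workspace" "network_prod" {
  name         = "network-prod"
  organization = "appcorp"
}

# One variable set per environment keeps credentials from leaking across stages.
resource "tfe_variable_set" "prod_credentials" {
  name         = "prod-credentials"
  description  = "Credentials shared by production workspaces"
  organization = "appcorp"
}

resource "tfe_variable" "deploy_token" {
  key             = "DEPLOY_TOKEN" # hypothetical environment variable name
  value           = var.prod_deploy_token
  category        = "env"
  sensitive       = true
  variable_set_id = tfe_variable_set.prod_credentials.id
}

# Attach the set only to the workspaces that need it.
resource "tfe_workspace_variable_set" "network_prod" {
  workspace_id    = data.tfe_workspace.network_prod.id
  variable_set_id = tfe_variable_set.prod_credentials.id
}
```

Note that the secret value is still written to the State of whichever workspace manages these resources, so that workspace needs the tightest read permissions.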

Simple Sentinel policy example (required tags)

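import "tfplan/v2" as tfplan

allowed_envs = ["dev", "stg", "prod"]

# Managed resources that this plan creates or updates
changed = filter tfplan.resource_changes as _, rc {
  rc.mode is "managed" and
  (rc.change.actions contains "create" or rc.change.actions contains "update")
}

# True when the planned tags include an owner and an allowed env value.
# Resource types without a tags attribute also fail; exclude them by type if needed.
has_required_tags = func(rc) {
  tags = rc.change.after.tags else null
  if tags is null {
    return false
  }
  return (tags["owner"] else null) is not null and
    (tags["env"] else "") in allowed_envs
}

main = rule {
  all changed as _, rc {
    has_required_tags(rc)
  }
}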

CI / Testing / Troubleshooting Implementation Points

Separate init/validate/plan in CI, save the plan as a binary artifact, and use it only for apply after review. Run static analysis (tflint and friends) and policy checks (OPA/Checkov) early, and reserve heavy plans for the point where changes are finalized. Combine with VCS protection rules to ensure reproducibility and evidence.

For incident investigation, start with machine-readable diffs via plan/show -json, TF_LOG and provider debug flags, and adjustments to parallelism or API rate limits. For import, on Terraform 1.5 and later where the import block is available, declare it in code so it is built into the plan and the procedure becomes visible.

Reuse plan.out to prevent plan-vs-apply substitution
Persist JSON output for approval and audit
Run static analysis and policy checks before plan
Use -parallelism and retries to respect API throttling
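
As a minimal illustration of the declarative import mentioned above, the block below reuses the placeholder address and subnet ID from the CLI import example in this article. It assumes Terraform 1.5 or later and must be placed in the root module.

```hcl
# Requires Terraform 1.5+. Address and ID are the same placeholders
# used in the CLI import example.
import {
  to = aws_subnet.this["10.0.1.0/24"]
  id = "subnet-abc123"
}
```

If the resource lives in a child module, prefix the address with the module path (for example module.network.aws_subnet.this["10.0.1.0/24"]). Because the import appears in the plan output, reviewers can approve it like any other change.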

Basic GitHub Actions pipeline example

name: terraform-ci
on:
  pull_request:
    paths: ["infra/**"]
  workflow_dispatch: # manual trigger; without it the apply job's condition never matches

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.6"
          terraform_wrapper: false # keep stdout clean for the JSON redirect below
      - run: terraform -chdir=infra init -input=false
      - run: terraform -chdir=infra validate
      - run: terraform -chdir=infra plan -input=false -out=plan.out
      # -chdir does not affect shell redirection, so write the JSON under infra/ explicitly
      - run: terraform -chdir=infra show -json plan.out > infra/plan.json
      - uses: actions/upload-artifact@v4
        with:
          name: tf-plan
          path: |
            infra/plan.out
            infra/plan.json
  apply:
    if: github.ref == 'refs/heads/main' && github.event_name == 'workflow_dispatch'
    needs: plan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.6" # must match the version that produced plan.out
      - uses: actions/download-artifact@v4
        with:
          name: tf-plan
          path: infra
      - run: terraform -chdir=infra init -input=false
      - run: terraform -chdir=infra apply -input=false "plan.out"

Check with a Sample Question

Pro

Question 1

You refactored an internal module used across multiple workspaces (dev/stg/prod) and the resource address changed from aws_subnet.old to aws_subnet.this. You want to migrate safely across all environments without re-creation or downtime. Which is the most appropriate action?

A. Add a moved block to the module, release a new version, and run plan/apply in each workspace to reflect the address migration
B. Remove the old address with terraform state rm in each workspace and let apply create it anew
C. Use apply's -target to apply only the old→new resources in stages
D. Create new workspaces, rebuild State, and do a clean install

Correct answer: A

The moved block declaratively migrates resource address changes and avoids re-creation. Distributed as a new module version, it automatically updates State during plan/apply in each workspace. state rm is destructive and hard to recover from, -target is not a permanent migration tool, and creating new workspaces is unnecessary and risky.

Frequently Asked Questions

Should I use Sentinel or OPA?

If you use Terraform Cloud/Enterprise, Sentinel is the natural choice for centrally managing runtime policies. For Open Source-centric CI, adopt OPA/Rego (Conftest) and run it as a static check before plan. You can also combine both, splitting roles like OPA for development and Sentinel for production.

Can I get safe state locking without Terraform Cloud?

Yes. The standard pattern is to combine an S3 backend with DynamoDB locking. Enable versioning and encryption on S3, and configure appropriate throughput and alerts on the DynamoDB table. To avoid contention, serialize CI to a single job and set operational rules that prevent overlap with manual runs.

How tightly should I pin module and provider versions?

For modules, pin to a minor range with ~> for compatibility, and absorb breaking changes via major updates. For providers, a pragmatic approach is to use ~> range constraints to avoid incidents, or pin exactly in the short term, validate, then gradually loosen. Be diligent about reading CHANGELOGs and applying changes first in a validation environment.
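
The difference between these constraint styles is easy to misread, so here is a small sketch of how the ~> operator behaves. The specific version 5.31.0 is a hypothetical value chosen only for illustration.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # "~> 5.0"    allows >= 5.0.0 and < 6.0.0  (any 5.x release)
      # "~> 5.31.0" allows >= 5.31.0 and < 5.32.0 (patch releases only)
      # "5.31.0"    pins one exact release
      version = "~> 5.31.0" # hypothetical version, tight range while validating
    }
  }
}
```

For providers, the dependency lock file (.terraform.lock.hcl) records the exact version that init selected, so committing it keeps runs reproducible even when the constraint is a range.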

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
