Terraform Data Sources: Read-Only External Data (2026)

Terraform Data Sources are the mechanism for reading existing resources, resolving their values at plan time, and safely passing them into other resource definitions. They are easy to confuse with resource blocks, which create new resources, but the two have clearly different roles.

This article focuses on referencing existing resources and explains when to use Data Sources, the pitfalls to watch for, and design considerations with concrete examples. It relies only on stable features described in the official HashiCorp documentation and touches on Associate-level exam trends.

Data Source Basics: Role and Big Picture

A Data Source is a read-only block for looking up and fetching existing resources in the cloud or elsewhere. At plan time it queries the provider's API, and the IDs and attributes it returns can be passed to other resources or modules. Data Sources themselves never create or modify anything.

A typical example is looking up an existing VPC and using its ID to create a new security group. This lets you add resources under management without touching the existing network.

A data block is read-only and has no side effects at apply time
It is normally resolved at plan time; if the lookup fails, the plan fails
References form implicit dependencies, and ordering is controlled by the dependency graph

[Figure: flow for referencing an existing resource (Data Source)]

Example: Reference an existing VPC and create a new security group (AWS)

provider "aws" {
  region = "ap-northeast-1"
}

data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["prod-main-vpc"]
  }
}

resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Web tier SG"
  vpc_id      = data.aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

output "web_sg_id" {
  value = aws_security_group.web.id
}

Data Source vs Import vs Variable: Selection Criteria

There are three main approaches to handling existing resources in Terraform: dynamic lookup (data), importing into state (import), and passing a fixed value via a variable (vars). The best choice depends on the situation.

Choose based on operational stability, risk of incorrect references, and long-term maintainability. Sometimes passing an exact ID through a variable is safer than using data with vague filters.

If identifiers change frequently and are not stable, look them up with data
Consider import if you plan to migrate to Terraform management in the future
If the exact ID is fixed and never changes, explicit specification via a variable is also valid

Approach	Primary Purpose	Typical Use Case	Caveats
Data Source	Look up and reference at plan time	Find VPC/subnet by tag, fetch latest AMI	Vague filters cause incorrect references or plan errors. Watch the result ordering too
terraform import	Bring existing resources into state management	Bring a manually created RDS under management, migrate an old SG	You need a matching HCL definition, either written by hand or drafted with -generate-config-out in Terraform 1.5 and later. Large diffs make the first plan heavy
Pass ID via variable	Explicitly specify a fixed ID	Receive a shared VPC ID from the platform team	Human error detection is delayed. Keeping up with ID changes tends to be manual
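
Example: A minimal sketch of the import approach using an import block

The import row above can also be expressed declaratively. This sketch assumes Terraform 1.5 or later; the resource address aws_db_instance.legacy and the identifier are hypothetical placeholders, not values from this article.

```hcl
# Hypothetical example: bring a manually created RDS instance into state.
# "legacy" and "legacy-db-identifier" are placeholder names.
import {
  to = aws_db_instance.legacy
  id = "legacy-db-identifier"
}

# If no resource "aws_db_instance" "legacy" block exists yet, a draft can be
# generated and then reviewed by hand:
#   terraform plan -generate-config-out=generated.tf
```

Unlike a Data Source, this makes your configuration the owner of the resource, so later plans can propose changes to it.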

Example: Receive a fixed ID via variable and apply it to a new resource

variable "vpc_id" {
  type = string
}

resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = var.vpc_id
}
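
One way to soften the "human error detection is delayed" caveat is to validate the variable's format. The following is a sketch of a variant of the variable block above; the regular expression is an assumption about what a VPC ID looks like and only catches obvious typos, not a wrong but well-formed ID.

```hcl
variable "vpc_id" {
  type        = string
  description = "ID of the shared VPC, supplied by the platform team."

  validation {
    # Assumed format: "vpc-" followed by lowercase hexadecimal characters.
    condition     = can(regex("^vpc-[0-9a-f]+$", var.vpc_id))
    error_message = "The vpc_id value must look like a VPC ID, for example vpc-0123456789abcdef0."
  }
}
```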

Evaluation Order and Dependencies: What Happens at Plan Time

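Data Sources are normally read at the plan stage. If a lookup fails, the plan itself fails before apply, which prevents creation based on incorrect assumptions. The exception is a data block whose arguments depend on values that are not known until apply: Terraform defers that read to the apply phase.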

Dependencies are implicitly derived from reference relationships. If a resource references the result of a data block, that resource is evaluated after the data block. Circular references cannot be built. Since data reads existing resources, avoid designs that use data to re-read a resource created in the same apply; reference the resource's own attributes directly instead.

Failure to resolve data means plan failure, which helps with early detection
References create implicit dependencies, so explicit depends_on declarations are rarely needed
Consider stabilizing (sorting) the order of data results when they are consumed by position, for example with count or an index

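Example: Passing data results to for_each with stable keys

data "aws_subnets" "private" {
  filter {
    name   = "tag:Tier"
    values = ["private"]
  }
}

locals {
  # Sorted so that any index-based consumer sees a stable order.
  private_ids_sorted = sort(data.aws_subnets.private.ids)
}

# Assumes aws_route_table.private is defined elsewhere in the configuration.
resource "aws_route_table_association" "private" {
  # toset() discards order; for_each keys each instance by subnet ID,
  # so adding or removing one subnet does not disturb the others.
  for_each       = toset(local.private_ids_sorted)
  subnet_id      = each.value
  route_table_id = aws_route_table.private.id
}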
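
Sorting matters most when a list is consumed by position. The sketch below reuses local.private_ids_sorted from the example above; the variable ami_id, the instance type, and the count of 2 are hypothetical values chosen only for illustration.

```hcl
# Hypothetical input, not part of the original example.
variable "ami_id" {
  type = string
}

resource "aws_instance" "worker" {
  count         = 2
  ami           = var.ami_id
  instance_type = "t3.micro"

  # element() wraps around the list, so instances are spread across the
  # sorted subnets. With an unsorted list, index 0 could point at a
  # different subnet between runs.
  subnet_id = element(local.private_ids_sorted, count.index)
}
```

Note that element() raises an error on an empty list, so this pattern assumes the lookup returns at least one subnet.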

Reference Other Stack Outputs with terraform_remote_state

If existing infrastructure is managed by Terraform in another workspace or repository, you can reference its exposed outputs via data "terraform_remote_state". This is also a form of the "read existing" approach rather than creating new resources.

remote_state assumes that the upstream stack has been applied and exposes stable outputs. The key design tip is to avoid circular dependencies and keep references one-directional.

Only the upstream stack's root-module outputs are accessible
Use a stable backend such as S3 or HCP Terraform (formerly Terraform Cloud)
Naming and responsibility separation that avoids circular references are critical

Example: Reference a VPC stack's outputs and add a tag to a subnet

data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "org-tfstate"
    key    = "network/prod-vpc.tfstate"
    region = "ap-northeast-1"
  }
}

resource "aws_ec2_tag" "extra" {
  resource_id = data.terraform_remote_state.vpc.outputs.app_subnet_id
  key         = "ManagedBy"
  value       = "app-stack"
}
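
For the reference above to resolve, the upstream VPC stack has to expose the value as a root-module output. A minimal sketch of that upstream side follows; aws_subnet.app is a hypothetical resource name standing in for whatever the VPC stack actually defines.

```hcl
# In the upstream VPC stack (root module).
# aws_subnet.app is a hypothetical name for the subnet managed there.
output "app_subnet_id" {
  description = "Subnet ID consumed by the app stack."
  value       = aws_subnet.app.id
}
```

Treating this output name as a contract between the two stacks keeps the dependency one-directional: the app stack reads it, and the VPC stack never reads anything back.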

Robust Selector Design: Practical Insights on Tags, Filters, and Ordering

Incorrect references via Data Sources can lead to serious incidents. Design filters to be unique and standardize tag naming conventions across the team. When multiple candidates might be returned, build in expected-count validation and stable ordering.

Some Data Sources return lists in an order that depends on the API and is not guaranteed to be stable. When a list is consumed by position (count or an index), using the sort function to fix the order explicitly avoids spurious plan diffs between runs.

Make tags unique by including environment, role, and owner
If the expected count is 1, guarantee it with documentation and tests
Handle lists defensively with sort and contains

Example: Narrow down by tag, fix the order, and open egress to each subnet

data "aws_subnets" "app" {
  filter {
    name   = "tag:App"
    values = ["billing"]
  }
  filter {
    name   = "tag:Env"
    values = ["prod"]
  }
}

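locals {
  app_subnet_ids = sort(data.aws_subnets.app.ids)
}

# Look up each subnet so its real CIDR block is used
# instead of a CIDR computed from its list position.
data "aws_subnet" "app" {
  for_each = toset(local.app_subnet_ids)
  id       = each.value
}

# Assumes aws_security_group.web is defined as in the first example.
resource "aws_security_group_rule" "egress_to_app" {
  for_each          = data.aws_subnet.app
  security_group_id = aws_security_group.web.id
  type              = "egress"
  from_port         = 5432
  to_port           = 5432
  protocol          = "tcp"
  cidr_blocks       = [each.value.cidr_block]
}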
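
Expected-count validation can be attached to the lookup itself. The following sketch is a variant of the data "aws_subnets" "app" block above that would replace it; it assumes Terraform 1.2 or later, and the expected count of 2 is a made-up figure for illustration.

```hcl
data "aws_subnets" "app" {
  filter {
    name   = "tag:App"
    values = ["billing"]
  }
  filter {
    name   = "tag:Env"
    values = ["prod"]
  }

  lifecycle {
    # Fail early if the filters match more or fewer subnets than expected.
    # The number 2 is an assumption; use the count your design requires.
    postcondition {
      condition     = length(self.ids) == 2
      error_message = "Expected exactly 2 subnets tagged App=billing and Env=prod."
    }
  }
}
```

This turns a silent mismatch, such as a mistyped tag that matches nothing, into an explicit error with a readable message.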

Associate Exam Perspective and Mini Exercise

Associate exams ask about design choices such as the role difference between data and resource, and when to use import or remote_state. Be able to explain these three points concisely: data does not create, it resolves at plan time, and references determine dependencies.

As a hands-on exercise, build a configuration that simply references the latest official AMI and outputs its ID. This builds intuition for plan-time resolution and filter authoring.

data references existing resources; resource creates/modifies
An unresolved data lookup causes the plan to fail
remote_state is used to reference outputs from other stacks

Practice: Reference and output the latest Amazon Linux 2 AMI

data "aws_ami" "al2" {
  owners      = ["137112412989"]
  most_recent = true
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

output "al2_ami_id" {
  value = data.aws_ami.al2.id
}
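
As an optional follow-up to the exercise, the looked-up ID could feed an instance. This is a sketch only: the resource name demo and the instance type are hypothetical, and the ignore_changes setting is one possible design choice rather than a requirement.

```hcl
resource "aws_instance" "demo" {
  ami           = data.aws_ami.al2.id
  instance_type = "t3.micro"

  lifecycle {
    # With most_recent = true, a newly published AMI changes the lookup
    # result, and a changed ami value forces instance replacement.
    # Ignoring it keeps existing instances on the AMI they were created with.
    ignore_changes = [ami]
  }
}
```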

Check with a Question

Associate

Question 1

You want to add a new security group to an existing VPC. The existing VPC is managed by another team, and its ID differs across environments. Which approach is the safest and most maintainable?

A. Use a Data Source that uniquely identifies the VPC by tag to fetch its ID, then create the security group using that ID
B. Use terraform import to bring the VPC into your own state and have your team manage the VPC from then on
C. Pass the VPC's ID via an environment variable and hardcode it inside the module
D. Manually check the VPC's ID before apply and rewrite the HCL each time

Correct answer: A

To reference an existing VPC without modifying it and to handle per-environment differences, looking it up uniquely with a Data Source is appropriate. import takes over management responsibility, which contradicts the premise that another team manages the VPC. Hardcoding or manual updates carry high error and maintenance costs.

Frequently Asked Questions

Can a Data Source create resources?

No. Data Sources are read-only and simply fetch attributes of existing resources at plan time. Creation and modification are handled by resource blocks.

What happens if a Data Source reference cannot be found?

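For a Data Source that must match exactly one object, such as aws_vpc, the lookup fails at the plan stage and never reaches apply. A Data Source that returns a list, such as aws_subnets, may instead return an empty list without an error, so make filters unique and enforce tag naming conventions plus expected-count validation.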

How should I choose between remote_state and a regular Data Source?

If the existing resource is already managed by Terraform and its outputs are exposed, remote_state is the more robust choice. Otherwise, use a provider-specific Data Source to query the API directly. Keep remote_state references one-directional to avoid circular dependencies.


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
