Terraform Data Sources are the mechanism for reading existing resources, resolving their values at plan time, and safely passing them into other resource definitions. They are easy to confuse with creating new resources (resource blocks), but their roles are clearly different.
This article focuses on referencing existing resources and explains when to use Data Sources, the pitfalls to watch for, and design considerations with concrete examples. It assumes stable features from the official HashiCorp documentation and touches on Associate-level exam trends.
A Data Source is a read-only block for looking up and fetching existing resources in the cloud or elsewhere. At plan time it queries the provider's API, and the IDs and attributes it returns can be passed to other resources or modules. Data Sources themselves never create or modify anything.
A typical example is looking up an existing VPC and using its ID to create a new security group. This lets you add resources under management without touching the existing network.
Flow for referencing an existing resource (Data Source)
Example: Reference an existing VPC and create a new security group (AWS)
provider "aws" {
  region = "ap-northeast-1"
}

# Look up the existing VPC by its Name tag (read-only).
data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["prod-main-vpc"]
  }
}

# Create a new security group inside the existing VPC.
resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Web tier SG"
  vpc_id      = data.aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

output "web_sg_id" {
  value = aws_security_group.web.id
}

There are three main approaches to handling existing resources in Terraform: dynamic lookup (data), importing into state (import), and passing a fixed value via a variable (vars). The best choice depends on the situation.
Choose based on operational stability, risk of incorrect references, and long-term maintainability. Sometimes passing a strict ID through a variable is safer than using data with vague filters.
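When an ID arrives through a variable, a validation block can reject obviously malformed values before any API call is made. The following is a minimal sketch, assuming Terraform 0.13 or later. The regular expression reflects the usual shape of AWS VPC IDs (vpc- followed by hexadecimal characters) and is an assumption, not an official format check.

```hcl
variable "vpc_id" {
  type        = string
  description = "ID of the shared VPC, supplied by the platform team"

  validation {
    # Assumed ID shape: "vpc-" followed by 8 to 17 lowercase hex characters.
    condition     = can(regex("^vpc-[0-9a-f]{8,17}$", var.vpc_id))
    error_message = "The vpc_id value must look like an AWS VPC ID, for example vpc-0123456789abcdef0."
  }
}
```

This catches typos only. It does not confirm that the VPC exists or that it belongs to the intended environment.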
| Approach | Primary Purpose | Typical Use Case | Caveats |
|---|---|---|---|
| Data Source | Look up and reference at plan time | Find VPC/subnet by tag, fetch latest AMI | Vague filters cause incorrect references or plan errors. Watch the result ordering too |
| terraform import | Bring existing resources into state management | Bring a manually created RDS under management, migrate an old SG | With the terraform import CLI command you must author the HCL definition yourself (Terraform 1.5+ import blocks can generate a draft). Large diffs make the first plan heavy |
| Pass ID via variable | Explicitly specify a fixed ID | Receive a shared VPC ID from the platform team | Human error detection is delayed. Keeping up with ID changes tends to be manual |
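For the import row in the table, Terraform 1.5 and later also accept a declarative import block in configuration. A minimal sketch follows. The resource name legacy, both IDs, and the attribute values are hypothetical placeholders, and the resource arguments have to match the real object for the first plan to be clean.

```hcl
# Hypothetical example: adopt a manually created security group into state.
import {
  to = aws_security_group.legacy
  id = "sg-0123456789abcdef0" # placeholder ID
}

resource "aws_security_group" "legacy" {
  name        = "legacy-sg"             # must match the existing group
  description = "Legacy SG"             # must match the existing group
  vpc_id      = "vpc-0123456789abcdef0" # placeholder ID
}
```

Running terraform plan then shows the pending import alongside any attribute differences. If the resource block is omitted, terraform plan -generate-config-out=generated.tf can draft it. That generation feature was introduced as experimental in 1.5, so the output should be reviewed before it is committed.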
Example: Receive a fixed ID via variable and apply it to a new resource
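variable "vpc_id" {
  type = string
}

resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = var.vpc_id
}

Data Sources are normally read during the plan phase. If a lookup fails (for example, no VPC matches the filter), the plan itself fails before apply, which prevents creating resources based on incorrect assumptions. The exception is a data block whose arguments depend on values that are not known until apply: Terraform defers reading it to the apply phase.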
Dependencies are implicitly derived from reference relationships. If a resource references the result of a data block, that resource is evaluated after the data block. Circular references cannot be built. Since data reads existing resources, avoid designs that "re-reference" a resource created in the same apply via data.
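The advice about not re-reading a resource created in the same apply can be illustrated as follows. This is a sketch with hypothetical names and CIDR ranges (aws_vpc.new, 10.1.0.0/16), not a recommended network layout.

```hcl
resource "aws_vpc" "new" {
  cidr_block = "10.1.0.0/16"

  tags = {
    Name = "new-vpc"
  }
}

# Avoid: a data "aws_vpc" block that filters on tag:Name = "new-vpc".
# Nothing in such a lookup references aws_vpc.new, so Terraform sees no
# dependency and queries the API during plan, before the VPC exists.

# Prefer: reference the resource directly. The reference itself creates
# the implicit dependency, so the subnet is created after the VPC.
resource "aws_subnet" "a" {
  vpc_id     = aws_vpc.new.id
  cidr_block = "10.1.1.0/24"
}
```

The direct reference also avoids an extra API read for a value Terraform already tracks in state.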
Example: Stabilizing data results when passing them to for_each
data "aws_subnets" "private" {
  filter {
    name   = "tag:Tier"
    values = ["private"]
  }
}

locals {
  # sort() gives a stable list for index-based uses such as count or element().
  private_ids_sorted = sort(data.aws_subnets.private.ids)
}

resource "aws_route_table_association" "private" {
  # for_each keys each instance by subnet ID, so instance addresses
  # do not depend on list order.
  for_each  = toset(local.private_ids_sorted)
  subnet_id = each.value
  # aws_route_table.private is assumed to be defined elsewhere in this configuration.
  route_table_id = aws_route_table.private.id
}

If existing infrastructure is managed by Terraform in another workspace or repository, you can reference its exposed outputs via data "terraform_remote_state". This is also a form of the "read existing" approach rather than creating new resources.
remote_state assumes that the upstream stack has been applied and exposes stable outputs. The key design tip is to avoid circular dependencies and keep references one-directional.
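For a downstream stack to read a value, the upstream stack has to declare it as a root-module output. Below is a minimal sketch of the upstream side. The resource addresses and CIDR ranges are hypothetical, and the output name app_subnet_id is chosen to match the downstream example in this article.

```hcl
# Upstream VPC stack (hypothetical contents).
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # hypothetical range
}

resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24" # hypothetical range
}

# Only root-module outputs are visible through terraform_remote_state.
output "app_subnet_id" {
  description = "ID of the application subnet, consumed by downstream stacks"
  value       = aws_subnet.app.id
}
```

Treating such outputs as a small, stable interface keeps the dependency one-directional: the downstream stack reads them, and the upstream stack never reads anything back.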
Example: Reference a VPC stack's outputs and add a tag to a subnet
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "org-tfstate"
    key    = "network/prod-vpc.tfstate"
    region = "ap-northeast-1"
  }
}

resource "aws_ec2_tag" "extra" {
  resource_id = data.terraform_remote_state.vpc.outputs.app_subnet_id
  key         = "ManagedBy"
  value       = "app-stack"
}

Incorrect references via Data Sources can lead to serious incidents. Design filters to be unique and standardize tag naming conventions across the team. When multiple candidates might be returned, build in expected-count validation and stable ordering.
Some Data Sources return arrays in an order that depends on the API and is unstable. Using the sort function to explicitly fix the order reduces plan diffs on every run.
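Expected-count validation can be expressed with a postcondition on the data block, available in Terraform 1.2 and later. A minimal sketch follows. The block name app_checked, the expected count of 3, and the tag values are assumptions for illustration.

```hcl
data "aws_subnets" "app_checked" {
  filter {
    name   = "tag:App"
    values = ["billing"]
  }

  lifecycle {
    # Fail the run if the lookup returns an unexpected number of subnets.
    postcondition {
      condition     = length(self.ids) == 3
      error_message = "Expected exactly 3 subnets tagged App=billing."
    }
  }
}
```

If the condition is false, Terraform reports the error_message and stops instead of passing an unexpected set of IDs to downstream resources.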
Example: Narrow down uniquely by tag, fix the order, and associate
data "aws_subnets" "app" {
  filter {
    name   = "tag:App"
    values = ["billing"]
  }

  filter {
    name   = "tag:Env"
    values = ["prod"]
  }
}

locals {
  app_subnet_ids = sort(data.aws_subnets.app.ids)
}

# Read each matched subnet so its real CIDR block is available.
data "aws_subnet" "app" {
  for_each = toset(local.app_subnet_ids)
  id       = each.value
}

# aws_security_group.web is the security group defined in the first example.
resource "aws_security_group_rule" "egress_to_app" {
  for_each          = data.aws_subnet.app
  security_group_id = aws_security_group.web.id
  type              = "egress"
  from_port         = 5432
  to_port           = 5432
  protocol          = "tcp"
  # Use the subnet's actual CIDR instead of deriving one from its list position.
  cidr_blocks = [each.value.cidr_block]
}

Associate exams ask about design choices such as the role difference between data and resource, and when to use import or remote_state. Be able to explain these three points concisely: data does not create, it resolves at plan time, and references determine dependencies.
As a hands-on exercise, build a configuration that simply references the latest official AMI and outputs its ID. This builds intuition for plan-time resolution and filter authoring.
Practice: Reference and output the latest Amazon Linux 2 AMI
data "aws_ami" "al2" {
  owners      = ["137112412989"]
  most_recent = true

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

output "al2_ami_id" {
  value = data.aws_ami.al2.id
}

Associate
Question 1
You want to add a new security group to an existing VPC. The existing VPC is managed by another team, and its ID differs across environments. Which approach is the safest and most maintainable?
Correct answer: A
To reference an existing VPC without breaking it and to follow per-environment differences, looking it up uniquely with a Data Source is appropriate. import takes over management responsibility, which contradicts the premise. Hardcoding or manual updates carry high error and maintenance costs.
Can a Data Source create resources?
No. Data Sources are read-only and simply fetch attributes of existing resources at plan time. Creation and modification are handled by resource blocks.
What happens if a Data Source reference cannot be found?
It fails at the plan stage and never reaches apply. Make filters unique and enforce tag naming conventions plus expected-count validation.
How should I choose between remote_state and a regular Data Source?
If the existing resource is already managed by Terraform and its outputs are exposed, remote_state is the more robust choice. Otherwise, use a provider-specific Data Source to query the API directly. Keep remote_state references one-directional to avoid circular dependencies.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.