How to use Terraform Data sources?



Terraform data sources are useful when you want to retrieve information from a cloud service provider such as AWS, Azure, or GCP. Most of the time, when we use Terraform with AWS/Azure/GCP, we only send data to the provider in the form of instructions or configuration.

But what if you want to get information (arn, tags, owner_id, etc.) back from the cloud service provider?

Answer - We need to use data sources to get the resource information back.

So a Terraform data source works like a read-only query: it fetches information about resources running in your cloud infrastructure and makes it available to the Terraform configuration for further use.
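
As a rough illustration of the general shape, a data source is declared with a data block and read back as data.<TYPE>.<NAME>.<ATTRIBUTE>. The sketch below uses the AWS provider's aws_caller_identity data source, which is not part of the walkthrough in this post and is chosen only because it needs no arguments. The names current and account_id_example are assumptions made for the example.

```hcl
# Assumes an AWS provider is already configured elsewhere in the configuration.
# Reads details about the AWS account whose credentials Terraform is using.
data "aws_caller_identity" "current" {}

# Data source attributes are referenced as data.<TYPE>.<NAME>.<ATTRIBUTE>.
output "account_id_example" {
  value = data.aws_caller_identity.current.account_id
}
```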

In this blog, we will look at an example in which we create an aws_instance resource and then define a data source to fetch some of the information associated with that aws_instance.

[Figure: Terraform data source flow]

Table of Contents

  1. Create an aws_instance
  2. Define a data source
  3. Create Output variable for data source
  4. Apply the final terraform configuration along with data source and output values
  5. Fetching only specific attribute using data source



1. Create an aws_instance

The goal of this exercise is to create an aws_instance and then define a data source that fetches all the attributes exposed by the aws_instance data source.

Let's first write the Terraform configuration for starting a t2.micro aws_instance.

(*Note - Replace the access_key and secret_key values with the credentials of your own AWS account. Access keys can be generated from the IAM section of the AWS console.)

provider "aws" {
  region     = "eu-central-1"
  access_key = "<YOUR_AWS_ACCESS_KEY>"
  secret_key = "<YOUR_AWS_SECRET_KEY>"
}

resource "aws_instance" "ec2_example" {
  ami           = "ami-0767046d1677be5a0"
  instance_type = "t2.micro"

  tags = {
    Name = "Terraform EC2"
  }
}
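
Hardcoding keys in the configuration makes it easy to commit them to version control by accident. One possible alternative, sketched below, relies on the AWS provider reading credentials from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, so the provider block only needs the region. This is a sketch under the assumption that those variables are exported in the shell that runs Terraform.

```hcl
# Assumption: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in the
# environment before running terraform, so no keys appear in this file.
provider "aws" {
  region = "eu-central-1"
}
```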


2. Define a data source

Now that we have defined our aws_instance in Step 1, let's add the data source to the existing Terraform configuration.

Here is the data source configuration for fetching all the information about the aws_instance -

data "aws_instance" "myawsinstance" {
  # Look up the instance by its Name tag
  filter {
    name   = "tag:Name"
    values = ["Terraform EC2"]
  }

  # Read the data source only after the instance has been created
  depends_on = [
    aws_instance.ec2_example
  ]
}

Key points to pay attention to -

  1. filter: Although we have created only one instance, we still use a filter because in a production-like environment you might have multiple aws_instance resources running, so you need a way to select the right one. Since we tagged our aws_instance with the Name Terraform EC2, we use the same value inside the filter.
  2. depends_on: The data source block does not reference the aws_instance resource directly, so Terraform cannot tell on its own that the instance must exist before the lookup runs. Adding depends_on makes Terraform read the data source only after the instance has been created.
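
If the instance is managed in the same configuration, another option is to look it up by ID rather than by tag. The sketch below uses the instance_id argument of the aws_instance data source and a hypothetical data source name, by_id. Because it references aws_instance.ec2_example.id directly, Terraform can infer the ordering by itself and no depends_on is needed.

```hcl
# Hypothetical alternative to the tag filter: look the instance up by its ID.
# The reference to aws_instance.ec2_example.id creates an implicit dependency.
data "aws_instance" "by_id" {
  instance_id = aws_instance.ec2_example.id
}
```

When you do filter by tag, keep in mind that the aws_instance data source expects exactly one match, so the filter has to be specific enough to select a single instance.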


3. Create Output variable for data source

So far, in Step 1 and Step 2, we have defined the aws_instance and the data source. Now let's create an output value so that we can see all the information retrieved by the data source.

Here is the Terraform configuration for the output value -

output "fetched_info_from_aws" {
  value = data.aws_instance.myawsinstance
}

Key points to pay attention to -

  1. We have linked the output value to the data source which we created in Step 2.
  2. To link the output value, we reference the data source by its address, i.e. data.aws_instance.myawsinstance
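
Output blocks also accept an optional description argument, which can help once a configuration grows to several outputs. A minimal sketch of the same output with a description added (the description text is made up for illustration):

```hcl
# Variant of the output above with an optional description added.
# It replaces the earlier block; output names must be unique within a module.
output "fetched_info_from_aws" {
  description = "All attributes returned by the aws_instance data source"
  value       = data.aws_instance.myawsinstance
}
```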


4. Apply the final terraform configuration along with data source and output values

Alright, assuming you have gone through all three steps (Step 1, Step 2, and Step 3), here is our final Terraform configuration including the aws_instance, the data source, and the output value -

provider "aws" {
  region     = "eu-central-1"
  access_key = "<YOUR_AWS_ACCESS_KEY>"
  secret_key = "<YOUR_AWS_SECRET_KEY>"
}

resource "aws_instance" "ec2_example" {
  ami           = "ami-0767046d1677be5a0"
  instance_type = "t2.micro"

  tags = {
    Name = "Terraform EC2"
  }
}

data "aws_instance" "myawsinstance" {
  filter {
    name   = "tag:Name"
    values = ["Terraform EC2"]
  }

  depends_on = [
    aws_instance.ec2_example
  ]
}

# Output every attribute returned by the data source
output "fetched_info_from_aws" {
  value = data.aws_instance.myawsinstance
}


You can now run the following Terraform commands to create your aws_instance -

terraform init
terraform plan
terraform apply

Here is the output after applying the Terraform configuration -

Outputs:

fetched_info_from_aws = {
  "ami" = "ami-0767046d1677be5a0"
  "arn" = "arn:aws:ec2:eu-central-1:242396018804:instance/i-0eda1c6a59790eb7d"
  "associate_public_ip_address" = true
  "availability_zone" = "eu-central-1c"
  "credit_specification" = tolist([
    {
      "cpu_credits" = "standard"
    },
  ])
  "disable_api_termination" = false
  "ebs_block_device" = toset([])
  "ebs_optimized" = false
  "enclave_options" = tolist([
    {
      "enabled" = false
    },
  ])
  "ephemeral_block_device" = tolist([])
  "filter" = toset([
    {
      "name" = "tag:Name"
      "values" = tolist([
        "Terraform EC2",
      ])
    },
  ])
  "get_password_data" = false
  "get_user_data" = false
  "host_id" = tostring(null)
  "iam_instance_profile" = ""
  "id" = "i-0eda1c6a59790eb7d"
  "instance_id" = tostring(null)
  "instance_state" = "running"
  "instance_tags" = tomap(null) /* of string */
  "instance_type" = "t2.micro"
  "key_name" = ""
  "metadata_options" = tolist([
    {
      "http_endpoint" = "enabled"
      "http_put_response_hop_limit" = 1
      "http_tokens" = "optional"
    },
  ])
  "monitoring" = false
  "network_interface_id" = "eni-0ffc9d62eafcafcbc"
  "outpost_arn" = ""
  "password_data" = tostring(null)
  "placement_group" = ""
  "private_dns" = "ip-172-31-9-122.eu-central-1.compute.internal"
  "private_ip" = "172.31.9.122"
  "public_dns" = "ec2-3-122-249-219.eu-central-1.compute.amazonaws.com"
  "public_ip" = "3.122.249.219"
  "root_block_device" = toset([
    {
      "delete_on_termination" = true
      "device_name" = "/dev/sda1"
      "encrypted" = false
      "iops" = 100
      "kms_key_id" = ""
      "tags" = tomap({})
      "throughput" = 0
      "volume_id" = "vol-0fce01580b0175da8"
      "volume_size" = 8
      "volume_type" = "gp2"
    },
  ])
  "secondary_private_ips" = toset([])
  "security_groups" = toset([
    "default",
  ])
  "source_dest_check" = true
  "subnet_id" = "subnet-2183316d"
  "tags" = tomap({
    "Name" = "Terraform EC2"
  })
  "tenancy" = "default"
  "user_data" = tostring(null)
  "user_data_base64" = tostring(null)
  "vpc_security_group_ids" = toset([
    "sg-272bd157",
  ])
}

Here are the screenshots from the AWS console -

  1. aws_instance is up and running -

[Screenshot: Terraform aws_instance up and running with data source]

  2. aws_instance details (you can verify them against the output from Step 4) -

[Screenshot: Terraform data source with aws_instance]


5. Fetching only specific attribute using data source

Now let's go one step further: instead of fetching all the attributes of the aws_instance, let's fetch only the public_ip.

You only need to update the output value configuration -

output "fetched_info_from_aws" {
  value = data.aws_instance.myawsinstance.public_ip
}

Here is the public_ip information fetched by the data source -

[Screenshot: Terraform data source to get the public_ip of the aws_instance]
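
The same pattern extends to any other attribute shown in the Step 4 output. The sketch below adds a few more outputs. The output names are assumptions made for the example, and the one() function used for the root volume assumes Terraform 0.15 or later and an instance with a single root block device.

```hcl
# Private IP address of the instance.
output "instance_private_ip" {
  value = data.aws_instance.myawsinstance.private_ip
}

# Value of the Name tag, read from the tags map.
output "instance_name_tag" {
  value = data.aws_instance.myawsinstance.tags["Name"]
}

# root_block_device is a set, so one() is used to unwrap its single element
# before reading volume_id.
output "root_volume_id" {
  value = one(data.aws_instance.myawsinstance.root_block_device).volume_id
}
```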

