Infrastructure as Code (IaC): The Complete Guide to Terraform, Ansible & GitOps

Infrastructure as Code (IaC) is the practice of managing and provisioning computing infrastructure through machine-readable configuration files rather than manual processes or interactive configuration tools. It is one of the most important engineering capabilities an organisation can develop — directly linked to faster deployments, fewer configuration errors, and the ability to rebuild entire environments in minutes rather than days.

This guide covers the core IaC concepts, the leading tools, how to structure IaC projects for scale, and the GitOps workflow that keeps production state under control.

Why Infrastructure as Code Matters

Before IaC, infrastructure changes were made by logging into servers, clicking through consoles, and updating wiki pages that quickly became stale. The result was “snowflake” servers: unique, hand-crafted machines that no one fully understood and that were impossible to reproduce. IaC solves this by making infrastructure changes:

  • Repeatable: Run the same code and get the same infrastructure every time
  • Auditable: Every change is a Git commit with an author, timestamp, and description
  • Reviewable: Infrastructure changes go through pull requests, exactly like application code
  • Testable: Validate infrastructure configuration before applying it to production
  • Recoverable: Rebuild from scratch in minutes if disaster strikes

Declarative vs Imperative IaC

Declarative IaC (Terraform, Pulumi, CloudFormation) lets you describe the desired end state. The tool figures out how to get there. You say “I want 3 web servers behind a load balancer” and Terraform creates, modifies, or destroys resources to match that declaration.

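Imperative IaC (shell scripts and, to a large extent, Ansible playbooks) describes the steps to take. You say “install nginx, then configure it, then start it.” Ansible sits between the two styles: a playbook runs its tasks in order, but most modules declare the desired state of the thing each task manages. Imperative tools are better suited to application configuration and procedural tasks; declarative tools are better suited to infrastructure provisioning.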

In practice, most organisations use both: Terraform for provisioning cloud and on-premises infrastructure, Ansible for OS-level configuration and application deployment on top of that infrastructure.
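The declarative model can be illustrated with a short Python sketch. This is not how Terraform is implemented; it only shows the idea of comparing a desired state with the actual state and deriving the actions needed. The resource names and the Action type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update", or "destroy"
    name: str  # hypothetical resource name


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[Action]:
    """Compare desired and actual resources and return the actions needed."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(Action("create", name))
        elif desired[name] != actual[name]:
            actions.append(Action("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(Action("destroy", name))
    return actions


# "I want 3 small web servers" versus what currently exists.
desired = {
    "web-1": {"size": "small"},
    "web-2": {"size": "small"},
    "web-3": {"size": "small"},
}
actual = {
    "web-1": {"size": "small"},
    "web-2": {"size": "large"},
    "web-old": {"size": "small"},
}

for action in plan(desired, actual):
    print(action.kind, action.name)
```

The caller never states how to get from one state to the other. The same declaration yields a different set of actions depending on what already exists, and an empty list once the two states match.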

Terraform: Declarative Infrastructure Provisioning

Terraform by HashiCorp is the most widely adopted IaC tool across cloud and on-premises environments. Its provider ecosystem covers over 3,000 integrations — from AWS and Azure to VMware vSphere, Cisco ACI, and bare-metal via IPMI.

Core Terraform Concepts

  • Provider: Plugin that communicates with an infrastructure API (AWS, Azure, VMware)
  • Resource: A single piece of infrastructure (an EC2 instance, a DNS record, a vSphere VM)
  • Module: A reusable group of resources with input variables and outputs
  • State: A JSON file that records the current known state of all managed resources
  • Backend: Where state is stored — locally for development, remotely (S3 + DynamoDB, Terraform Cloud) for teams

Terraform Project Structure for Hybrid Environments

infra/
├── modules/
│   ├── networking/       # VPCs, VLANs, peering
│   ├── compute/          # VMs, instance groups, ASGs
│   └── security/         # Security groups, IAM, NSGs
├── environments/
│   ├── on-prem/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
└── global/               # DNS, certificates, shared state

State Management Best Practices

  • Always use a remote backend with state locking (S3 + DynamoDB for AWS, Azure Blob + lease for Azure)
  • Use separate state files per environment — never share state between staging and production
  • Enable state encryption at rest
  • Use terraform import to bring existing resources under management before making changes
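To see why locking matters, here is a minimal Python sketch of the idea behind a lock-backed state store. It is an in-memory stand-in, not the implementation of any real backend, and the class name, state keys, and owner names are hypothetical. The point is that a lock is only granted when nobody else holds it for that state file.

```python
class LockTable:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self) -> None:
        self._locks: dict[str, str] = {}

    def acquire(self, state_key: str, owner: str) -> bool:
        # Succeeds only if nobody holds the lock for this state file.
        if state_key in self._locks:
            return False
        self._locks[state_key] = owner
        return True

    def release(self, state_key: str, owner: str) -> bool:
        # Only the current holder may release the lock.
        if self._locks.get(state_key) != owner:
            return False
        del self._locks[state_key]
        return True


locks = LockTable()
print(locks.acquire("production/terraform.tfstate", "alice"))  # True
print(locks.acquire("production/terraform.tfstate", "bob"))    # False: alice holds the lock
print(locks.acquire("staging/terraform.tfstate", "bob"))       # True: separate state, separate lock
```

Because each environment has its own state key, work on staging never blocks, or is blocked by, work on production. A real backend must also make the check-and-write step atomic across machines, which this single-process sketch does not attempt.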

Ansible: Configuration Management and Automation

Ansible is an agentless automation tool that uses SSH (Linux) or WinRM (Windows) to push configuration to servers. Its YAML-based playbook syntax is readable by operators and developers alike, making it ideal for cross-functional teams.

When to Use Ansible vs Terraform

Task                       | Terraform | Ansible
Provision a VM             | ✓         | Possible but suboptimal
Install packages           |           | ✓
Configure network devices  | Limited   | ✓ (network modules)
Deploy application         |           | ✓
Manage cloud API resources | ✓         | Limited
Procedural remediation     |           | ✓

Ansible Best Practices

  • Use roles to package reusable configuration (Ansible Galaxy for community roles)
  • Store secrets in Ansible Vault, never in plaintext in your repository
  • Use dynamic inventory to query your cloud or on-prem CMDB rather than maintaining static host files
  • Write idempotent tasks — running a playbook twice should produce the same result as running it once
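Idempotency is easier to reason about with a concrete case. The Python sketch below is an illustration of the principle, not Ansible code: a hypothetical ensure_line helper checks the current state first and only writes when something is missing, then reports whether it changed anything.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "app.conf"  # hypothetical config file
    first = ensure_line(config, "max_connections=100")
    second = ensure_line(config, "max_connections=100")
    print(first, second)  # True False
```

A non-idempotent version would append the line on every run. Reporting a changed flag, as Ansible does for each task, makes it visible when a second run still modifies something it should not.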

GitOps: Closing the Loop Between Code and Infrastructure

GitOps applies Git workflows to infrastructure operations. The core principles are:

  1. The entire system is described declaratively
  2. The desired state is versioned in Git
  3. Approved changes are automatically applied
  4. Software agents ensure correctness and alert on divergence

For Kubernetes workloads, Argo CD or Flux implements GitOps by continuously reconciling cluster state with Git. For broader infrastructure, tools like Atlantis provide GitOps-style workflows for Terraform: they run plan automatically on pull requests and, in Atlantis's default workflow, apply approved changes from the pull request before it is merged.

Testing Your Infrastructure Code

IaC should be tested like application code:

  • Static analysis: tfsec, Checkov, and Terrascan scan Terraform for misconfigurations (publicly exposed storage buckets, overly permissive IAM) before apply
  • Integration tests: Terratest (Go) spins up real infrastructure, runs assertions against it, then destroys it
  • Policy as Code: Open Policy Agent (OPA) and Sentinel enforce organisational policies (e.g., all resources must have cost centre tags) at plan time
  • Drift detection: Run terraform plan on a schedule and alert when the plan is non-empty
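As a rough illustration of the policy-as-code idea, the Python sketch below checks a plan for a required tag. It assumes the JSON produced by terraform show -json for a saved plan, which lists planned changes under resource_changes. The tag name cost_centre is hypothetical, and treating only resources that expose a tags attribute as taggable is a simplifying assumption. Real policies would normally be written in OPA's Rego or in Sentinel rather than in Python.

```python
import json

REQUIRED_TAG = "cost_centre"  # hypothetical organisational tag name


def untagged_resources(plan_json: str, required_tag: str = REQUIRED_TAG) -> list[str]:
    """Return addresses of planned resources that lack the required tag."""
    plan = json.loads(plan_json)
    violations = []
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after")
        if after is None:
            continue  # resource is being destroyed
        if "tags" not in after:
            continue  # assumption: this resource type has no tags attribute
        tags = after["tags"] or {}
        if required_tag not in tags:
            violations.append(rc["address"])
    return violations


sample_plan = json.dumps({
    "resource_changes": [
        {"address": "aws_instance.web",
         "change": {"actions": ["create"], "after": {"tags": {"cost_centre": "1234"}}}},
        {"address": "aws_s3_bucket.logs",
         "change": {"actions": ["create"], "after": {"tags": None}}},
        {"address": "aws_instance.old",
         "change": {"actions": ["delete"], "after": None}},
    ]
})

print(untagged_resources(sample_plan))  # ['aws_s3_bucket.logs']
```

Run in CI between plan and apply, a check like this can fail the pipeline before a non-compliant resource is ever created.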

Frequently Asked Questions

What is the best Infrastructure as Code tool for beginners?

Terraform is the recommended starting point for most organisations. Its declarative syntax is intuitive, the documentation is excellent, and the provider ecosystem covers virtually every cloud and on-premises platform you are likely to encounter. Supplement with Ansible for OS-level configuration management.

Can Terraform manage on-premises infrastructure?

Yes. Terraform has providers for VMware vSphere, Nutanix, Proxmox, bare-metal via IPMI, and physical network devices from Cisco, Juniper, and Arista. This makes it possible to manage your entire hybrid infrastructure — on-premises and cloud — with a single tool and workflow.

How do you prevent Terraform state file conflicts in a team?

Use a remote backend with state locking. For AWS, store state in S3 and use DynamoDB for locking. For Azure, use Blob Storage with lease-based locking. Terraform Cloud and Terraform Enterprise provide managed state storage with built-in locking and access controls. Never use local state in a team environment.

What is the difference between Terraform and Ansible?

Terraform is declarative and excels at provisioning infrastructure (VMs, networks, cloud services). Ansible is procedural, running the tasks in a playbook in order, and excels at configuration management (installing software, deploying applications, running procedures). Most mature DevOps teams use both: Terraform to provision, Ansible to configure.

How do you manage secrets in Terraform?

Never hard-code secrets in Terraform files. Use a secrets manager: HashiCorp Vault (cross-platform), AWS Secrets Manager, or Azure Key Vault. The corresponding Terraform provider fetches secrets at plan/apply time. Mark sensitive variables and outputs with sensitive = true so that their values are redacted from plan and apply output. This does not keep them out of the state file, where they are still stored in plain text, so encrypt the state backend and restrict access to it.
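One way to catch hard-coded secrets before they reach the repository is a simple pre-commit check. The Python sketch below is a deliberately naive heuristic, not a replacement for a dedicated scanner: it flags lines where an attribute whose name looks secret-like is assigned a quoted literal. The keyword list and the function name are assumptions made for illustration.

```python
import re

# Attribute names containing these words are treated as secret-like (assumption).
SUSPECT = re.compile(
    r'\s*(\w*(?:password|secret|token|api_key)\w*)\s*=\s*"[^"$]+"',
    re.IGNORECASE,
)


def find_hardcoded_secrets(tf_source: str) -> list[tuple[int, str]]:
    """Return (line number, attribute name) for each suspicious literal."""
    findings = []
    for number, line in enumerate(tf_source.splitlines(), start=1):
        match = SUSPECT.match(line)
        if match:
            findings.append((number, match.group(1)))
    return findings


sample = """\
resource "aws_db_instance" "main" {
  username = "app"
  password = "hunter2"
}
resource "aws_db_instance" "replica" {
  password = var.db_password
}
"""

print(find_hardcoded_secrets(sample))  # [(3, 'password')]
```

The second resource passes because its password comes from a variable rather than a literal. Strings containing $ are skipped on the assumption that they are interpolations.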

What is drift detection in IaC?

Drift occurs when the actual state of your infrastructure diverges from what is declared in your IaC code — usually caused by manual changes made outside the IaC workflow. Drift detection runs terraform plan on a schedule and alerts when the plan output is non-empty, signalling that someone has made an out-of-band change that needs to be either codified or reverted.
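A scheduled drift check can be sketched in a few lines of Python. The sketch relies on terraform plan -detailed-exitcode, which exits with 0 when the plan is empty, 1 on error, and 2 when the plan contains changes. The function names and status strings are hypothetical, and the alerting step is left to whatever scheduler or CI system runs the check.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a status."""
    if code == 0:
        return "in_sync"  # plan succeeded and is empty
    if code == 2:
        return "drift"    # plan succeeded and contains changes
    return "error"        # plan failed


def check_drift(workdir: str, run=subprocess.run) -> str:
    """Run a plan in `workdir` and report whether changes are pending."""
    result = run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A scheduled job could call check_drift("environments/production") and raise an alert on anything other than "in_sync". A non-empty plan also appears when merged code has not yet been applied, so run the check against the branch that reflects what is actually deployed.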

Conclusion

Infrastructure as Code is not a tool choice — it is a discipline. The organisations that treat their infrastructure with the same engineering rigour as their application code consistently deploy faster, experience fewer incidents, and recover more quickly when things go wrong.

OpsNexus helps teams adopt and mature IaC practices across on-premises and cloud environments. Whether you’re starting from scratch or modernising an existing Terraform estate, get in touch to learn how we can help.
