Design Document: MM AWS Network Infrastructure
Overview
This document describes the technical design for the Mica Mirai (MM) AWS Network Infrastructure project. The scope covers three phases delivered as Terraform code, version-controlled in GitHub, and deployed via GitHub Actions CI/CD:
- Phase 0 (Bootstrap): One-time manual provisioning of the Terraform remote state backend (S3 bucket + DynamoDB lock table) and the IAM OIDC provider + role for GitHub Actions.
- Phase 1 (Network): VPC, subnets, Internet Gateway, NAT egress resources (NAT Instance in dev, NAT Gateways in prod), route tables, VPC Flow Logs, and CloudTrail, deployed per environment (dev and prod).
- Phase 2 (CI/CD): GitHub Actions workflow that automates terraform plan on pull requests and terraform apply on merges to main, with a mandatory approval gate before prod.
The dev environment uses a single NAT Instance instead of managed NAT Gateways to reduce cost. The prod environment uses two NAT Gateways (one per AZ) for high availability. The NAT Instance is the only EC2 compute resource provisioned — it exists solely for NAT egress and is not a workload instance. The architecture is EKS-ready: subnets carry the Kubernetes load-balancer discovery tags so a future EKS cluster can identify them automatically.
Goals
- All infrastructure defined as code; zero manual console changes after bootstrap.
- Two fully isolated environments (dev, prod) sharing no AWS resources.
- Security baseline enforced from day one: private subnets have no IGW route, all state and audit buckets are encrypted and public-access-blocked, no static IAM credentials.
- Consistent naming and tagging across every resource via a single local.common_tags locals block.
- Cost-optimised dev environment using a NAT Instance instead of managed NAT Gateways.
- EKS-ready subnet topology for future node groups and load balancers.
Non-Goals
- Compute workload resources (EC2 application servers, EKS, Lambda, RDS). Note: a NAT Instance EC2 is provisioned in dev for NAT egress only.
- Application-level security groups (beyond the VPC default and the NAT Instance security group).
- DNS zones or Route 53 configuration.
- Commitment-based cost optimization (Reserved Instances, Savings Plans). The only cost measure in scope is the dev NAT Instance.
Architecture
High-Level Topology
dev environment — single NAT Instance for cost saving:
┌─────────────────────────────────────────────────────────────────────┐
│  AWS Account (us-east-1)          dev VPC: 10.1.0.0/16              │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │  ┌─────────────────────┐  ┌─────────────────────────────┐    │   │
│  │  │ AZ: us-east-1a      │  │ AZ: us-east-1b              │    │   │
│  │  │                     │  │                             │    │   │
│  │  │ Public Subnet /24   │  │ Public Subnet /24           │    │   │
│  │  │ ┌───────────────┐   │  │ (no NAT resource)           │    │   │
│  │  │ │ NAT Instance  │   │  │                             │    │   │
│  │  │ │ (EC2 + EIP)   │   │  │                             │    │   │
│  │  │ └───────────────┘   │  │                             │    │   │
│  │  │                     │  │                             │    │   │
│  │  │ Private Subnet /24  │  │ Private Subnet /24          │    │   │
│  │  │ (shared RT → NAT)   │  │ (shared RT → NAT)           │    │   │
│  │  └─────────────────────┘  └─────────────────────────────┘    │   │
│  │                                                              │   │
│  │  Internet Gateway (1)                                        │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
prod environment — two NAT Gateways for high availability:
┌─────────────────────────────────────────────────────────────────────┐
│  AWS Account (us-east-1)          prod VPC: 10.2.0.0/16             │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │  ┌─────────────────────┐  ┌─────────────────────────────┐    │   │
│  │  │ AZ: us-east-1a      │  │ AZ: us-east-1b              │    │   │
│  │  │                     │  │                             │    │   │
│  │  │ Public Subnet /24   │  │ Public Subnet /24           │    │   │
│  │  │ ┌───────────────┐   │  │ ┌───────────────────────┐   │    │   │
│  │  │ │ NAT-GW (AZ-a) │   │  │ │ NAT-GW (AZ-b)         │   │    │   │
│  │  │ │ EIP           │   │  │ │ EIP                   │   │    │   │
│  │  │ └───────────────┘   │  │ └───────────────────────┘   │    │   │
│  │  │                     │  │                             │    │   │
│  │  │ Private Subnet /24  │  │ Private Subnet /24          │    │   │
│  │  │ (RT-a → NAT-GW-a)   │  │ (RT-b → NAT-GW-b)           │    │   │
│  │  └─────────────────────┘  └─────────────────────────────┘    │   │
│  │                                                              │   │
│  │  Internet Gateway (1)                                        │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
Deployment Phases
Phase 0 (manual, once)
  └─ terraform/bootstrap/
       ├─ S3 state bucket
       ├─ DynamoDB lock table
       ├─ IAM OIDC provider
       └─ IAM role MM-github-actions-role
Phase 1 (automated via CI/CD, per env)
  └─ terraform/network/
       ├─ VPC
       ├─ Public subnets (x2)
       ├─ Private subnets (x2)
       ├─ Internet Gateway
       ├─ [dev] 1 EIP + 1 NAT Instance (EC2) + 1 shared private RT
       ├─ [prod] 2 EIPs + 2 NAT Gateways + 2 per-AZ private RTs
       ├─ 1 public route table + 4 route table associations
       ├─ VPC Flow Logs + CloudWatch log group
       ├─ CloudTrail trail
       └─ CloudTrail S3 bucket
Phase 2 (GitHub Actions workflow)
  └─ .github/workflows/terraform.yaml
       ├─ PR: plan + comment
       ├─ merge to main: apply dev (auto)
       └─ prod gate: manual approval → apply prod
CIDR Allocation
| Environment | VPC CIDR | Public Subnet AZ-a | Public Subnet AZ-b | Private Subnet AZ-a | Private Subnet AZ-b |
|---|---|---|---|---|---|
| dev | 10.1.0.0/16 | 10.1.0.0/24 | 10.1.1.0/24 | 10.1.10.0/24 | 10.1.11.0/24 |
| prod | 10.2.0.0/16 | 10.2.0.0/24 | 10.2.1.0/24 | 10.2.10.0/24 | 10.2.11.0/24 |
Public subnets occupy the .0.x and .1.x blocks; private subnets occupy .10.x and .11.x, leaving ample room for future expansion (e.g., intra-cluster subnets at .20.x).
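The allocation pattern above can be cross-checked with Terraform's built-in cidrsubnet function. The following is a minimal sketch for illustration only: the design supplies subnet CIDRs explicitly through tfvars, and the local and output names used here are hypothetical.

```hcl
# Illustrative only: derives the dev /24 blocks from the VPC CIDR.
# All names prefixed with "example_" are hypothetical and not part of the design.
locals {
  example_vpc_cidr = "10.1.0.0/16" # dev VPC CIDR from the allocation table

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24 blocks,
  # and netnum selects the value of the third octet.
  example_public_cidrs  = [for n in [0, 1] : cidrsubnet(local.example_vpc_cidr, 8, n)]
  example_private_cidrs = [for n in [10, 11] : cidrsubnet(local.example_vpc_cidr, 8, n)]
}

output "example_public_cidrs" {
  value = local.example_public_cidrs # ["10.1.0.0/24", "10.1.1.0/24"]
}

output "example_private_cidrs" {
  value = local.example_private_cidrs # ["10.1.10.0/24", "10.1.11.0/24"]
}
```

Under the same scheme, a future block at .20.x would correspond to netnum 20.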
Components and Interfaces
Phase 0: Bootstrap Module (terraform/bootstrap/)
Provisioned once by a platform engineer running terraform apply locally before the CI/CD pipeline exists.
| Resource | Terraform Resource Type | Purpose |
|---|---|---|
| S3 State Bucket | aws_s3_bucket + sub-resources | Stores all Terraform remote state files |
| DynamoDB Lock Table | aws_dynamodb_table | Prevents concurrent Terraform runs |
| IAM OIDC Provider | aws_iam_openid_connect_provider | Federates GitHub Actions to AWS |
| IAM Role (GHA) | aws_iam_role | Assumed by GitHub Actions via OIDC |
| IAM Role Policy | aws_iam_role_policy | Grants Terraform permissions to the GHA role |
S3 State Bucket sub-resources:
- aws_s3_bucket_versioning: enabled
- aws_s3_bucket_server_side_encryption_configuration: SSE-S3 (AES256)
- aws_s3_bucket_public_access_block: all four flags true
- aws_s3_bucket_logging: access logs delivered to a logs/ prefix in the same bucket (or a dedicated access-log bucket)
DynamoDB Lock Table:
- Partition key: LockID (String)
- Billing mode: PAY_PER_REQUEST
- SSE: enabled (AWS-managed key)
IAM OIDC Provider:
- URL: https://token.actions.githubusercontent.com
- Audience: sts.amazonaws.com
- Thumbprint list: current GitHub Actions OIDC thumbprint
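IAM Role MM-github-actions-role:
- Trust policy condition: token.actions.githubusercontent.com:sub must match repo:<org>/<repo>:ref:refs/heads/main. Note that the plan job runs on pull_request events and the prod job runs in the GitHub Environment prod, so their tokens carry the subjects repo:<org>/<repo>:pull_request and repo:<org>/<repo>:environment:prod respectively. The trust policy must also allow these subjects (or those jobs need a separate role) for OIDC authentication to succeed.
- Permissions: scoped to Terraform operations (S3 state read/write, DynamoDB lock, EC2/VPC/IAM/CloudTrail/CloudWatch describe and create/delete within the account)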
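The OIDC provider and role trust policy described above might be expressed roughly as follows. This is a sketch under assumptions rather than the project's actual bootstrap code: the resource labels (github, gha_trust, github_actions) and the github_oidc_thumbprint variable are hypothetical, while github_org and github_repo are the bootstrap variables.

```hcl
# Hypothetical input: the design only says "current GitHub Actions OIDC thumbprint".
variable "github_oidc_thumbprint" {
  type        = string
  description = "Thumbprint for the GitHub Actions OIDC provider certificate"
}

resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"] # the audience GitHub puts in the token
  thumbprint_list = [var.github_oidc_thumbprint]
}

data "aws_iam_policy_document" "gha_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Restricts the role to workflows running on the main branch.
    # Jobs triggered by pull_request events or bound to a GitHub Environment
    # present different sub values and would need additional entries here.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:${var.github_org}/${var.github_repo}:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "github_actions" {
  name               = "MM-github-actions-role"
  assume_role_policy = data.aws_iam_policy_document.gha_trust.json
}
```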
Phase 1: Network Module (terraform/network/)
Deployed per environment via CI/CD. All resources are parameterised through variables and a shared local.common_tags block. NAT egress resources differ by environment.
| Resource | Terraform Resource Type | dev | prod |
|---|---|---|---|
| VPC | aws_vpc | 1 | 1 |
| Public Subnet | aws_subnet | 2 | 2 |
| Private Subnet | aws_subnet | 2 | 2 |
| Internet Gateway | aws_internet_gateway | 1 | 1 |
| Elastic IP | aws_eip | 1 | 2 |
| NAT Instance | aws_instance | 1 | — |
| NAT Instance Security Group | aws_security_group | 1 | — |
| NAT Gateway | aws_nat_gateway | — | 2 |
| Public Route Table | aws_route_table | 1 | 1 |
| Private Route Table | aws_route_table | 1 (shared) | 2 (per-AZ) |
| Route Table Association | aws_route_table_association | 4 | 4 |
| VPC Flow Log | aws_flow_log | 1 | 1 |
| CloudWatch Log Group | aws_cloudwatch_log_group | 1 | 1 |
| IAM Role (Flow Logs) | aws_iam_role | 1 | 1 |
| CloudTrail Trail | aws_cloudtrail | 1 | 1 |
| CloudTrail S3 Bucket | aws_s3_bucket + sub-resources | 1 | 1 |
NAT Instance details (dev only):
- AMI: latest AWS-provided NAT AMI (name pattern amzn-ami-vpc-nat-*), looked up via a data "aws_ami" data source. Note: AMIs matching this pattern are built on the original Amazon Linux AMI (2018.03), not Amazon Linux 2, and that release has reached end of support, so the AMI choice should be confirmed before implementation.
- Instance type: t3.micro (sufficient for dev traffic volumes)
- Placed in public_subnet[0] (us-east-1a)
- source_dest_check = false: required for NAT forwarding
- Associated with one EIP
- Security group allows: inbound from VPC CIDR on all ports; outbound to 0.0.0.0/0
- Private route table: single shared RT with 0.0.0.0/0 → network_interface_id of the NAT Instance's primary ENI; both private subnets associate with this RT
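The NAT Instance wiring listed above could look roughly like the sketch below. It assumes AWS provider v5 or later (for the domain argument on aws_eip), and the resource labels are hypothetical: aws_subnet.public, aws_security_group.nat, and aws_route_table.private_shared are taken to be defined elsewhere in the module with a matching count.

```hcl
# Sketch only. Labels are hypothetical; referenced subnet, security group and
# route table resources are assumed to exist elsewhere in the module.
data "aws_ami" "nat" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn-ami-vpc-nat-*"]
  }
}

resource "aws_instance" "nat" {
  count = var.environment == "dev" ? 1 : 0

  ami                    = data.aws_ami.nat.id
  instance_type          = var.nat_instance_type
  subnet_id              = aws_subnet.public[0].id
  vpc_security_group_ids = [aws_security_group.nat[0].id]

  # Required so the instance may forward packets it neither sent nor is the target of.
  source_dest_check = false

  tags = merge(local.common_tags, {
    Name = "MM-${var.environment}-${local.region_short}-${local.az_short[var.availability_zones[0]]}-ec2-nat"
  })
}

resource "aws_eip" "nat_instance" {
  count = var.environment == "dev" ? 1 : 0

  domain   = "vpc"
  instance = aws_instance.nat[0].id

  tags = merge(local.common_tags, {
    Name = "MM-${var.environment}-${local.region_short}-eip-nat"
  })
}

# Default route for the shared private route table, via the instance's primary ENI.
resource "aws_route" "private_default_via_nat_instance" {
  count = var.environment == "dev" ? 1 : 0

  route_table_id         = aws_route_table.private_shared[0].id
  destination_cidr_block = "0.0.0.0/0"
  network_interface_id   = aws_instance.nat[0].primary_network_interface_id
}
```

Because every resource carries the same count expression, none of these blocks is evaluated when the module is applied with environment set to prod.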
NAT Gateway details (prod only):
- One per AZ, placed in the public subnet of that AZ
- Each associated with a dedicated EIP
- Each private subnet has its own route table pointing to the NAT Gateway in the same AZ
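A per-AZ NAT Gateway layout matching the three points above might be sketched as follows. Again this assumes AWS provider v5 or later; aws_vpc.main and aws_internet_gateway.main appear elsewhere in this document, while the remaining labels (nat_gw, private_per_az, aws_subnet.public, aws_subnet.private) are hypothetical.

```hcl
# Sketch only: one EIP, NAT Gateway, route table and default route per AZ (prod).
resource "aws_eip" "nat_gw" {
  count  = var.environment == "prod" ? 2 : 0
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count = var.environment == "prod" ? 2 : 0

  allocation_id = aws_eip.nat_gw[count.index].id    # implicit dependency on the EIP
  subnet_id     = aws_subnet.public[count.index].id # public subnet of the same AZ

  # No attribute references the IGW, so the ordering is declared explicitly.
  depends_on = [aws_internet_gateway.main]

  tags = merge(local.common_tags, {
    Name = "MM-${var.environment}-${local.region_short}-${local.az_short[var.availability_zones[count.index]]}-natgw-egress"
  })
}

resource "aws_route_table" "private_per_az" {
  count  = var.environment == "prod" ? 2 : 0
  vpc_id = aws_vpc.main.id

  tags = merge(local.common_tags, {
    Name = "MM-${var.environment}-${local.region_short}-${local.az_short[var.availability_zones[count.index]]}-rt-private"
  })
}

resource "aws_route" "private_default_via_nat_gw" {
  count = var.environment == "prod" ? 2 : 0

  route_table_id         = aws_route_table.private_per_az[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main[count.index].id
}

resource "aws_route_table_association" "private_per_az" {
  count = var.environment == "prod" ? 2 : 0

  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private_per_az[count.index].id
}
```

Indexing every resource with count.index keeps each private subnet paired with the NAT Gateway in its own AZ.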
Phase 2: CI/CD Workflow (.github/workflows/terraform.yaml)
The workflow has two stages:
- plan: triggered on pull requests targeting main for changes under terraform/** or specs/**:
  - Checks out code
  - Configures AWS credentials via OIDC (aws-actions/configure-aws-credentials@v4)
  - Runs terraform init, terraform validate, terraform fmt -check
  - Runs terraform plan -var-file=envs/dev.tfvars -out=tfplan
  - Posts plan output as a PR comment
- apply: triggered on push to main:
  - dev job: runs terraform apply with dev.tfvars automatically
  - prod job: depends on dev job success + manual approval via GitHub Environment prod; runs terraform apply with prod.tfvars
Data Models
Terraform Variable Schema
Bootstrap Variables (terraform/bootstrap/variables.tf)
variable "aws_region" {
  type        = string
  default     = "us-east-1"
  description = "AWS region for all bootstrap resources"
}

variable "state_bucket_name" {
  type        = string
  description = "Globally unique name for the Terraform state S3 bucket"
}

variable "lock_table_name" {
  type        = string
  default     = "MM-terraform-lock"
  description = "Name of the DynamoDB state lock table"
}

variable "github_org" {
  type        = string
  description = "GitHub organisation name (used in OIDC trust policy)"
}

variable "github_repo" {
  type        = string
  description = "GitHub repository name (used in OIDC trust policy)"
}

# Mandatory tagging variables
variable "environment" { type = string }
variable "owner" { type = string }
variable "cost_center" { type = string }
variable "created_by" { type = string }
variable "creation_date" { type = string }
Network Variables (terraform/network/variables.tf)
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "environment" {
  type        = string
  description = "Deployment environment: dev or prod"

  validation {
    condition     = contains(["dev", "prod"], var.environment)
    error_message = "environment must be 'dev' or 'prod'."
  }
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC (10.1.0.0/16 for dev, 10.2.0.0/16 for prod)"
}

variable "availability_zones" {
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b"]
  description = "List of AZs to deploy into (exactly 2 required)"
}

variable "public_subnet_cidrs" {
  type        = list(string)
  description = "List of /24 CIDR blocks for public subnets (one per AZ)"
}

variable "private_subnet_cidrs" {
  type        = list(string)
  description = "List of /24 CIDR blocks for private subnets (one per AZ)"
}

variable "flow_log_retention_days" {
  type        = number
  description = "CloudWatch log retention in days (14 for dev, 90 for prod)"
}

variable "cloudtrail_log_retention_days" {
  type        = number
  description = "S3 lifecycle expiry for CloudTrail logs (90 for dev, 365 for prod)"
}

variable "nat_instance_type" {
  type        = string
  default     = "t3.micro"
  description = "EC2 instance type for the NAT Instance (dev only)"
}

variable "state_bucket_name" {
  type        = string
  description = "Name of the S3 bucket used for Terraform remote state (from Phase 0)"
}

variable "lock_table_name" {
  type        = string
  description = "Name of the DynamoDB lock table (from Phase 0)"
}

# Mandatory tagging variables
variable "owner" { type = string }
variable "cost_center" { type = string }
variable "created_by" { type = string }
variable "creation_date" { type = string }
variable "criticality" { type = string }

# Blocks with more than one argument must be written on multiple lines;
# HCL does not accept ";" as an argument separator.
variable "data_classification" {
  type    = string
  default = "internal"
}

variable "backup_policy" {
  type    = string
  default = "none"
}

variable "patch_group" {
  type    = string
  default = "none"
}
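The "exactly 2 required" and "/24" constraints are stated only in the variable descriptions. One way to enforce them at plan time is with validation blocks, sketched below as possible revisions of two of the declarations. This is a suggestion under assumptions, not part of the stated design, and the error messages are illustrative.

```hcl
variable "availability_zones" {
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b"]
  description = "List of AZs to deploy into (exactly 2 required)"

  validation {
    condition     = length(var.availability_zones) == 2
    error_message = "Exactly two availability zones are required."
  }
}

variable "public_subnet_cidrs" {
  type        = list(string)
  description = "List of /24 CIDR blocks for public subnets (one per AZ)"

  validation {
    # can(cidrhost(...)) rejects strings that are not valid CIDR notation;
    # can(regex(...)) rejects any prefix length other than /24.
    condition = length(var.public_subnet_cidrs) == 2 && alltrue([
      for c in var.public_subnet_cidrs : can(cidrhost(c, 0)) && can(regex("/24$", c))
    ])
    error_message = "Provide exactly two valid /24 CIDR blocks, one per AZ."
  }
}
```

The same validation would apply unchanged to private_subnet_cidrs.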
Environment tfvars Files
terraform/network/envs/dev.tfvars
environment = "dev"
vpc_cidr = "10.1.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b"]
public_subnet_cidrs = ["10.1.0.0/24", "10.1.1.0/24"]
private_subnet_cidrs = ["10.1.10.0/24", "10.1.11.0/24"]
flow_log_retention_days = 14
cloudtrail_log_retention_days = 90
nat_instance_type = "t3.micro"
owner = "platform-team"
cost_center = "mm-platform"
created_by = "terraform"
creation_date = "2025-01-01"
criticality = "low"
terraform/network/envs/prod.tfvars
environment = "prod"
vpc_cidr = "10.2.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b"]
public_subnet_cidrs = ["10.2.0.0/24", "10.2.1.0/24"]
private_subnet_cidrs = ["10.2.10.0/24", "10.2.11.0/24"]
flow_log_retention_days = 90
cloudtrail_log_retention_days = 365
owner = "platform-team"
cost_center = "mm-platform"
created_by = "terraform"
creation_date = "2025-01-01"
criticality = "high"
Naming Convention Implementation
The naming convention MM-{env}-{region-short}-{az-short}-{resource}-{purpose} is implemented as a Terraform local:
locals {
  region_short = "use1"

  # AZ short tokens indexed by AZ name
  az_short = {
    "us-east-1a" = "use1a"
    "us-east-1b" = "use1b"
  }

  # Shared prefix for every resource name, e.g. "MM-dev-use1"
  name_prefix = "MM-${var.environment}-${local.region_short}"

  # Terraform locals cannot take arguments, so full names are built inline per resource:
  #   AZ-specific:     "${local.name_prefix}-${local.az_short[az]}-${resource}-${purpose}"
  #   Non-AZ-specific: "${local.name_prefix}-${resource}-${purpose}" (e.g. "MM-dev-use1-vpc-core")
}
Mandatory Tags Implementation
locals {
  common_tags = {
    Environment        = var.environment
    Owner              = var.owner
    CostCenter         = var.cost_center
    Project            = "mm-aws-infra"
    ManagedBy          = "Terraform"
    CreatedBy          = var.created_by
    CreationDate       = var.creation_date
    DataClassification = var.data_classification
    Criticality        = var.criticality
    BackupPolicy       = var.backup_policy
    PatchGroup         = var.patch_group
  }
}
Every resource block merges this map:
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(local.common_tags, {
    Name = "MM-${var.environment}-${local.region_short}-vpc-core"
  })
}
Backend Configuration
terraform/network/backend.tf
terraform {
  backend "s3" {
    # Partial configuration: bucket, key, and dynamodb_table are supplied
    # at init time via -backend-config, because backend blocks cannot
    # reference variables.
    region  = "us-east-1"
    encrypt = true
  }
}
Because Terraform does not allow variable interpolation in backend blocks, the key is parameterised using a partial backend configuration passed at terraform init time:
terraform init \
  -backend-config="key=network/dev/terraform.tfstate" \
  -backend-config="bucket=${STATE_BUCKET}" \
  -backend-config="dynamodb_table=${LOCK_TABLE}"
The CI/CD workflow sets these values from GitHub Actions environment variables or secrets.
Repository Structure
.
├── .github/
│   └── workflows/
│       └── terraform.yaml  # CI/CD pipeline (Phase 2)
├── .kiro/
│   └── specs/
│       └── mm-aws-network-infra/
│           ├── requirements.md
│           ├── design.md
│           └── tasks.md
├── docs/
│   └── post-apply-checklist.md  # Requirement 27.3 verification checklist
├── modules/  # Reserved for future shared Terraform modules
├── specs/
│   ├── phase0-state.yaml  # Kiro spec: bootstrap (Req 19.1)
│   └── phase1-network.yaml  # Kiro spec: network (Req 19.2)
└── terraform/
    ├── bootstrap/  # Phase 0: run once manually
    │   ├── main.tf
    │   ├── variables.tf
    │   ├── outputs.tf
    │   ├── providers.tf
    │   └── terraform.tfvars  # NOT committed; supplied by engineer
    └── network/  # Phase 1: deployed by CI/CD
        ├── backend.tf
        ├── main.tf
        ├── variables.tf
        ├── outputs.tf
        ├── providers.tf
        ├── locals.tf  # common_tags + naming helpers
        ├── vpc.tf
        ├── subnets.tf
        ├── igw.tf
        ├── eip.tf
        ├── nat_gateway.tf
        ├── nat_instance.tf  # NAT Instance + SG (dev only, count = env == "dev" ? 1 : 0)
        ├── route_tables.tf
        ├── flow_logs.tf
        ├── cloudtrail.tf
        └── envs/
            ├── dev.tfvars
            └── prod.tfvars
File Responsibilities
| File | Contents |
|---|---|
| bootstrap/main.tf | S3 bucket, DynamoDB table, IAM OIDC provider, IAM role |
| network/locals.tf | common_tags, region_short, az_short map, name-building expressions |
| network/vpc.tf | aws_vpc resource |
| network/subnets.tf | aws_subnet resources (public x2, private x2) with EKS tags |
| network/igw.tf | aws_internet_gateway resource |
| network/eip.tf | aws_eip resources: 1 in dev (for NAT Instance), 2 in prod (for NAT Gateways) |
| network/nat_gateway.tf | aws_nat_gateway resources (x2, prod only) with depends_on IGW |
| network/nat_instance.tf | aws_instance NAT Instance + aws_security_group (dev only); count conditional on var.environment == "dev" |
| network/route_tables.tf | Public RT; 1 shared private RT (dev) or 2 per-AZ private RTs (prod); 4 associations; routes |
| network/flow_logs.tf | aws_flow_log, aws_cloudwatch_log_group, IAM role for flow logs |
| network/cloudtrail.tf | aws_cloudtrail, aws_s3_bucket + sub-resources for CloudTrail |
| network/backend.tf | Partial S3 backend configuration |
| network/providers.tf | AWS provider pinned version |
| network/outputs.tf | VPC ID, subnet IDs, NAT GW IDs or NAT Instance ID, etc. for downstream modules |
Key Design Decisions
1. NAT Instance in dev, NAT Gateways in prod
dev uses a single EC2 NAT Instance (t3.micro, NAT AMI matching amzn-ami-vpc-nat-*) placed in public_subnet[0] (us-east-1a). Both private subnets share one route table pointing to the NAT Instance's primary ENI. This avoids the roughly $32/month fixed charge per NAT Gateway in the non-production environment, where high availability is not required.
prod uses two managed NAT Gateways — one per AZ — each with a dedicated EIP. Each private subnet has its own route table pointing to the NAT Gateway in the same AZ. This eliminates cross-AZ traffic charges and removes the single point of failure that a shared NAT Gateway would introduce.
The environment-specific resources are controlled via count = var.environment == "dev" ? 1 : 0 (NAT Instance) and count = var.environment == "prod" ? 2 : 0 (NAT Gateways), keeping a single shared Terraform configuration for both environments.
2. One NAT Gateway per AZ in prod (not shared)
See Decision 1 above for the full rationale. The per-AZ NAT Gateway topology in prod ensures that a NAT Gateway failure in one AZ does not affect egress in the other AZ, and avoids cross-AZ data transfer charges.
3. Partial Backend Configuration
Terraform does not support variable interpolation in backend {} blocks. Rather than hardcoding the bucket name and state key, the design uses a partial backend configuration: backend.tf contains only the region and encrypt flag; the bucket name, key, and DynamoDB table are passed via -backend-config flags at terraform init time. The CI/CD workflow injects these from GitHub Actions secrets/environment variables, keeping them out of source control.
4. map_public_ip_on_launch = false on Public Subnets
No compute workloads are placed in public subnets in this phase. Setting map_public_ip_on_launch = false prevents accidental public IP assignment if a resource is mistakenly launched there. NAT Gateways and the NAT Instance receive their public IPs via explicitly allocated EIPs, not this flag.
5. Separate CloudTrail Bucket per Environment
Each environment gets its own CloudTrail S3 bucket rather than a shared bucket. This keeps environment blast radius contained: a misconfigured bucket policy in dev cannot affect prod audit logs. The lifecycle retention difference (90 days dev vs 365 days prod) also makes per-environment buckets simpler to manage.
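A per-environment CloudTrail bucket with the retention difference driven by cloudtrail_log_retention_days could be sketched as below. The bucket name is a placeholder, the resource labels are hypothetical, and the choice of SSE-S3 (AES256) for this bucket is an assumption carried over from the state bucket; the design only requires that audit buckets are encrypted and public-access-blocked.

```hcl
resource "aws_s3_bucket" "cloudtrail" {
  # Placeholder name: bucket names must be globally unique and lowercase.
  bucket = "mm-${var.environment}-use1-cloudtrail-example"

  tags = merge(local.common_tags, {
    Name = "MM-${var.environment}-${local.region_short}-s3-cloudtrail"
  })
}

resource "aws_s3_bucket_public_access_block" "cloudtrail" {
  bucket = aws_s3_bucket.cloudtrail.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "cloudtrail" {
  bucket = aws_s3_bucket.cloudtrail.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256" # assumption: same SSE-S3 setting as the state bucket
    }
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "cloudtrail" {
  bucket = aws_s3_bucket.cloudtrail.id

  rule {
    id     = "expire-cloudtrail-logs"
    status = "Enabled"

    filter {} # empty filter: the rule applies to every object in the bucket

    expiration {
      days = var.cloudtrail_log_retention_days # 90 in dev, 365 in prod
    }
  }
}
```

A bucket policy granting the CloudTrail service permission to write to the bucket would also be needed and is omitted here for brevity.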
6. local.common_tags as the Single Source of Truth for Tags
All mandatory tags are defined once in locals.tf. Every resource block uses tags = merge(local.common_tags, { Name = "..." }). This means adding a new mandatory tag requires editing exactly one file. A terraform-compliance or checkov pre-apply lint rule can enforce that no resource block omits local.common_tags.
7. EKS Subnet Tags Applied at Creation
The kubernetes.io/role/elb and kubernetes.io/role/internal-elb tags are applied at subnet creation time as part of the tags merge. This avoids a future state drift when an EKS cluster is added — the subnets are already correctly tagged and EKS will discover them without any Terraform changes.
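For illustration, the subnet resources might carry the discovery tags as in the sketch below. The resource labels public and private and the subnet-public / subnet-private name suffixes are hypothetical; the tag keys and the value "1" follow the convention EKS load-balancer discovery expects.

```hcl
resource "aws_subnet" "public" {
  count = length(var.public_subnet_cidrs)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false # see Decision 4

  tags = merge(local.common_tags, {
    Name                     = "MM-${var.environment}-${local.region_short}-${local.az_short[var.availability_zones[count.index]]}-subnet-public"
    "kubernetes.io/role/elb" = "1" # internet-facing load balancers
  })
}

resource "aws_subnet" "private" {
  count = length(var.private_subnet_cidrs)

  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = merge(local.common_tags, {
    Name                              = "MM-${var.environment}-${local.region_short}-${local.az_short[var.availability_zones[count.index]]}-subnet-private"
    "kubernetes.io/role/internal-elb" = "1" # internal load balancers
  })
}
```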
8. Bootstrap is Not Managed by Remote State
The bootstrap module (terraform/bootstrap/) uses local state (or a manually managed state file) because the remote state backend does not yet exist when bootstrap runs. After bootstrap completes, all subsequent modules use the S3 backend. The bootstrap state file should be stored securely by the platform engineer (e.g., in a secure local directory or imported into the S3 bucket manually after creation).
Error Handling
Terraform Init Failures
If the S3 state bucket is unreachable during terraform init, Terraform exits with a non-zero code and a descriptive error. The CI/CD workflow treats any non-zero exit as a failure and stops the pipeline. No plan or apply step runs.
State Lock Conflicts
If a concurrent Terraform run attempts to acquire the DynamoDB lock while one is held, Terraform returns a lock conflict error with the lock holder's ID and timestamp. The operator must either wait for the lock to be released or run terraform force-unlock <lock-id> after confirming the holding process is dead.
NAT Egress Provisioning Dependencies
prod NAT Gateways: depend on both the EIP allocation and the Internet Gateway being attached. The EIP dependency is implicit through the allocation_id reference. The IGW dependency has no attribute reference, so it is declared explicitly with depends_on = [aws_internet_gateway.main] on the NAT Gateway resources. If the IGW attachment fails, the NAT Gateway creation is not attempted.
dev NAT Instance: depends on the public subnet, and its EIP is associated once the instance exists. The default route targets the network_interface_id of the NAT Instance's primary ENI, so the route (not the route table association) cannot be created until the instance has been created. Terraform orders this through the implicit resource reference in the route resource.
Plan Deviation Gate (Requirement 27)
The CI/CD workflow captures the terraform plan exit code and resource change summary. Expected resource counts differ by environment:
- dev: 1 VPC, 4 subnets, 1 IGW, 1 EIP, 1 NAT Instance, 1 NAT Instance SG, 2 RTs, 4 RT associations, 1 flow log, 1 CloudTrail trail
- prod: 1 VPC, 4 subnets, 1 IGW, 2 EIPs, 2 NAT GWs, 3 RTs, 4 RT associations, 1 flow log, 1 CloudTrail trail
If the plan shows a resource count outside the expected range for the target environment, the workflow posts a warning comment on the PR and requires a human reviewer to explicitly approve before apply proceeds.
Missing Required Directories (Requirement 18)
A pre-apply shell step in the CI/CD workflow checks for the existence of all required top-level directories (specs/, terraform/bootstrap/, terraform/network/, modules/, .github/workflows/, docs/). If any are absent, the step exits non-zero with a descriptive message identifying the missing directory, and the pipeline fails before terraform init is called.
Environment Cross-Contamination Guard (Requirement 23.5)
Before terraform apply, the workflow verifies that the plan does not include changes to resources tagged with a different Environment value than the target. This is implemented as a terraform show -json tfplan | jq check that scans planned resource changes for Environment tag mismatches.
Testing Strategy
This feature is Infrastructure as Code (Terraform + GitHub Actions). Property-based testing is not applicable because:
- The code is declarative configuration, not a function with inputs and outputs.
- Correctness is verified by Terraform's own plan/apply cycle and AWS API responses.
- Running 100 iterations of terraform apply would be prohibitively expensive and slow.
The testing strategy uses the following complementary approaches:
1. Static Analysis (pre-apply, every PR)
| Tool | What it checks |
|---|---|
| terraform validate | HCL syntax and provider schema correctness |
| terraform fmt -check | Consistent formatting |
| checkov or tfsec | Security misconfigurations (public S3, unencrypted resources, missing tags) |
| terraform-compliance | Policy-as-code: every resource has local.common_tags, no IGW route on private RTs |
2. Plan Verification (pre-apply, every PR and merge)
- terraform plan is run for dev on every PR.
- The plan output is parsed to confirm the expected resource count (Requirement 27).
- Any deviation triggers a required human review before apply.
3. Post-Apply Smoke Tests (after each terraform apply)
A shell script or AWS CLI check verifies:
- VPC exists with the correct CIDR block.
- All four subnets are in the correct AZs with correct CIDRs.
- dev: NAT Instance is in running state, source/destination check is disabled, EIP is associated.
- prod: Both NAT Gateways are in available state, each with a dedicated EIP.
- Public route table has a 0.0.0.0/0 → IGW route.
- dev: One shared private route table has a 0.0.0.0/0 → NAT Instance ENI route; no IGW route.
- prod: Each private route table has a 0.0.0.0/0 → NAT-GW route (no IGW route).
- VPC Flow Logs are delivering records to the CloudWatch log group.
- CloudTrail trail is active and logging.
- All S3 buckets have public access blocked and SSE enabled.
- DynamoDB lock table exists with LockID partition key.
These checks are defined in docs/post-apply-checklist.md and run as a post-deploy step in the CI/CD workflow.
4. Idempotency Test
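After a successful terraform apply, the workflow runs terraform plan again and asserts that the plan shows zero changes, for example with terraform plan -detailed-exitcode, which exits with code 0 when there are no changes and code 2 when changes are pending. Any pending change after a clean apply indicates a non-idempotent resource configuration and fails the pipeline.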
5. Integration Test: State Locking
A manual test (documented in docs/post-apply-checklist.md) verifies that running two concurrent terraform plan operations against the same state key results in one succeeding and one receiving a lock conflict error.
6. Security Baseline Verification
checkov is run in the CI/CD pipeline with a policy file that enforces:
- No S3 bucket has public access enabled.
- All S3 buckets have SSE configured.
- All DynamoDB tables have SSE enabled.
- No IAM user with static access keys is created.
- No route table associated with a private subnet has a route to an IGW.
- The NAT Instance (dev) has source_dest_check = false.