Requirements Document
Introduction
This document defines the requirements for the Mica Mirai (MM) AWS Multi-Account Landing Zone. This is Part 2 of the MM infrastructure series — Part 1 established a single-account VPC and network foundation (account 464796503181) with dev and prod VPCs. Part 2 migrates to a proper multi-account architecture using AWS Organizations and Control Tower, with three accounts: Management, Dev, and Prod.
The scope covers AWS Organizations setup, Control Tower enrollment, Service Control Policies (SCPs), centralized logging and security services, Tailscale-based hybrid networking with on-prem AI resources, backup strategies, cross-region disaster recovery compatibility, VPC private endpoints, security governance for AI model deployments, cost optimization controls, and IAM Identity Center (SSO) for cross-account access. All infrastructure is defined as Terraform code, version-controlled in GitHub, and aligned with the AWS Well-Architected Framework across all six pillars. This is a small startup — every design decision prioritizes cost optimization while maintaining a strong security baseline.
Glossary
- Landing_Zone: The multi-account AWS environment provisioned by Control Tower, including the Management, Dev, and Prod accounts with baseline guardrails.
- Management_Account: The AWS Organizations management account that hosts Control Tower, consolidated billing, CloudTrail organization trail, and IAM Identity Center. No workloads run here.
- Dev_Account: The AWS member account for development workloads, including the existing dev VPC (10.1.0.0/16) and future EKS dev clusters.
- Prod_Account: The AWS member account for production workloads, including the existing prod VPC (10.2.0.0/16) and future EKS prod clusters.
- AWS_Organizations: The AWS service that creates and manages the multi-account hierarchy with consolidated billing and policy-based governance.
- Control_Tower: The AWS service that automates landing zone setup with account factory, guardrails, and centralized logging.
- SCP: Service Control Policy — an Organizations policy that sets permission guardrails across member accounts.
- Guardrail: A Control Tower governance rule, either preventive (SCP-based) or detective (AWS Config rule-based).
- IAM_Identity_Center: AWS IAM Identity Center (formerly AWS SSO) — the centralized service for managing human user access across all accounts via permission sets.
- Permission_Set: An IAM Identity Center configuration that defines the IAM policies granted when a user assumes access to a specific account.
- Organization_Trail: A CloudTrail trail created in the Management_Account that logs API activity across all member accounts.
- Config_Aggregator: An AWS Config aggregator in the Management_Account that collects compliance data from all member accounts.
- GuardDuty: AWS GuardDuty — the threat detection service enabled across all accounts with findings delegated to the Management_Account.
- Security_Hub: AWS Security Hub — the centralized security findings aggregator that collects results from GuardDuty, Config, and other services.
- Tailscale: A WireGuard-based mesh VPN used to connect AWS VPCs with on-prem AI resources (NVIDIA GPU servers) without traditional VPN hardware.
- Tailscale_Subnet_Router: An EC2 instance running Tailscale that advertises VPC CIDR routes to the Tailscale network, enabling on-prem resources to reach AWS private subnets.
- On_Prem_AI: On-premises NVIDIA GPU servers used for AI model training and development, connected to AWS via Tailscale.
- VPC_Endpoint: An AWS PrivateLink interface or gateway endpoint that allows VPC resources to access AWS services without traversing the public internet.
- AWS_Backup: The managed backup service used to define backup plans, vaults, and retention policies across accounts.
- Backup_Vault: An AWS Backup vault that stores recovery points with encryption and access policies.
- DR_Region: The disaster recovery AWS region (us-west-2) used for cross-region backup replication and failover readiness.
- Cost_Anomaly_Detection: AWS Cost Anomaly Detection — the service that uses ML to identify unusual spending patterns and sends alerts.
- Budget_Alert: An AWS Budgets alert that notifies when actual or forecasted spend exceeds a defined threshold.
- Tagging_Policy: The set of mandatory tag keys and values applied to every provisioned AWS resource, consistent with Part 1: `MM-{env}-{region-short}-{resource}-{purpose}`.
- Naming_Convention: The format `MM-{env}-{region-short}-{az-short}-{resource}-{purpose}` applied to all resource names, consistent with Part 1.
- Well_Architected_Framework: The AWS Well-Architected Framework, the set of best practices across six pillars (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability) used to evaluate and improve the architecture.
- Common_Tags: The Terraform `locals` block (`local.common_tags`) that defines all mandatory tag key-value pairs, consistent with Part 1.
Requirements
Requirement 1: AWS Organizations Setup
User Story: As a platform engineer, I want an AWS Organization with a management account and two member accounts (Dev, Prod), so that workloads are isolated by environment with centralized billing and governance.
Acceptance Criteria
- THE Landing_Zone SHALL create an AWS Organization with the `ALL` feature set enabled in the Management_Account.
- THE Landing_Zone SHALL create two Organizational Units (OUs): `Workloads` (containing Dev_Account and Prod_Account) and `Security` (reserved for a future dedicated security account).
- THE Landing_Zone SHALL provision the Dev_Account as a member account within the `Workloads` OU.
- THE Landing_Zone SHALL provision the Prod_Account as a member account within the `Workloads` OU.
- THE Management_Account SHALL host only governance, billing, and identity services; no application workloads SHALL run in the Management_Account.
- WHEN a new member account is created, THE Landing_Zone SHALL apply the baseline SCPs and guardrails automatically.
- THE Landing_Zone SHALL enable consolidated billing in the Management_Account so that all member account charges appear on a single invoice.
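The Organization, OU, and member-account criteria above could be sketched in Terraform roughly as follows. This is a minimal illustration, not the project's actual module: the account names and the two e-mail variables are hypothetical, and in a Control Tower setup the member accounts may instead be provisioned through Account Factory.

```hcl
# Hypothetical inputs: the account e-mail addresses are not given in this document.
variable "dev_account_email" {
  type = string
}

variable "prod_account_email" {
  type = string
}

# Organization with the ALL feature set (needed for SCPs and tag policies).
resource "aws_organizations_organization" "this" {
  feature_set          = "ALL"
  enabled_policy_types = ["SERVICE_CONTROL_POLICY", "TAG_POLICY"]
}

# Both OUs hang directly off the organization root.
resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = aws_organizations_organization.this.roots[0].id
}

resource "aws_organizations_organizational_unit" "security" {
  name      = "Security"
  parent_id = aws_organizations_organization.this.roots[0].id
}

# Member accounts placed in the Workloads OU (names are assumptions).
resource "aws_organizations_account" "dev" {
  name      = "MM-dev"
  email     = var.dev_account_email
  parent_id = aws_organizations_organizational_unit.workloads.id
}

resource "aws_organizations_account" "prod" {
  name      = "MM-prod"
  email     = var.prod_account_email
  parent_id = aws_organizations_organizational_unit.workloads.id
}
```

Consolidated billing needs no separate resource here, since it is part of every AWS Organization.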
Requirement 2: Control Tower Enrollment
User Story: As a platform engineer, I want AWS Control Tower managing the landing zone, so that account provisioning, guardrails, and audit logging follow AWS best practices with minimal manual configuration.
Acceptance Criteria
- THE Landing_Zone SHALL enable Control Tower in the Management_Account in the us-east-1 region.
- THE Landing_Zone SHALL configure Control Tower to use the existing Organization structure rather than creating a new one.
- THE Landing_Zone SHALL enroll the Dev_Account and Prod_Account as governed accounts under Control Tower.
- THE Landing_Zone SHALL enable the Control Tower audit and log archive capabilities using the Management_Account (startup cost optimization — no separate Log Archive or Audit accounts).
- WHEN Control Tower detects a guardrail drift, THE Landing_Zone SHALL generate a notification to the platform engineering team.
- THE Landing_Zone SHALL enable all mandatory and strongly recommended Control Tower guardrails at the OU level.
Requirement 3: Service Control Policies
User Story: As a platform engineer, I want SCPs that prevent dangerous actions across member accounts, so that the blast radius of misconfigurations or compromised credentials is limited without blocking legitimate development work.
Acceptance Criteria
- THE Landing_Zone SHALL attach an SCP to the `Workloads` OU that denies the use of AWS regions outside us-east-1 and us-west-2 (primary and DR regions).
- THE Landing_Zone SHALL attach an SCP to the `Workloads` OU that denies deletion or modification of CloudTrail trails in member accounts.
- THE Landing_Zone SHALL attach an SCP to the `Workloads` OU that denies disabling of GuardDuty in member accounts.
- THE Landing_Zone SHALL attach an SCP to the `Workloads` OU that denies creation of IAM users with console passwords or static access keys in member accounts.
- THE Landing_Zone SHALL attach an SCP to the `Workloads` OU that denies leaving S3 buckets without server-side encryption enabled.
- THE Landing_Zone SHALL attach an SCP to the `Workloads` OU that denies launching EC2 instances larger than `p3.2xlarge` or `g5.xlarge` to prevent accidental GPU cost overruns.
- THE Landing_Zone SHALL attach an SCP to the `Workloads` OU that denies actions on the Terraform state S3 bucket and DynamoDB lock table except from the CI/CD IAM role.
- IF a new SCP is required, THEN THE Landing_Zone SHALL define the SCP as Terraform code and apply it through the CI/CD pipeline.
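As an illustration of the first three criteria, the sketch below defines two SCPs and attaches them to the `Workloads` OU. The policy names, the `workloads_ou_id` variable, and the list of global services exempted through `NotAction` are assumptions chosen for the example; the exempted list in particular would need review against the services actually in use.

```hcl
# Hypothetical input: ID of the Workloads OU (for example, an output of the organization module).
variable "workloads_ou_id" {
  type = string
}

# Deny requests outside the primary and DR regions.
# Global services are exempted via NotAction; this list is an assumption.
resource "aws_organizations_policy" "deny_unapproved_regions" {
  name = "MM-deny-unapproved-regions"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyOutsideApprovedRegions"
      Effect    = "Deny"
      NotAction = ["iam:*", "organizations:*", "sts:*", "route53:*", "cloudfront:*", "budgets:*", "ce:*", "support:*"]
      Resource  = "*"
      Condition = {
        StringNotEquals = {
          "aws:RequestedRegion" = ["us-east-1", "us-west-2"]
        }
      }
    }]
  })
}

# Deny tampering with CloudTrail trails and GuardDuty detectors.
resource "aws_organizations_policy" "protect_security_services" {
  name = "MM-protect-security-services"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenySecurityServiceTampering"
      Effect = "Deny"
      Action = [
        "cloudtrail:DeleteTrail",
        "cloudtrail:StopLogging",
        "cloudtrail:UpdateTrail",
        "guardduty:DeleteDetector",
        "guardduty:UpdateDetector",
      ]
      Resource = "*"
    }]
  })
}

resource "aws_organizations_policy_attachment" "workloads" {
  for_each = {
    regions  = aws_organizations_policy.deny_unapproved_regions.id
    security = aws_organizations_policy.protect_security_services.id
  }

  policy_id = each.value
  target_id = var.workloads_ou_id
}
```

SCPs never grant permissions; they only cap what IAM policies in the member accounts can allow, which is why every statement here is a `Deny`.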
Requirement 4: Centralized Logging — CloudTrail
User Story: As a platform engineer, I want a single organization-wide CloudTrail trail, so that all API activity across all accounts is captured in one place for security analysis and compliance.
Acceptance Criteria
- THE Landing_Zone SHALL create an Organization_Trail in the Management_Account with `is_organization_trail` set to `true`.
- THE Organization_Trail SHALL log management events (read and write) for all member accounts.
- THE Organization_Trail SHALL deliver log files to a centralized S3 bucket in the Management_Account with SSE enabled and public access blocked.
- THE Organization_Trail SHALL have `enable_log_file_validation` set to `true`.
- THE Organization_Trail SHALL have `is_multi_region_trail` set to `true`.
- THE Organization_Trail SHALL have `include_global_service_events` set to `true`.
- WHERE Environment is `dev`, THE centralized CloudTrail S3 bucket SHALL have a lifecycle rule expiring objects after 90 days.
- WHERE Environment is `prod`, THE centralized CloudTrail S3 bucket SHALL have a lifecycle rule expiring objects after 365 days.
- WHEN the Organization_Trail replaces per-account trails from Part 1, THE Landing_Zone SHALL disable the per-account CloudTrail trails to avoid duplicate logging costs.
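A minimal sketch of the trail and its lifecycle rule follows. The trail name and both variables are hypothetical, and the S3 bucket (with a bucket policy that lets CloudTrail write to it) is assumed to exist already. Note that the Terraform AWS provider names the log file validation argument `enable_log_file_validation`.

```hcl
# Hypothetical inputs: the bucket is assumed to exist with a CloudTrail-compatible bucket policy.
variable "cloudtrail_bucket_name" {
  type = string
}

variable "log_retention_days" {
  type    = number
  default = 365 # 90 for dev, 365 for prod per the criteria above
}

resource "aws_cloudtrail" "organization" {
  name                          = "MM-mgmt-use1-trail-org" # assumed name
  s3_bucket_name                = var.cloudtrail_bucket_name
  is_organization_trail         = true
  is_multi_region_trail         = true
  include_global_service_events = true
  enable_log_file_validation    = true
  # With no event selector configured, the trail records read and write management events.
}

resource "aws_s3_bucket_lifecycle_configuration" "cloudtrail" {
  bucket = var.cloudtrail_bucket_name

  rule {
    id     = "expire-cloudtrail-logs"
    status = "Enabled"

    filter {} # applies to every object in the bucket

    expiration {
      days = var.log_retention_days
    }
  }
}
```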
Requirement 5: Centralized Logging — AWS Config
User Story: As a platform engineer, I want AWS Config enabled across all accounts with centralized aggregation, so that resource configuration changes are tracked and compliance rules can be evaluated organization-wide.
Acceptance Criteria
- THE Landing_Zone SHALL enable AWS Config recording in the Dev_Account and Prod_Account for all supported resource types.
- THE Landing_Zone SHALL create a Config_Aggregator in the Management_Account that collects configuration and compliance data from all member accounts.
- THE Landing_Zone SHALL configure AWS Config to deliver configuration snapshots to a centralized S3 bucket in the Management_Account.
- THE Landing_Zone SHALL enable the following AWS Config managed rules in each member account: `s3-bucket-server-side-encryption-enabled`, `encrypted-volumes`, `iam-user-no-policies-check`, `restricted-ssh`, and `vpc-flow-logs-enabled`.
- IF a Config rule evaluates a resource as non-compliant, THEN THE Landing_Zone SHALL send a notification to the platform engineering team via SNS.
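The managed rules and the aggregator might look like the sketch below. The two resource groups belong to different accounts (rules in each member account, aggregator in the Management_Account) and are shown together only for brevity. The aggregator name and the role variable are assumptions, and a configuration recorder is assumed to be running before the rules are created.

```hcl
# Hypothetical input: IAM role that lets AWS Config read organization data.
variable "config_aggregator_role_arn" {
  type = string
}

locals {
  # Rule name => AWS managed rule identifier.
  config_managed_rules = {
    "s3-bucket-server-side-encryption-enabled" = "S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED"
    "encrypted-volumes"                        = "ENCRYPTED_VOLUMES"
    "iam-user-no-policies-check"               = "IAM_USER_NO_POLICIES_CHECK"
    "restricted-ssh"                           = "INCOMING_SSH_DISABLED"
    "vpc-flow-logs-enabled"                    = "VPC_FLOW_LOGS_ENABLED"
  }
}

# Member-account side: one managed rule per map entry.
resource "aws_config_config_rule" "managed" {
  for_each = local.config_managed_rules

  name = each.key

  source {
    owner             = "AWS"
    source_identifier = each.value
  }
}

# Management-account side: organization-wide aggregator.
resource "aws_config_configuration_aggregator" "organization" {
  name = "MM-mgmt-use1-config-aggregator" # assumed name

  organization_aggregation_source {
    all_regions = true
    role_arn    = var.config_aggregator_role_arn
  }
}
```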
Requirement 6: Centralized Security — GuardDuty
User Story: As a platform engineer, I want GuardDuty enabled across all accounts with centralized findings, so that threats are detected automatically and findings are visible in one place.
Acceptance Criteria
- THE Landing_Zone SHALL enable GuardDuty in the Management_Account and designate it as the GuardDuty administrator account.
- THE Landing_Zone SHALL enable GuardDuty in the Dev_Account and Prod_Account as member accounts, with findings delegated to the Management_Account.
- THE Landing_Zone SHALL enable GuardDuty S3 protection and EKS protection features in all accounts.
- WHEN GuardDuty generates a HIGH or CRITICAL severity finding, THE Landing_Zone SHALL send a notification to the platform engineering team via SNS.
- THE Landing_Zone SHALL enable GuardDuty in both us-east-1 and us-west-2 (DR region) for all accounts.
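One way to express the detector, the protection features, and the severity-based alert is sketched below for a single account and region. The rule name and the `alerts_topic_arn` variable are assumptions. GuardDuty rates findings numerically, and a threshold of 7 and above is used here to cover the HIGH and CRITICAL bands.

```hcl
# Hypothetical input: SNS topic for platform alerts (its policy must allow EventBridge to publish).
variable "alerts_topic_arn" {
  type = string
}

resource "aws_guardduty_detector" "this" {
  enable = true
}

resource "aws_guardduty_detector_feature" "s3" {
  detector_id = aws_guardduty_detector.this.id
  name        = "S3_DATA_EVENTS"
  status      = "ENABLED"
}

resource "aws_guardduty_detector_feature" "eks" {
  detector_id = aws_guardduty_detector.this.id
  name        = "EKS_AUDIT_LOGS"
  status      = "ENABLED"
}

# Match findings with numeric severity >= 7.
resource "aws_cloudwatch_event_rule" "guardduty_high" {
  name = "MM-guardduty-high-severity" # assumed name

  event_pattern = jsonencode({
    source        = ["aws.guardduty"]
    "detail-type" = ["GuardDuty Finding"]
    detail = {
      severity = [{ numeric = [">=", 7] }]
    }
  })
}

resource "aws_cloudwatch_event_target" "guardduty_to_sns" {
  rule = aws_cloudwatch_event_rule.guardduty_high.name
  arn  = var.alerts_topic_arn
}
```

Covering us-west-2 would mean instantiating the same resources again with a provider configured for that region.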
Requirement 7: Security Hub
User Story: As a platform engineer, I want Security Hub aggregating findings from GuardDuty, Config, and other security services, so that the security posture across all accounts is visible in a single dashboard.
Acceptance Criteria
- THE Landing_Zone SHALL enable Security Hub in the Management_Account and designate it as the administrator account.
- THE Landing_Zone SHALL enable Security Hub in the Dev_Account and Prod_Account as member accounts.
- THE Landing_Zone SHALL enable the AWS Foundational Security Best Practices standard in Security Hub.
- THE Landing_Zone SHALL configure Security Hub to receive findings from GuardDuty and AWS Config.
- WHEN Security Hub identifies a CRITICAL finding, THE Landing_Zone SHALL send a notification to the platform engineering team via SNS.
Requirement 8: IAM Identity Center (SSO)
User Story: As a platform engineer, I want centralized identity management via IAM Identity Center, so that team members access all accounts through a single sign-on portal without static IAM credentials.
Acceptance Criteria
- THE Landing_Zone SHALL enable IAM_Identity_Center in the Management_Account in the us-east-1 region.
- THE Landing_Zone SHALL create the following Permission_Sets: `AdministratorAccess` (full access for platform engineers), `DeveloperAccess` (scoped access for developers to Dev_Account only), and `ReadOnlyAccess` (read-only access for auditing across all accounts).
- THE Landing_Zone SHALL create IAM Identity Center groups: `PlatformEngineers`, `Developers`, and `Auditors`.
- THE Landing_Zone SHALL assign the `PlatformEngineers` group `AdministratorAccess` to all three accounts.
- THE Landing_Zone SHALL assign the `Developers` group `DeveloperAccess` to the Dev_Account only.
- THE Landing_Zone SHALL assign the `Auditors` group `ReadOnlyAccess` to all three accounts.
- THE `DeveloperAccess` Permission_Set SHALL deny access to IAM, Organizations, and billing services.
- WHEN a user authenticates via IAM_Identity_Center, THE session SHALL have a maximum duration of 8 hours.
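The `DeveloperAccess` path through these criteria could be sketched as follows; the other two permission sets and groups would follow the same pattern. Using the AWS managed `PowerUserAccess` policy as the base is an assumption of this example, as is the `dev_account_id` variable. IAM Identity Center is assumed to be enabled already.

```hcl
# Hypothetical input: 12-digit ID of the Dev_Account.
variable "dev_account_id" {
  type = string
}

data "aws_ssoadmin_instances" "this" {}

locals {
  sso_instance_arn  = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  identity_store_id = tolist(data.aws_ssoadmin_instances.this.identity_store_ids)[0]
}

resource "aws_ssoadmin_permission_set" "developer" {
  name             = "DeveloperAccess"
  instance_arn     = local.sso_instance_arn
  session_duration = "PT8H" # ISO 8601 duration: 8 hours
}

# Base permissions: an assumption, not specified by the requirements.
resource "aws_ssoadmin_managed_policy_attachment" "developer_base" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
  managed_policy_arn = "arn:aws:iam::aws:policy/PowerUserAccess"
}

# Explicit deny for IAM, Organizations, and billing.
resource "aws_ssoadmin_permission_set_inline_policy" "developer_deny" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn

  inline_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Deny"
      Action   = ["iam:*", "organizations:*", "billing:*"]
      Resource = "*"
    }]
  })
}

resource "aws_identitystore_group" "developers" {
  identity_store_id = local.identity_store_id
  display_name      = "Developers"
}

# Developers get DeveloperAccess in the Dev_Account only.
resource "aws_ssoadmin_account_assignment" "developers_dev" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
  principal_id       = aws_identitystore_group.developers.group_id
  principal_type     = "GROUP"
  target_id          = var.dev_account_id
  target_type        = "AWS_ACCOUNT"
}
```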
Requirement 9: Tailscale Hybrid Networking
User Story: As a platform engineer, I want Tailscale-based connectivity between AWS VPCs and on-prem AI resources, so that NVIDIA GPU servers can communicate with AWS workloads securely without traditional VPN hardware or high monthly costs.
Acceptance Criteria
- THE Landing_Zone SHALL deploy one Tailscale_Subnet_Router EC2 instance in the Dev_Account private subnet and one in the Prod_Account private subnet.
- EACH Tailscale_Subnet_Router SHALL advertise the VPC CIDR of its account (10.1.0.0/16 for dev, 10.2.0.0/16 for prod) to the Tailscale network.
- THE Tailscale_Subnet_Router SHALL use a `t3.micro` instance type to minimize cost.
- THE Tailscale_Subnet_Router SHALL run in a private subnet with outbound internet access via the existing NAT egress path (NAT Instance in dev, NAT Gateway in prod).
- THE Tailscale_Subnet_Router SHALL authenticate to the Tailscale coordination server using an auth key stored in AWS Secrets Manager.
- WHEN the Tailscale_Subnet_Router is running, THE On_Prem_AI resources SHALL be able to reach private IP addresses within the advertised VPC CIDR.
- THE Tailscale_Subnet_Router security group SHALL allow inbound traffic only from the VPC CIDR and Tailscale UDP port 41641.
- THE Tailscale_Subnet_Router security group SHALL allow outbound traffic to the VPC CIDR and to the Tailscale coordination server (HTTPS and UDP 41641).
- IF the Tailscale_Subnet_Router instance becomes unhealthy, THEN THE Landing_Zone SHALL automatically replace the instance using an Auto Scaling Group with min=1, max=1, desired=1.
- THE Landing_Zone SHALL configure Tailscale ACLs to restrict On_Prem_AI access to only the private subnet CIDRs and specific ports required for AI model inference (HTTPS 443, gRPC 50051).
- THE Tailscale_Subnet_Router SHALL be named according to the Naming_Convention with resource token `ec2` and purpose token `tailscale`.
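A rough sketch of the subnet router for the dev VPC is shown below. All variables are hypothetical, the AMI is assumed to be a Linux image with the AWS CLI preinstalled, and the instance profile is assumed to allow reading the auth key secret. The bootstrap script uses Tailscale's published install script and the `tailscale up` flags `--authkey` and `--advertise-routes`.

```hcl
# Hypothetical inputs; none of these values are defined in this document.
variable "aws_region" { type = string }
variable "ami_id" { type = string }
variable "vpc_id" { type = string }
variable "vpc_cidr" { type = string } # e.g. 10.1.0.0/16 for dev
variable "private_subnet_ids" { type = list(string) }
variable "instance_profile_name" { type = string }
variable "tailscale_secret_id" { type = string }

resource "aws_security_group" "tailscale" {
  name   = "MM-dev-use1-sg-tailscale"
  vpc_id = var.vpc_id

  ingress {
    description = "All traffic from the VPC"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [var.vpc_cidr]
  }
  ingress {
    description = "Tailscale WireGuard"
    from_port   = 41641
    to_port     = 41641
    protocol    = "udp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    description = "All traffic to the VPC"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [var.vpc_cidr]
  }
  egress {
    description = "HTTPS to the coordination server"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    description = "Tailscale WireGuard"
    from_port   = 41641
    to_port     = 41641
    protocol    = "udp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_launch_template" "tailscale" {
  name_prefix            = "MM-dev-use1-ec2-tailscale-"
  image_id               = var.ami_id
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.tailscale.id]

  iam_instance_profile {
    name = var.instance_profile_name
  }

  # Install Tailscale, enable IP forwarding, and advertise the VPC CIDR.
  user_data = base64encode(<<-EOT
    #!/bin/bash
    set -euo pipefail
    curl -fsSL https://tailscale.com/install.sh | sh
    echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-tailscale.conf
    sysctl -p /etc/sysctl.d/99-tailscale.conf
    AUTH_KEY=$(aws secretsmanager get-secret-value --region ${var.aws_region} --secret-id ${var.tailscale_secret_id} --query SecretString --output text)
    tailscale up --authkey="$AUTH_KEY" --advertise-routes=${var.vpc_cidr}
  EOT
  )
}

# min = max = desired = 1 gives automatic replacement of an unhealthy router.
resource "aws_autoscaling_group" "tailscale" {
  name                = "MM-dev-use1-asg-tailscale"
  min_size            = 1
  max_size            = 1
  desired_capacity    = 1
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = aws_launch_template.tailscale.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "MM-dev-use1-ec2-tailscale"
    propagate_at_launch = true
  }
}
```

Advertised routes still have to be approved on the Tailscale side (or auto-approved through the ACL policy) before On_Prem_AI hosts can use them.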
Requirement 10: VPC Private Endpoints
User Story: As a platform engineer, I want VPC endpoints for frequently used AWS services, so that traffic stays on the AWS backbone network, reducing NAT Gateway data processing costs and improving security posture.
Acceptance Criteria
- THE Landing_Zone SHALL create Gateway VPC endpoints for S3 and DynamoDB in both the Dev_Account and Prod_Account VPCs.
- THE Landing_Zone SHALL create Interface VPC endpoints for the following services in both accounts: `ecr.api`, `ecr.dkr`, `sts`, `logs` (CloudWatch Logs), and `secretsmanager`.
- THE Gateway VPC endpoints SHALL be associated with all private route tables in each VPC.
- THE Interface VPC endpoints SHALL be placed in the private subnets with `private_dns_enabled` set to `true`.
- EACH Interface VPC endpoint SHALL have a security group that allows inbound HTTPS (port 443) from the VPC CIDR only.
- THE Landing_Zone SHALL create Interface VPC endpoints for EKS-related services (`eks`, `eks-auth`) in both accounts to support future private EKS API server access.
- EACH VPC endpoint SHALL be named according to the Naming_Convention with resource token `vpce` and purpose token matching the service name.
- EACH VPC endpoint SHALL have all mandatory tags from the Tagging_Policy applied at creation time.
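The endpoint criteria map naturally onto two `for_each` loops, one per endpoint type. The sketch below assumes hypothetical input variables for the VPC, subnets, and route tables, and replaces dots in service names with hyphens when building the `Name` tag.

```hcl
# Hypothetical inputs supplied by the networking module.
variable "env" { type = string }
variable "aws_region" { type = string }
variable "region_short" {
  type    = string
  default = "use1"
}
variable "vpc_id" { type = string }
variable "vpc_cidr" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "private_route_table_ids" { type = list(string) }

locals {
  gateway_services   = ["s3", "dynamodb"]
  interface_services = ["ecr.api", "ecr.dkr", "sts", "logs", "secretsmanager", "eks", "eks-auth"]
}

# Shared security group: HTTPS from inside the VPC only.
resource "aws_security_group" "vpce" {
  name   = "MM-${var.env}-${var.region_short}-sg-vpce"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
}

resource "aws_vpc_endpoint" "gateway" {
  for_each = toset(local.gateway_services)

  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.aws_region}.${each.key}"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids

  tags = {
    Name = "MM-${var.env}-${var.region_short}-vpce-${each.key}"
  }
}

resource "aws_vpc_endpoint" "interface" {
  for_each = toset(local.interface_services)

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.aws_region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpce.id]
  private_dns_enabled = true

  tags = {
    Name = "MM-${var.env}-${var.region_short}-vpce-${replace(each.key, ".", "-")}"
  }
}
```

In a real module the `tags` arguments would merge in `local.common_tags` so that every mandatory tag is applied at creation time.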
Requirement 11: AWS Backup Strategy
User Story: As a platform engineer, I want centralized backup policies, so that critical resources are automatically backed up with defined retention and the team does not rely on manual snapshot management.
Acceptance Criteria
- THE Landing_Zone SHALL create an AWS Backup vault in the Dev_Account and Prod_Account, each encrypted with an AWS-managed KMS key.
- THE Landing_Zone SHALL create a backup plan in each account with the following schedule: daily backups retained for 7 days (dev) or 30 days (prod), and weekly backups retained for 30 days (dev) or 90 days (prod).
- THE Landing_Zone SHALL assign backup plan resources using tag-based selection: resources tagged with `BackupPolicy = "daily"` SHALL be included in the backup plan.
- WHERE Environment is `prod`, THE Landing_Zone SHALL configure cross-region backup copy rules to replicate recovery points to a Backup_Vault in the DR_Region (us-west-2).
- WHERE Environment is `dev`, THE Landing_Zone SHALL NOT configure cross-region backup replication to minimize cost.
- THE Backup_Vault SHALL have a vault access policy that denies deletion of recovery points by any principal except the backup service role.
- IF a backup job fails, THEN THE Landing_Zone SHALL send a notification to the platform engineering team via SNS.
- EACH Backup_Vault SHALL be named according to the Naming_Convention with resource token `backup` and purpose token `vault`.
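The retention matrix and the prod-only cross-region copy can be captured in one module parameterized by environment, as in the sketch below. The cron schedules, plan and selection names, and the `backup_role_arn` and `dr_vault_arn` variables are assumptions; `dr_vault_arn` must be set when `env` is `prod`.

```hcl
# Hypothetical inputs.
variable "env" { type = string } # "dev" or "prod"
variable "backup_role_arn" { type = string }
variable "dr_vault_arn" {
  type    = string
  default = null # required only for prod
}

locals {
  # Retention in days, per the acceptance criteria.
  retention = {
    dev  = { daily = 7, weekly = 30 }
    prod = { daily = 30, weekly = 90 }
  }
  # Copy to the DR vault only in prod.
  dr_targets = var.env == "prod" ? [var.dr_vault_arn] : []
}

# Omitting kms_key_arn leaves the vault on the AWS-managed key.
resource "aws_backup_vault" "this" {
  name = "MM-${var.env}-use1-backup-vault"
}

resource "aws_backup_plan" "this" {
  name = "MM-${var.env}-use1-backup-plan"

  rule {
    rule_name         = "daily"
    target_vault_name = aws_backup_vault.this.name
    schedule          = "cron(0 5 * * ? *)" # assumed time: 05:00 UTC every day

    lifecycle {
      delete_after = local.retention[var.env].daily
    }

    dynamic "copy_action" {
      for_each = local.dr_targets
      content {
        destination_vault_arn = copy_action.value
      }
    }
  }

  rule {
    rule_name         = "weekly"
    target_vault_name = aws_backup_vault.this.name
    schedule          = "cron(0 5 ? * SUN *)" # assumed time: 05:00 UTC on Sundays

    lifecycle {
      delete_after = local.retention[var.env].weekly
    }

    dynamic "copy_action" {
      for_each = local.dr_targets
      content {
        destination_vault_arn = copy_action.value
      }
    }
  }
}

# Tag-based selection: anything tagged BackupPolicy = "daily".
resource "aws_backup_selection" "tagged" {
  name         = "tag-backup-policy-daily"
  plan_id      = aws_backup_plan.this.id
  iam_role_arn = var.backup_role_arn

  selection_tag {
    type  = "STRINGEQUALS"
    key   = "BackupPolicy"
    value = "daily"
  }
}
```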
Requirement 12: Cross-Region DR Compatibility
User Story: As a platform engineer, I want the landing zone designed for cross-region disaster recovery readiness, so that production workloads can fail over to us-west-2 without re-architecting the infrastructure.
Acceptance Criteria
- THE Landing_Zone SHALL define Terraform modules that accept `aws_region` as a variable, enabling deployment to us-west-2 without code changes.
- THE Landing_Zone SHALL configure the Terraform S3 state bucket with cross-region replication to a bucket in us-west-2 for state durability.
- WHERE Environment is `prod`, THE Landing_Zone SHALL pre-provision a VPC in us-west-2 with CIDR `10.3.0.0/16` (non-overlapping with us-east-1 CIDRs) as a DR-ready network.
- THE DR VPC in us-west-2 SHALL have the same subnet topology as the prod VPC in us-east-1 (2 public subnets, 2 private subnets, 2 AZs).
- THE Landing_Zone SHALL document the DR runbook covering failover steps, DNS cutover, and data restoration procedures.
- THE SCPs SHALL allow us-west-2 as a permitted region for DR resources.
- WHERE Environment is `dev`, THE Landing_Zone SHALL NOT provision DR resources in us-west-2 to minimize cost.
Requirement 13: Security for AI Model Deployments
User Story: As a platform engineer, I want security guardrails specific to AI model inference workloads, so that model artifacts, inference endpoints, and GPU resources are protected from unauthorized access and data exfiltration.
Acceptance Criteria
- THE Landing_Zone SHALL create an S3 bucket in each workload account for storing AI model artifacts, with SSE-KMS encryption using a customer-managed key (CMK).
- THE model artifact S3 buckets SHALL have bucket policies that restrict access to the EKS node IAM role and the CI/CD pipeline role only.
- THE Landing_Zone SHALL create a KMS key in each workload account for encrypting model artifacts, with a key policy that grants usage to the EKS node role and denies all other principals.
- THE Landing_Zone SHALL define an IAM policy for EKS GPU node groups that grants read-only access to the model artifact S3 bucket and denies access to all other S3 buckets.
- THE Landing_Zone SHALL create a security group for AI inference endpoints that allows inbound traffic only from the application load balancer security group on ports 443 (HTTPS) and 50051 (gRPC).
- WHEN an AI model artifact is uploaded to S3, THE Landing_Zone SHALL require the upload to use SSE-KMS encryption via a bucket policy condition.
- THE Landing_Zone SHALL enable VPC Flow Logs on all subnets where GPU instances run, with log retention of 90 days (dev) or 365 days (prod).
- IF an EKS pod attempts to access an S3 bucket other than the designated model artifact bucket, THEN THE IAM policy SHALL deny the request.
- THE Landing_Zone SHALL tag all AI-related resources with `WorkloadType = "ai-inference"` for cost tracking and governance.
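The model artifact bucket, its CMK, and the upload condition might be sketched as follows. The bucket name variable is hypothetical, and the key policy and role-scoped access statements are left out to keep the example short. Because `StringNotEquals` also matches when the header is absent, uploads that omit the encryption header are denied as well.

```hcl
# Hypothetical input: globally unique bucket name.
variable "model_bucket_name" {
  type = string
}

resource "aws_kms_key" "model_artifacts" {
  description         = "CMK for AI model artifacts"
  enable_key_rotation = true

  tags = {
    WorkloadType = "ai-inference"
  }
}

resource "aws_s3_bucket" "model_artifacts" {
  bucket = var.model_bucket_name

  tags = {
    WorkloadType = "ai-inference"
  }
}

# Default encryption with the customer-managed key.
resource "aws_s3_bucket_server_side_encryption_configuration" "model_artifacts" {
  bucket = aws_s3_bucket.model_artifacts.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.model_artifacts.arn
    }
  }
}

# Reject any upload that does not request SSE-KMS.
resource "aws_s3_bucket_policy" "require_sse_kms" {
  bucket = aws_s3_bucket.model_artifacts.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyUploadsWithoutSseKms"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:PutObject"
      Resource  = "${aws_s3_bucket.model_artifacts.arn}/*"
      Condition = {
        StringNotEquals = {
          "s3:x-amz-server-side-encryption" = "aws:kms"
        }
      }
    }]
  })
}
```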
Requirement 14: Cost Optimization — Billing Alerts and Budgets
User Story: As a platform engineer at a small startup, I want proactive cost controls and alerts, so that unexpected AWS charges are caught early before they impact the company's runway.
Acceptance Criteria
- THE Landing_Zone SHALL create AWS Budgets in the Management_Account with the following monthly thresholds: $100 (dev account), $500 (prod account), and $200 (management account).
- THE Landing_Zone SHALL configure budget alerts at 50%, 80%, and 100% of each budget threshold, sending notifications via SNS to the platform engineering team.
- THE Landing_Zone SHALL enable AWS Cost Anomaly Detection in the Management_Account, monitoring by AWS service and by linked account.
- THE Cost_Anomaly_Detection SHALL send alerts when anomalous spend exceeds $10 above the expected baseline.
- THE Landing_Zone SHALL create a forecasted budget alert that triggers when the forecasted monthly spend exceeds 90% of the budget threshold.
- THE Landing_Zone SHALL enable Cost Explorer in the Management_Account for cross-account cost visibility.
- THE Landing_Zone SHALL tag all resources with `CostCenter` and `Environment` tags to enable cost allocation reporting.
- WHEN a budget alert fires, THE Landing_Zone SHALL deliver the notification to both email and an SNS topic for integration with alerting tools.
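The per-account budgets with their 50%, 80%, and 100% actual-spend alerts plus the 90% forecast alert could be generated from a single map, as sketched below. The variable names, the budget naming scheme, and the shape of `account_budgets` are assumptions.

```hcl
# Hypothetical inputs.
variable "alerts_topic_arn" { type = string }
variable "alert_email" { type = string }

# Example value:
# { dev = { account_id = "<dev-id>", limit_usd = "100" }, prod = { ..., limit_usd = "500" }, mgmt = { ..., limit_usd = "200" } }
variable "account_budgets" {
  type = map(object({
    account_id = string
    limit_usd  = string
  }))
}

resource "aws_budgets_budget" "account" {
  for_each = var.account_budgets

  name         = "MM-${each.key}-monthly"
  budget_type  = "COST"
  limit_amount = each.value.limit_usd
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Scope the budget to one linked account.
  cost_filter {
    name   = "LinkedAccount"
    values = [each.value.account_id]
  }

  # Actual-spend alerts at 50%, 80%, and 100%.
  dynamic "notification" {
    for_each = [50, 80, 100]
    content {
      comparison_operator        = "GREATER_THAN"
      threshold                  = notification.value
      threshold_type             = "PERCENTAGE"
      notification_type          = "ACTUAL"
      subscriber_sns_topic_arns  = [var.alerts_topic_arn]
      subscriber_email_addresses = [var.alert_email]
    }
  }

  # Forecast alert at 90% of the budget.
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 90
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_sns_topic_arns  = [var.alerts_topic_arn]
    subscriber_email_addresses = [var.alert_email]
  }
}
```

The SNS topic's access policy must allow the Budgets service to publish, otherwise only the e-mail subscribers are notified.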
Requirement 15: Cost Optimization — Resource Right-Sizing
User Story: As a platform engineer, I want cost-conscious defaults baked into the landing zone, so that the startup does not overspend on infrastructure that exceeds its current needs.
Acceptance Criteria
- WHERE Environment is `dev`, THE Landing_Zone SHALL use a single NAT Instance (t3.micro) instead of managed NAT Gateways, consistent with Part 1.
- WHERE Environment is `dev`, THE Landing_Zone SHALL use shorter log retention periods (14 days for Flow Logs, 90 days for CloudTrail) to reduce storage costs.
- THE Landing_Zone SHALL use `PAY_PER_REQUEST` billing mode for all DynamoDB tables to avoid provisioned capacity charges at low traffic volumes.
- THE Landing_Zone SHALL use Gateway VPC endpoints (free) for S3 and DynamoDB instead of Interface endpoints where possible.
- THE Landing_Zone SHALL schedule non-production Tailscale_Subnet_Router instances to stop outside business hours (8 PM - 8 AM ET weekdays, all day weekends) using EC2 Instance Scheduler or a Lambda function.
- WHERE Environment is `dev`, THE Landing_Zone SHALL set CloudWatch Logs retention to the minimum required period to reduce storage costs.
- THE Landing_Zone SHALL use AWS-managed KMS keys (free) instead of customer-managed keys where the security requirements permit, reserving CMKs only for AI model artifact encryption.
Requirement 16: Tagging and Naming Conventions
User Story: As a platform engineer, I want consistent tagging and naming across all three accounts, so that resources are identifiable, cost-allocatable, and governable without consulting external documentation.
Acceptance Criteria
- THE Landing_Zone SHALL apply the `Name` tag to every provisioned resource using the format `MM-{env}-{region-short}-{az-short}-{resource}-{purpose}`, consistent with Part 1.
- WHERE a resource is not AZ-specific, THE Landing_Zone SHALL omit the `{az-short}` token from the name.
- THE Landing_Zone SHALL apply all mandatory tags from the Tagging_Policy to every resource: `Name`, `Environment`, `Owner`, `CostCenter`, `Project`, `ManagedBy`, `CreatedBy`, `CreationDate`, `DataClassification`, `Criticality`, `BackupPolicy`, and `PatchGroup`.
- THE `Project` tag value SHALL be `mm-aws-infra` for landing zone resources and `mm-ai-platform` for AI-specific resources.
- THE Landing_Zone SHALL add an `AccountType` tag with values `management`, `dev`, or `prod` to distinguish resources by account.
- THE Landing_Zone SHALL enforce tagging via Terraform `local.common_tags` in each module, consistent with Part 1.
- THE Landing_Zone SHALL create an AWS Organizations tag policy that requires the `Environment`, `CostCenter`, and `ManagedBy` tags on all taggable resources.
- IF a resource is created without the required tags, THEN THE Organizations tag policy SHALL report the resource as non-compliant.
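A `locals` block along the following lines would carry the mandatory tags and the name prefix into each module. The tag keys come from the criteria above; every variable name and the literal values `terraform` and `github-actions` are assumptions made for the example.

```hcl
# Hypothetical inputs; values are illustrative only.
variable "env" { type = string }
variable "region_short" { type = string } # e.g. "use1"
variable "account_type" { type = string } # management, dev, or prod
variable "owner" { type = string }
variable "cost_center" { type = string }
variable "creation_date" { type = string } # passed in, to avoid a perpetual diff from timestamp()
variable "data_classification" { type = string }
variable "criticality" { type = string }
variable "backup_policy" { type = string }
variable "patch_group" { type = string }

locals {
  # Prefix for names of resources that are not AZ-specific.
  name_prefix = "MM-${var.env}-${var.region_short}"

  common_tags = {
    Environment        = var.env
    Owner              = var.owner
    CostCenter         = var.cost_center
    Project            = "mm-aws-infra"
    ManagedBy          = "terraform"
    CreatedBy          = "github-actions"
    CreationDate       = var.creation_date
    DataClassification = var.data_classification
    Criticality        = var.criticality
    BackupPolicy       = var.backup_policy
    PatchGroup         = var.patch_group
    AccountType        = var.account_type
  }
}

# Usage: merge the per-resource Name tag into the common tags.
resource "aws_sns_topic" "platform_alerts" {
  name = "${local.name_prefix}-sns-platform-alerts"

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-sns-platform-alerts"
  })
}
```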
Requirement 17: Well-Architected Framework Alignment
User Story: As a platform engineer, I want the landing zone aligned with all six pillars of the AWS Well-Architected Framework, so that the architecture follows proven best practices from the start.
Acceptance Criteria
- THE Landing_Zone SHALL address the Operational Excellence pillar by defining all infrastructure as Terraform code with CI/CD automation, centralized logging, and documented runbooks.
- THE Landing_Zone SHALL address the Security pillar by enabling GuardDuty, Security Hub, organization-wide CloudTrail, SCPs, IAM Identity Center (no static credentials), and encryption at rest for all data stores.
- THE Landing_Zone SHALL address the Reliability pillar by using multi-AZ subnet topology, cross-region backup replication (prod), and Auto Scaling Groups for Tailscale_Subnet_Routers.
- THE Landing_Zone SHALL address the Performance Efficiency pillar by using VPC endpoints to reduce latency for AWS API calls and right-sizing instances for current workload levels.
- THE Landing_Zone SHALL address the Cost Optimization pillar by implementing billing alerts, budgets, cost anomaly detection, NAT Instance in dev, scheduled instance stops, and Gateway VPC endpoints.
- THE Landing_Zone SHALL address the Sustainability pillar by right-sizing resources, scheduling non-production resources to stop when unused, and selecting the most efficient instance types for each workload.
Requirement 18: Cross-Account Terraform State Management
User Story: As a platform engineer, I want Terraform state organized by account and module, so that each account's infrastructure can be planned and applied independently without state conflicts.
Acceptance Criteria
- THE Landing_Zone SHALL store Terraform state in the existing S3 state bucket with keys organized as `{module}/{account}/{env}/terraform.tfstate` (e.g., `landing-zone/management/mgmt/terraform.tfstate`).
- THE Landing_Zone SHALL use the existing DynamoDB lock table for state locking across all modules.
- THE Landing_Zone SHALL use Terraform AWS provider aliases to manage resources in multiple accounts from a single configuration where cross-account orchestration is required.
- THE Landing_Zone SHALL define separate Terraform modules for: `organization` (Management_Account), `account-baseline` (applied per member account), `security-services` (GuardDuty, Security Hub, Config), `networking` (VPC endpoints, Tailscale), `backup` (AWS Backup), and `cost-management` (budgets, anomaly detection).
- EACH Terraform module SHALL use partial backend configuration, consistent with Part 1, so that bucket name and lock table are injected at `terraform init` time.
- IF a Terraform apply in one module fails, THEN THE failure SHALL NOT affect the state of other modules.
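Partial backend configuration and a provider alias for a member account could look like the sketch below. The state key follows the example given in the criteria; the role name and account ID variables are hypothetical, and the bucket and lock table names are deliberately left as placeholders because they are injected at init time.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }

  # Partial configuration: bucket, lock table, and region are supplied at init time, e.g.
  #   terraform init \
  #     -backend-config="bucket=<state-bucket>" \
  #     -backend-config="dynamodb_table=<lock-table>" \
  #     -backend-config="region=us-east-1"
  backend "s3" {
    key     = "landing-zone/management/mgmt/terraform.tfstate"
    encrypt = true
  }
}

# Hypothetical inputs.
variable "dev_account_id" {
  type = string
}

variable "terraform_role_name" {
  type    = string
  default = "MM-terraform" # assumed role name
}

# Default provider: the Management_Account.
provider "aws" {
  region = "us-east-1"
}

# Aliased provider: assumes a role in the Dev_Account.
provider "aws" {
  alias  = "dev"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::${var.dev_account_id}:role/${var.terraform_role_name}"
  }
}
```

Resources meant for the Dev_Account then set `provider = aws.dev`. Because each module has its own state key, a failed apply in one module leaves the others' state files untouched.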
Requirement 19: CI/CD Pipeline for Multi-Account Deployment
User Story: As a platform engineer, I want the CI/CD pipeline extended to deploy landing zone resources across all three accounts, so that multi-account infrastructure changes follow the same plan-review-apply workflow as Part 1.
Acceptance Criteria
- THE Landing_Zone SHALL extend the existing GitHub Actions workflow to support multi-account Terraform deployments.
- THE CI/CD pipeline SHALL use OIDC-based authentication to assume roles in each target account (Management, Dev, Prod).
- THE CI/CD pipeline SHALL create separate IAM roles in each member account for Terraform operations, with trust policies allowing assumption from the GitHub Actions OIDC provider.
- WHEN a pull request is opened, THE CI/CD pipeline SHALL run `terraform plan` for all affected modules and post the plan output as a PR comment.
- WHEN code is merged to `main`, THE CI/CD pipeline SHALL apply changes to the Management_Account first, then Dev_Account, then Prod_Account in sequence.
- THE Prod_Account apply step SHALL require manual approval via a GitHub Environment protection rule, consistent with Part 1.
- THE CI/CD pipeline SHALL run `terraform validate` and `terraform fmt -check` for all modules before any plan or apply step.
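The per-account role that GitHub Actions assumes through OIDC might be defined as in the sketch below. The role name and both variables are hypothetical, and the GitHub OIDC identity provider is assumed to exist already in the target account. The `sub` condition limits the role to workflows from one repository.

```hcl
# Hypothetical inputs.
variable "github_oidc_provider_arn" {
  type = string # ARN of the existing token.actions.githubusercontent.com provider
}

variable "github_repository" {
  type = string # "<org>/<repo>"
}

resource "aws_iam_role" "terraform_ci" {
  name = "MM-github-actions-terraform" # assumed name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = var.github_oidc_provider_arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:${var.github_repository}:*"
        }
      }
    }]
  })
}
```

For the Prod_Account, the `sub` pattern could be narrowed further (for example, to a specific GitHub Environment) so that only the approved apply job can assume the role.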
Requirement 20: SNS Notification Infrastructure
User Story: As a platform engineer, I want a centralized notification topic for all security, cost, and operational alerts, so that the team receives timely notifications through a single channel.
Acceptance Criteria
- THE Landing_Zone SHALL create an SNS topic in the Management_Account named `MM-mgmt-use1-sns-platform-alerts` for all centralized alerts.
- THE Landing_Zone SHALL create an SNS topic in each workload account named `MM-{env}-use1-sns-platform-alerts` for account-specific alerts.
- THE SNS topics SHALL have email subscriptions for the platform engineering team.
- THE SNS topics SHALL have SSE enabled using AWS-managed keys.
- THE SNS topics SHALL have access policies that allow only the relevant AWS services (GuardDuty, Budgets, Config, Backup, Security Hub) to publish messages.
- EACH SNS topic SHALL have all mandatory tags from the Tagging_Policy applied at creation time.
Requirement 21: Account Migration Strategy
User Story: As a platform engineer, I want a clear migration path from the current single-account setup to the multi-account architecture, so that existing Part 1 resources are preserved and transitioned without downtime.
Acceptance Criteria
- THE Landing_Zone SHALL document the migration plan for moving the existing dev VPC (10.1.0.0/16) resources to the Dev_Account.
- THE Landing_Zone SHALL document the migration plan for moving the existing prod VPC (10.2.0.0/16) resources to the Prod_Account.
- THE migration plan SHALL use Terraform state move operations (`terraform state mv`) or import blocks to transfer resource ownership without destroying and recreating resources.
- THE Landing_Zone SHALL maintain the existing CIDR allocations (10.1.0.0/16 for dev, 10.2.0.0/16 for prod) in the new accounts to avoid re-addressing.
- WHEN migrating resources, THE Landing_Zone SHALL ensure zero downtime for any running workloads.
- THE Landing_Zone SHALL update the Terraform backend configuration to use account-specific state keys after migration.
- IF migration of a resource fails, THEN THE Landing_Zone SHALL roll back to the previous state without data loss.