Multi-Environment AWS CDK Deployments in a Single Account
A comprehensive guide to deploying multiple environments in a single AWS account using CDK, featuring sophisticated deployment strategies, GitHub Actions automation, and real-world architectural patterns

Building AWS applications that support multiple environments — development and production — is a fundamental requirement for modern software delivery. When you're working within a single AWS account, this requires some special considerations around DNS, stack boundaries, and resource dependencies.
This article shares the lessons learned from building a sophisticated real-time chat application that solves these challenges through specific AWS CDK patterns.
Overview
Like many projects, this started simply enough. I needed a chat application with real-time messaging, user authentication, and AI-powered responses. The tech stack was straightforward:
- Frontend: React + TypeScript + Vite + Tailwind CSS
- Backend: AWS AppSync GraphQL API with real-time subscriptions
- Infrastructure: AWS CDK
- AI Integration: AWS Bedrock Claude 3.5 Sonnet
- Authentication: AWS Cognito with custom domains
Developing this required many deployments as we added and adjusted AppSync resolvers. Often, a simple change to one of these resolvers required a complete delete of the CloudFormation stack, which caused lengthy delays because some of the resources took a long time to delete.
The First Challenge: Multi-Environment DNS in a Single Account
The first major hurdle appeared when we tried to set up development and production environments. We wanted clean, professional URLs:
- Production: `example.com`
- Development: `dev.example.com`
This is a standard approach, but if not done correctly it can create complexities in the CDK code.
```typescript
// Don't do this - it's painful to maintain
const apiDomain =
  environment === "dev" ? `api.dev.example.com` : `api.example.com`;
const authDomain =
  environment === "dev" ? `auth.dev.example.com` : `auth.example.com`;
// ... and so on for every subdomain
```
The Solution: Nested Hosted Zones
Instead of using a single Route 53 hosted zone, I created a separate hosted zone for the dev subdomain, delegated from the main domain's hosted zone. While requiring two hosted zones sounds more complex, this actually simplifies the CDK code.
- In the main hosted zone, you create NS records pointing to the dev hosted zone
- The dev hosted zone becomes authoritative for all `*.dev.example.com` domains
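A minimal CDK sketch of that delegation might look like the following. The stack name and the zone lookup are illustrative, not code from the project; `NsRecord` writes the dev zone's name servers into the parent zone:

```typescript
import { Stack, StackProps } from "aws-cdk-lib";
import { Construct } from "constructs";
import { HostedZone, NsRecord } from "aws-cdk-lib/aws-route53";

// Sketch of the delegation wiring, assuming both zones live in the
// same account. Zone names are the article's examples.
export class ZoneDelegationStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // The existing apex zone (looked up by name; requires an
    // explicit account/region env on the stack).
    const mainZone = HostedZone.fromLookup(this, "MainZone", {
      domainName: "example.com",
    });

    // A second zone that becomes authoritative for *.dev.example.com.
    const devZone = new HostedZone(this, "DevZone", {
      zoneName: "dev.example.com",
    });

    // NS records in the parent zone point resolvers at the dev zone.
    new NsRecord(this, "DevDelegation", {
      zone: mainZone,
      recordName: "dev.example.com",
      values: devZone.hostedZoneNameServers ?? [],
    });
  }
}
```

Once the delegation is in place, each environment simply works against its own zone ID.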
```typescript
// From infra/src/main.ts
const environments = {
  prod: {
    domainName: "example.com",
    environment: "prod" as const,
    hostedZoneId: "Z088376XXXXXXXX", // Main domain hosted zone
  },
  dev: {
    domainName: "dev.example.com",
    environment: "dev" as const,
    hostedZoneId: "Z031287XXXXXXXX", // Subdomain hosted zone
  },
};
```
This setup simplifies the CDK code because no environment prefixes are needed. Now, in your CDK code, you can simply write:
```typescript
const apiDomain = `api.${props.domainName}`;
const authDomain = `auth.${props.domainName}`;
```
The domain name already includes the environment context (`example.com` or `dev.example.com`), so your subdomains automatically become:
- Production: `api.example.com`, `auth.example.com`
- Development: `api.dev.example.com`, `auth.dev.example.com`
This approach eliminates conditional logic throughout your infrastructure code and makes adding new environments trivial.
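To make the pattern concrete, here is a small, self-contained sketch in plain TypeScript (hosted zone IDs omitted) of how the environment map drives every subdomain:

```typescript
// Each environment's domain already carries its context, so
// subdomains are plain template strings with no conditionals.
interface EnvConfig {
  domainName: string;
  environment: "prod" | "dev";
}

const environments: Record<string, EnvConfig> = {
  prod: { domainName: "example.com", environment: "prod" },
  dev: { domainName: "dev.example.com", environment: "dev" },
};

function subdomains(cfg: EnvConfig) {
  return {
    api: `api.${cfg.domainName}`,
    auth: `auth.${cfg.domainName}`,
  };
}
```

Adding a staging environment would mean adding one entry to the map; no other code changes.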
The Second Challenge: Lengthy Deployment/Delete Times for Complex Stacks
With Cognito user pools, custom domains, CloudFront distributions, and certificates, a full deployment or deletion can take a significant amount of time.
The solution? Split your infrastructure into logical, independently deployable stacks:
1. DNSStack → Manages certificates and some DNS resources
2. CognitoStack → Handles authentication infrastructure
3. MainStack → Contains application resources
This separation is important because it moves resources that should rarely or never change into their own stacks, allowing the Main stack to be deleted without destroying foundational resources.
- DNS Stack: Deploy once when setting up the environment, then forget about it
- Cognito Stack: Update only when authentication requirements change
- Main Stack: Deploy frequently with your application updates
Now, if we need to delete the Main stack, we don't have to delete the Cognito custom domain, because it was created in the Cognito stack.
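In the CDK app entry point, the split might be wired up roughly like this (the stack classes below are empty stubs standing in for the real ones):

```typescript
import { App, Stack } from "aws-cdk-lib";

// Illustrative stubs; the real classes hold certificates,
// Cognito resources, and application resources respectively.
class DnsStack extends Stack {}
class CognitoStack extends Stack {}
class MainStack extends Stack {}

const app = new App();

for (const env of ["dev", "prod"]) {
  new DnsStack(app, `DnsStack-${env}`); // deploy once, then forget
  new CognitoStack(app, `CognitoStack-${env}`); // update only when auth changes
  new MainStack(app, `MainStack-${env}`); // deploy frequently
}
```

Because each stack has its own lifecycle, `cdk deploy MainStack-dev` or `cdk destroy MainStack-dev` never touches the other two.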
The Third Challenge: Resource Dependencies
- Cognito custom domains require the apex domain to exist - According to AWS documentation, "the parent domain must have a DNS A record" for custom domain creation to succeed
- CloudFront creates the A record for the apex domain
- But CloudFront is in the Main stack, which deploys AFTER Cognito
The problem was clear: Cognito couldn't create `auth.dev.example.com` because `dev.example.com` didn't resolve to anything yet.
The DNS Placeholder Strategy
The solution is to create a temporary placeholder in the DNSStack:
```typescript
// In DnsStack - Create placeholder
new ARecord(this, "BaseDomainPlaceholder", {
  zone: hostedZone,
  recordName: "", // Empty string = apex domain
  target: RecordTarget.fromIpAddresses("192.0.2.1"), // RFC 5737 test IP
  comment: "Placeholder A record for Cognito custom domain validation",
  deleteExisting: true,
});
```
Why `192.0.2.1`? It's a special IP address from RFC 5737, reserved for documentation and examples. It will never route anywhere real, making it perfect for our placeholder.
Later, when the Main stack deploys, it replaces this placeholder with the real CloudFront distribution:
```typescript
// In SimpleChatStack - Replace with real CloudFront distribution
new ARecord(this, "WebsiteApexARecord", {
  zone: importedHostedZone,
  recordName: "",
  target: RecordTarget.fromAlias(new CloudFrontTarget(website.distribution)),
  deleteExisting: true, // Atomically replaces the placeholder
});
```
The `deleteExisting: true` flag is important - it uses a CloudFormation custom resource to delete and recreate the record atomically, ensuring no downtime.
Why This Matters Beyond DNS
This pattern unlocks a critical capability: you can now delete and recreate your Main stack without touching Cognito.
Without this separation, deleting a stack that contains a Cognito custom domain triggers a painful process:
- CloudFormation deletes the custom domain (can take 15+ minutes!)
- You wait for the deletion to complete
- You redeploy and wait for the custom domain to be recreated
With our architecture, you can freely iterate on, and even delete, your Main stack while Cognito and the resources created in the DNS stack remain untouched. Development velocity increases dramatically.
The Fourth Challenge: Automating the Deployment Pipeline
With our multi-stack, multi-environment setup working, the next step was automation. The goal was simple: push to develop, automatically deploy to dev, create a PR to main, and deploy to production after approval.
But the implementation revealed several subtle challenges that required thoughtful solutions.
The Automated PR Workflow
After a successful deployment to dev, we wanted to automatically create (or update) a PR to main:
```yaml
- name: Create or Update Pull Request to main
  if: github.ref == 'refs/heads/develop' && success()
  run: |
    # Check if an open PR already exists
    PR_URL=$(gh pr list --base main --head develop --state open --json url --jq '.[0].url' || true)
    if [ -n "$PR_URL" ]; then
      echo "Updating existing PR..."
      gh pr edit "$PR_URL" \
        --title "Merge develop into main (Automated PR)" \
        --body "Updated: $(date -u) - Automated PR updated after successful deployment"
    else
      echo "Creating new PR..."
      gh pr create \
        --base main \
        --head develop \
        --title "Merge develop into main (Automated PR)" \
        --body "Automated PR created after successful deployment"
    fi
```
This pattern ensures:
- Only one PR exists at a time from develop to main
- The PR is automatically updated with each dev deployment
- Developers can see the deployment status before merging
- Production deployments remain manual (through PR approval)
When vibe coding, I have found it useful to make smaller, more frequent commits to `develop`. Pushing to `develop`, deploying and testing there, and then merging several of these commits into `main` allows for easier rollbacks while still testing extensively in `develop`. Once we've made several changes, we can merge them into `main` for production.
The Fifth Challenge: Cross-Stack Communication
CDK offers built-in cross-stack references, but they come with a catch: once you create an export, you can't delete the exporting stack until all importing stacks are deleted first. This creates deployment ordering headaches and makes it difficult to iterate quickly.
SSM Parameter Store
Instead of tight coupling, we use SSM Parameter Store as a communication bus between stacks:
```typescript
// DnsStack writes parameters
new StringParameter(this, "WebsiteCertArnParam", {
  parameterName: `/${environment}/app/certificate/website/arn`,
  stringValue: websiteCertificate.certificateArn,
  description: "Website certificate ARN",
});

// SimpleChatStack reads parameters
const websiteCertificateArn = StringParameter.valueForStringParameter(
  this,
  `/${environment}/app/certificate/website/arn`,
);
```
This pattern is powerful because:
- No deployment dependencies - Stacks remain independent
- Visible in AWS Console - Easy debugging when things go wrong
- Environment namespacing - Clear separation with the `/${environment}/` prefix
The trade-off is losing compile-time type safety: a mistyped parameter name only fails at deploy time. In practice, though, the flexibility far outweighs this minor inconvenience.
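One way to recover some of that safety (shown here as an illustrative helper, not code from the project) is to centralize parameter names behind a typed function:

```typescript
// Hypothetical helper: the union type is the single source of truth
// for which parameters exist, so a misspelled name becomes a
// compile-time error instead of a deploy-time failure.
type ParamKey =
  | "certificate/website/arn"
  | "certificate/auth/arn"
  | "cognito/userpool/id";

function paramName(environment: string, key: ParamKey): string {
  return `/${environment}/app/${key}`;
}
```

Both the writing stack and the reading stack would call `paramName("dev", "certificate/website/arn")`, so the name can never drift between them. (Only the website-certificate parameter appears in this article; the other keys are placeholders.)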
The Sixth Challenge: Making Deployments Developer-Friendly
With all these stacks and dependencies, deployment could have become a nightmare. Instead, we created smart deployment scripts that handle all the complexity:
```bash
#!/bin/bash
# deploy.sh - The magic happens here
environment="${1:-dev}"
profile="${2:-default}"

echo "🚀 Deploying infrastructure..."

# Deploy stacks in the correct order
pnpm cdk deploy CognitoStack-${environment}
pnpm cdk deploy MainStack-${environment}

# Auto download the frontend config
SITE_BUCKET_NAME=$(aws cloudformation describe-stacks \
  --stack-name MainStack-${environment} \
  --region us-east-1 --profile "${profile}" \
  --query "Stacks[0].Outputs[?OutputKey=='SiteBucketName'].OutputValue" --output text)
GRAPHQL_API_ID=$(aws cloudformation describe-stacks \
  --stack-name MainStack-${environment} \
  --region us-east-1 --profile "${profile}" \
  --query "Stacks[0].Outputs[?OutputKey=='GraphQLApiId'].OutputValue" --output text)

# Copy config.json from S3 to site/public
aws s3 cp "s3://${SITE_BUCKET_NAME}/config.json" "../frontend/public/config.json" --region us-east-1 --profile "${profile}"

# Conditionally copy to config.dev.json or config.prod.json
if [ "$environment" == "dev" ]; then
  cp "../frontend/public/config.json" "../frontend/public/config.dev.json"
elif [ "$environment" == "prod" ]; then
  cp "../frontend/public/config.json" "../frontend/public/config.prod.json"
fi

# Update the GraphQL config with the correct API ID
echo "📝 Updating GraphQL config with API ID: $GRAPHQL_API_ID..."
cd ../frontend
if [[ "$OSTYPE" == "darwin"* ]]; then
  # macOS (BSD sed needs an explicit empty backup suffix)
  sed -i '' "s/apiId: .*/apiId: $GRAPHQL_API_ID/" .graphqlconfig.yml
else
  # Linux
  sed -i "s/apiId: .*/apiId: $GRAPHQL_API_ID/" .graphqlconfig.yml
fi

echo "🧬 Generating GraphQL code for the site..."
npx @aws-amplify/cli codegen
```
config.json setup
Now, when we deploy from our local terminal, we download the `config.json` that was deployed to our S3 bucket and copy it into our local frontend. This allows us to use the deployed resources while working on the frontend locally.
```json
"config:dev": "cp public/config.dev.json public/config.json",
"config:prod": "cp public/config.prod.json public/config.json",
"dev:local": "pnpm run config:dev && vite",
"dev:local-prod": "pnpm run config:prod && vite"
```
Scripts within our frontend's `package.json` file allow us to switch between dev and prod deployments. If you want to use the `dev` config locally, you would run:

```bash
pnpm dev:local
```

This copies `config.dev.json` to `config.json` and runs Vite. By pushing the `cp` into our local script, we eliminate the complication of determining the environment in the client. The client always looks for `config.json`, whether it is running locally or deployed in the S3 bucket.
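On the client side, the runtime config can be fetched and validated when the app boots. The following is a sketch only; the exact keys are assumptions and depend on what your Main stack actually writes into `config.json`:

```typescript
// Hypothetical config shape; adjust the keys to match what the
// deployment actually writes to config.json.
interface AppConfig {
  graphqlEndpoint: string;
  userPoolId: string;
  userPoolClientId: string;
}

// Validate early so a stale or missing config fails loudly at
// startup instead of as a confusing GraphQL or auth error later.
function parseConfig(raw: string): AppConfig {
  const cfg = JSON.parse(raw);
  for (const key of ["graphqlEndpoint", "userPoolId", "userPoolClientId"]) {
    if (typeof cfg[key] !== "string") {
      throw new Error(`config.json is missing "${key}"`);
    }
  }
  return cfg as AppConfig;
}

// In the app entry point, something like:
//   const config = parseConfig(await (await fetch("/config.json")).text());
```

Because the same file name is used everywhere, this code is identical for local and deployed builds.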
codegen setup
Additionally, in order for our local GraphQL `API.ts` to stay accurate, we use `codegen`. To do this, we update our `.graphqlconfig.yml` with the `GRAPHQL_API_ID`, which allows us to run `npx @aws-amplify/cli codegen` and generate the `schema.json` and `API.ts` files we need for our frontend.
Conclusion
Individually, each of these steps is a minor improvement to our development and deployment processes. Combined, they create a workflow that is much easier and faster to work with. Additionally, by making the code easier to read and understand, we accelerate the process when using an AI coding assistant: if it makes sense to us and is easy for us to read, it will be easy for our models to read and understand as well.