In this demo, we build a WebSocket server using Amazon Elastic Container Service (ECS) with Fargate, fronted by an Application Load Balancer and an Amazon CloudFront distribution for TLS termination. The server auto-scales based on connection count, and traffic between CloudFront and the ALB is secured with a custom header; no custom domain or certificate is required.

flowchart LR
    Client((Client))
    CF[CloudFront<br/>Distribution]
    subgraph VPC
        ALB[Application<br/>Load Balancer]
        subgraph ASG[Auto Scaling Group]
            Node[Node Server]
        end
    end
    Client -->|WSS| CF
    CF -->|WS| ALB
    ALB --> Node
    
    style CF fill:#8C4FFF,color:#fff
    style ALB fill:#8C4FFF,color:#fff 
    style Node fill:#FF9900,color:#fff
    style VPC fill:#E9F3E6,stroke:#248814,stroke-width:2px
    style ASG fill:#FFE5CC,stroke:#FF9900,stroke-width:2px,stroke-dasharray: 5 5

ECS Fargate

Docker Build

FROM --platform=linux/arm64 node:20-alpine AS build
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM --platform=linux/arm64 node:20-alpine
ENV NODE_ENV=production
RUN apk add --no-cache curl
USER node
WORKDIR /usr/src/app
COPY --chown=node:node package*.json ./
COPY --from=build --chown=node:node /usr/src/app/node_modules ./node_modules
COPY --from=build --chown=node:node /usr/src/app/dist ./dist
EXPOSE 8080
CMD [ "node", "dist/server.js" ]

This is a multi-stage build on node:20-alpine. The first stage installs dependencies and compiles TypeScript. The second stage copies only what's needed to run — no dev dependencies, no source files. The container runs as the node user rather than root, and curl is included for the health check.
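Because the build stage runs COPY . ., it is worth excluding host artifacts from the build context. A .dockerignore along these lines (a sketch; adjust to your repo layout) keeps the host's node_modules, build output, and CDK output out of the image:

```
node_modules
dist
cdk.out
.git
```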

Cluster

this.cluster = new Cluster(this, 'Cluster', {
  vpc: props.vpc,
  clusterName: 'websocket-service',
  containerInsightsV2: ContainerInsights.ENHANCED,
});

Because we are using Fargate (serverless containers) instead of EC2 for our compute infrastructure, we don't need an EC2 Auto Scaling Group. The cluster automatically provisions underlying capacity based on the Task Definitions we define for it. Container Insights is enabled in enhanced mode for better observability out of the box.

Fargate Task Definition

const webSocketTask = new FargateTaskDefinition(
  this,
  'WebSocketTaskDefinition',
  {
    memoryLimitMiB: 2048,
    cpu: 1024,
    runtimePlatform: {
      operatingSystemFamily: OperatingSystemFamily.LINUX,
      cpuArchitecture: CpuArchitecture.ARM64,
    },
    taskRole: websocketServiceRole,
  },
);

This Task uses ARM64 compute. All components — the Dockerfile, the task definition, and the runtime platform — must agree on architecture.

Fargate Container

webSocketTask.addContainer('WebSocketContainer', {
  image: ContainerImage.fromAsset('src/constructs/resources/containerImage'),
  containerName: 'websocket-service',
  portMappings: [{ containerPort: 8080, hostPort: 8080 }],
  logging: LogDrivers.awsLogs({
    streamPrefix: 'websocket-service',
  }),
  healthCheck: {
    command: ['CMD-SHELL', 'curl -f http://localhost:8080/health'],
    interval: Duration.seconds(30),
    timeout: Duration.seconds(30),
  },
  environment: {},
});

Port 8080 is exposed on both the container and host. The container-level health check uses curl to hit the /health endpoint — this is why we install curl in the Dockerfile. This health check is distinct from the ALB health check we'll set up later; this one monitors the container itself, while the ALB check monitors the target from the load balancer's perspective.

Fargate Service

const websocketService = new FargateService(this, 'WebSocketService', {
  cluster: this.cluster,
  taskDefinition: webSocketTask,
  assignPublicIp: true,
  desiredCount: 1,
  vpcSubnets: { subnetType: SubnetType.PUBLIC },
  enableExecuteCommand: true,
});

The service runs in public subnets with a public IP assigned. ECS Exec is enabled so you can shell into running containers for debugging.

Auto Scaling

const scalableTarget = websocketService.autoScaleTaskCount({
  minCapacity: 1,
  maxCapacity: 5,
});

scalableTarget.scaleOnRequestCount('RequestScaling', {
  requestsPerTarget: 5,
  targetGroup: webSocketTargetGroup,
});

The Application Load Balancer counts each WebSocket upgrade as a request, so the RequestCountPerTarget metric reflects new connections per target. When that count exceeds 5 (set low here for demo purposes), the auto scaler spins up additional Fargate tasks. This is the key insight for WebSocket scaling: you scale on connection count, not CPU or memory.
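To make the scaling behavior concrete, here is a rough sketch of the proportional math behind target tracking. The real mechanism is Application Auto Scaling driving CloudWatch alarms, so timing and step sizes differ; the function name and shape here are illustrative only:

```typescript
// Hypothetical sketch of target-tracking scaling math. Not the actual
// Application Auto Scaling implementation, which works via CloudWatch alarms.
function desiredTaskCount(
  currentTasks: number,
  connectionsPerTarget: number, // current RequestCountPerTarget value
  target: number,               // requestsPerTarget from the CDK config
  min: number,
  max: number,
): number {
  // Proportional step: scale capacity by metric/target, then clamp to bounds.
  const proposed = Math.ceil(currentTasks * (connectionsPerTarget / target));
  return Math.min(max, Math.max(min, proposed));
}

// With the demo's settings (target 5, min 1, max 5):
desiredTaskCount(2, 15, 5, 1, 5); // wants 6 tasks, clamped to the max of 5
desiredTaskCount(1, 3, 5, 1, 5);  // under target, stays at 1
```

With a target of 5 and a maximum of 5 tasks, a burst of connections ramps the service to its ceiling quickly, which is what the load test later in this post demonstrates.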

Security Group

websocketService.connections.allowFrom(
  props.applicationLoadBalancer,
  Port.tcp(8080),
  'Allow traffic from ALB on port 8080',
);

Only the Application Load Balancer can reach the Fargate containers on port 8080. CDK's connections API handles the security group rule creation — no need to create and manage explicit security group resources.

Application Load Balancer

this.applicationLoadBalancer = new ApplicationLoadBalancer(
  this,
  'ApplicationLoadBalancer',
  {
    vpc: this.vpc,
    internetFacing: true,
    dropInvalidHeaderFields: true,
  },
);

The ALB is internet-facing and configured to drop invalid HTTP header fields, a security best practice that mitigates header injection and request smuggling attacks.

Target Group

const webSocketTargetGroup = new ApplicationTargetGroup(
  this,
  'webSocketTargetGroup',
  {
    vpc: props.vpc,
    port: 8080,
    protocol: ApplicationProtocol.HTTP,
    targets: [websocketService],
    healthCheck: {
      path: '/',
      protocol: Protocol.HTTP,
      port: '8080',
    },
  },
);

The target group forwards traffic to port 8080 on the Fargate service. The ALB health check hits the root path — this is the load balancer verifying it can reach the container, separate from the container's own internal health check.

Listener

const webSocketListener = props.applicationLoadBalancer.addListener(
  'webSocketListener',
  {
    port: 80,
    protocol: ApplicationProtocol.HTTP,
    open: true,
    defaultAction: ListenerAction.fixedResponse(403),
  },
);

webSocketListener.addAction('ForwardFromCloudFront', {
  conditions: [
    ListenerCondition.httpHeader(props.customHeader, [props.randomString]),
  ],
  action: ListenerAction.forward([webSocketTargetGroup]),
  priority: 1,
});

The listener accepts HTTP on port 80 with a default action of 403 Forbidden. Traffic is forwarded to the target group only when the request carries a specific custom header with a matching value. CloudFront injects that header, so requests made directly to the ALB are rejected.

The ALB uses HTTP, not HTTPS. TLS is enforced at the CloudFront layer. Since we only allow traffic through CloudFront, the client-facing connection is always encrypted without needing a custom domain or certificate on the ALB.
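Inside the ALB, the header rule is an exact string match. The same origin-verification idea is easy to sketch in application code (a conceptual example, not part of this stack; timingSafeEqual guards against timing attacks if you ever perform the comparison yourself):

```typescript
import { timingSafeEqual } from 'node:crypto';

// Conceptual sketch of the origin-verification check the ALB listener rule
// performs. The real check is an exact match inside the ALB; this shows how
// you would do a safe comparison in your own code.
function headerMatchesSecret(received: string | undefined, secret: string): boolean {
  if (received === undefined) return false;
  const a = Buffer.from(received);
  const b = Buffer.from(secret);
  // timingSafeEqual throws on length mismatch, so guard on length first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```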

CloudFront Distribution

const defaultOrigin = new LoadBalancerV2Origin(props.applicationLoadBalancer, {
  httpPort: 80,
  protocolPolicy: OriginProtocolPolicy.HTTP_ONLY,
  originId: 'defaultOrigin',
  customHeaders: {
    [props.customHeader]: props.randomString,
  },
});

this.distribution = new Distribution(this, 'Distribution', {
  defaultBehavior: {
    origin: defaultOrigin,
    viewerProtocolPolicy: ViewerProtocolPolicy.HTTPS_ONLY,
    cachePolicy: CachePolicy.CACHING_DISABLED,
    allowedMethods: AllowedMethods.ALLOW_ALL,
    originRequestPolicy: OriginRequestPolicy.ALL_VIEWER,
  },
  defaultRootObject: 'index.html',
  priceClass: PriceClass.PRICE_CLASS_100,
  logBucket: distributionLoggingBucket,
  enableLogging: true,
  minimumProtocolVersion: SecurityPolicyProtocol.TLS_V1_2_2021,
});

The distribution enforces HTTPS on the viewer side and injects the custom header when forwarding to the ALB origin. Key configuration:

  • CachePolicy.CACHING_DISABLED — WebSockets are real-time; caching would break everything
  • AllowedMethods.ALLOW_ALL — WebSocket upgrades require more than just GET/HEAD
  • OriginRequestPolicy.ALL_VIEWER — forwards all headers, including the Upgrade: websocket header
  • customHeaders — the secret header/value pair that the ALB checks before forwarding traffic
  • SecurityPolicyProtocol.TLS_V1_2_2021 — enforces modern TLS

The customHeaders property on LoadBalancerV2Origin is the native CDK way to inject origin headers. CloudFront adds this header to every request it sends to the ALB, and the ALB's listener condition checks for it — completing the security chain.

The WebSocket Server

import * as http from 'http';
import express, { Response } from 'express';
import { Server as WebSocketServer, WebSocket } from 'ws';

const serverPort: number = 8080;
const app = express();
const server = http.createServer(app);
const websocketServer = new WebSocketServer({ server, path: '/wss' });

websocketServer.on('connection', (webSocketClient: WebSocket) => {
  console.log('New connection');
  webSocketClient.send(JSON.stringify({ connection: 'ok' }));

  webSocketClient.on('message', (message: Buffer) => {
    console.log('New message');
    try {
      const parsedMessage = JSON.parse(message.toString());
      websocketServer.clients.forEach((client: WebSocket) => {
        if (client.readyState === WebSocket.OPEN) {
          client.send(JSON.stringify({ message: parsedMessage }));
        }
      });
    } catch (e) {
      console.error('Invalid JSON message format received');
    }
  });
});

app.get('/health', (_, res: Response) => {
  res.status(200).send('Ok');
});

app.get('/', (_, res: Response) => {
  res.status(200).send('Ok');
});

The server runs Express and the ws library on the same HTTP server. Express handles the health check endpoints (/health and /), while WebSocket connections go through the /wss path. Messages are broadcast to all connected clients.
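The broadcast loop is the heart of the server and can be isolated and exercised without a network. A dependency-free sketch (the client shape mirrors the ws API, where WebSocket.OPEN is 1):

```typescript
// Dependency-free sketch of the broadcast loop above. The client shape
// mirrors the ws API (readyState/send); in ws, WebSocket.OPEN === 1.
const OPEN = 1;

interface BroadcastClient {
  readyState: number;
  send(data: string): void;
}

// Sends the wrapped payload to every open client; returns the delivery count.
function broadcast(clients: Iterable<BroadcastClient>, payload: unknown): number {
  const data = JSON.stringify({ message: payload });
  let delivered = 0;
  for (const client of clients) {
    if (client.readyState === OPEN) {
      client.send(data);
      delivered++;
    }
  }
  return delivered;
}
```

Stub clients make this trivially unit-testable, which is handy given how awkward full WebSocket integration tests are.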

Graceful Shutdown

const shutdown = () => {
  console.log('Shutting down gracefully...');

  websocketServer.clients.forEach((client: WebSocket) => {
    if (client.readyState === WebSocket.OPEN) {
      client.close(1001, 'Server shutting down');
    }
  });

  server.close(() => {
    console.log('Closed out remaining HTTP connections.');
    process.exit(0);
  });

  setTimeout(() => {
    console.error('Could not close connections in time, forcefully shutting down');
    process.exit(1);
  }, 10000);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

This is critical for WebSocket servers. When ECS scales down or deploys a new version, the task receives SIGTERM. Without graceful shutdown handling, all active WebSocket connections would drop immediately. This handler notifies each connected client with a 1001 close code, gives the HTTP server time to drain, and force-exits after 10 seconds if something hangs.
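The drain-versus-deadline race can be factored into a small helper (a sketch, not the handler above). One operational note: the 10-second deadline must fit inside the ECS stopTimeout, which defaults to 30 seconds on Fargate, or ECS will SIGKILL the task before the drain completes:

```typescript
// Sketch of the drain-vs-deadline race in the shutdown handler above.
// Resolves 'drained' if the drain completes in time, 'forced' otherwise.
function shutdownWithDeadline(
  drain: () => Promise<void>,
  deadlineMs: number,
): Promise<'drained' | 'forced'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<'forced'>((resolve) => {
    timer = setTimeout(() => resolve('forced'), deadlineMs);
  });
  return Promise.race([drain().then(() => 'drained' as const), deadline])
    .finally(() => clearTimeout(timer)); // don't leave a live timer behind
}
```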

Why WebSockets Are Different

If you're used to deploying REST APIs or static web applications, WebSockets require different architectural thinking:

  • Persistent connections — HTTP requests are ephemeral. WebSocket connections stay open. You can't scale purely on CPU or memory; connection count drives capacity planning.
  • Graceful shutdowns are mandatory — Killing a REST API task mid-request is mostly harmless. Killing a WebSocket task drops every active connection instantly. The SIGTERM handler above is not optional.
  • ALB handles the upgrade — The Application Load Balancer natively routes the Upgrade: websocket header. It converts the initial HTTP connection into a persistent TCP stream to the ECS task.
  • Caching must be disabled — WebSockets are real-time bidirectional channels. CloudFront must pass everything through without caching, and forward all viewer headers so the upgrade handshake works.
  • Health checks need Express — The ws library doesn't serve plain HTTP routes. We attach Express to the same HTTP server solely so the ALB target group health checks get a 200 OK.
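To demystify the upgrade the ALB forwards: per RFC 6455, the server accepts a WebSocket handshake by hashing the client's Sec-WebSocket-Key with a fixed GUID and echoing the result as Sec-WebSocket-Accept. The ws library does this internally; it is shown here only to illustrate why the handshake headers must reach the origin intact:

```typescript
import { createHash } from 'node:crypto';

// RFC 6455: the server proves it understood the WebSocket handshake by
// returning base64(SHA-1(Sec-WebSocket-Key + fixed GUID)) in the
// Sec-WebSocket-Accept response header. ws handles this for you.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

function secWebSocketAccept(clientKey: string): string {
  return createHash('sha1').update(clientKey + WS_GUID).digest('base64');
}

// The worked example straight from RFC 6455:
secWebSocketAccept('dGhlIHNhbXBsZSBub25jZQ=='); // 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='
```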

Security Testing with cdk-nag

The project includes cdk-nag for automated security checks:

test('No unsuppressed Errors', () => {
  const app = new App();
  const stack = new WebSocketServer(app, 'test', {});

  Aspects.of(stack).add(new AwsSolutionsChecks());

  const errors = Annotations.fromStack(stack).findError(
    '*',
    Match.stringLikeRegexp('AwsSolutions-.*'),
  );

  expect(errors).toHaveLength(0);
});

This runs the AWS Solutions rule pack against the synthesized stack. Any security findings that aren't explicitly suppressed will fail the test. Suppressions are documented in the stack with reasons — making security trade-offs visible and auditable.

Deployment

npm install
npm run deploy

Testing

Once deployed, CDK outputs the WebSocketServer.websocketUrl. Connect with wscat:

npx wscat -c wss://<YOUR-CLOUDFRONT-DOMAIN>.cloudfront.net/wss

Send any JSON payload to test the echo:

{"test": true}

Testing the Auto Scaler

The repo includes a load test script that rapidly opens and closes WebSocket connections:

npm run load-test wss://<YOUR-CLOUDFRONT-DOMAIN>.cloudfront.net/wss

Because each new WebSocket connection counts as a request to the ALB, sustained load will trigger the CloudWatch alarm backing the auto scaler. After roughly 3 minutes, ECS will start provisioning additional Fargate tasks. The requestsPerTarget value is set to 5 for demo purposes; raise it for production workloads based on how many concurrent connections each task can handle.

Removal

npm run destroy

Repo

https://github.com/schuettc/cdk-websocket-server