In this demo, we build a WebSocket server on Amazon Elastic Container Service (ECS) with Fargate, fronted by an Application Load Balancer and an Amazon CloudFront distribution for TLS termination. The server auto-scales based on connection count, and traffic between CloudFront and the ALB is secured with a custom header — no domain or certificate required.
flowchart LR
Client((Client))
CF[CloudFront<br/>Distribution]
subgraph VPC
ALB[Application<br/>Load Balancer]
subgraph ASG[Auto Scaling Group]
Node[Node Server]
end
end
Client -->|WSS| CF
CF -->|WS| ALB
ALB --> Node
style CF fill:#8C4FFF,color:#fff
style ALB fill:#8C4FFF,color:#fff
style Node fill:#FF9900,color:#fff
style VPC fill:#E9F3E6,stroke:#248814,stroke-width:2px
style ASG fill:#FFE5CC,stroke:#FF9900,stroke-width:2px,stroke-dasharray: 5 5
ECS Fargate
Docker Build
FROM --platform=linux/arm64 node:20-alpine AS build
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM --platform=linux/arm64 node:20-alpine
ENV NODE_ENV=production
RUN apk add --no-cache curl
USER node
WORKDIR /usr/src/app
COPY --chown=node:node package*.json ./
COPY --from=build --chown=node:node /usr/src/app/node_modules ./node_modules
COPY --from=build --chown=node:node /usr/src/app/dist ./dist
EXPOSE 8080
CMD [ "node", "dist/server.js" ]
This is a multi-stage build on node:20-alpine. The first stage installs dependencies and compiles TypeScript. The second stage copies only what's needed to run — no dev dependencies, no source files. The container runs as the node user rather than root, and curl is included for the health check.
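To sanity-check the image locally before deploying, you can build and run it with Docker (the image tag and container name here are illustrative, not part of the project):

```shell
# Build for ARM64 to match the task definition's runtime platform.
docker build --platform linux/arm64 -t websocket-service .

# Run it and hit the same health endpoint the ECS health check uses.
docker run --rm -d -p 8080:8080 --name ws-test websocket-service
curl -f http://localhost:8080/health
docker stop ws-test
```

If you build on an x86 machine, Docker's emulation (or a buildx builder) handles the cross-architecture build; the `--platform` flag keeps the output consistent with the task definition.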
Cluster
this.cluster = new Cluster(this, 'Cluster', {
vpc: props.vpc,
clusterName: 'websocket-service',
containerInsightsV2: ContainerInsights.ENHANCED,
});
Because we are using Fargate (serverless containers) instead of EC2 for our compute, we don't need an EC2 Auto Scaling Group: Fargate provisions the underlying capacity for each task we run from our Task Definitions. Container Insights is enabled in enhanced mode for better observability out of the box.
Fargate Task Definition
const webSocketTask = new FargateTaskDefinition(
this,
'WebSocketTaskDefinition',
{
memoryLimitMiB: 2048,
cpu: 1024,
runtimePlatform: {
operatingSystemFamily: OperatingSystemFamily.LINUX,
cpuArchitecture: CpuArchitecture.ARM64,
},
taskRole: websocketServiceRole,
},
);
This Task uses ARM64 compute. All components — the Dockerfile, the task definition, and the runtime platform — must agree on architecture.
Fargate Container
webSocketTask.addContainer('WebSocketContainer', {
image: ContainerImage.fromAsset('src/constructs/resources/containerImage'),
containerName: 'websocket-service',
portMappings: [{ containerPort: 8080, hostPort: 8080 }],
logging: LogDrivers.awsLogs({
streamPrefix: 'websocket-service',
}),
healthCheck: {
command: ['CMD-SHELL', 'curl -f http://localhost:8080/health'],
interval: Duration.seconds(30),
timeout: Duration.seconds(30),
},
environment: {},
});
Port 8080 is exposed on both the container and host. The container-level health check uses curl to hit the /health endpoint — this is why we install curl in the Dockerfile. This health check is distinct from the ALB health check we'll set up later; this one monitors the container itself, while the ALB check monitors the target from the load balancer's perspective.
Fargate Service
const websocketService = new FargateService(this, 'WebSocketService', {
cluster: this.cluster,
taskDefinition: webSocketTask,
assignPublicIp: true,
desiredCount: 1,
vpcSubnets: { subnetType: SubnetType.PUBLIC },
enableExecuteCommand: true,
});
The service runs in public subnets with a public IP assigned. ECS Exec is enabled so you can shell into running containers for debugging.
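With enableExecuteCommand turned on, you can open a shell inside a running task via the AWS CLI (the task ID is a placeholder; the cluster and container names match the constructs above):

```shell
aws ecs execute-command \
  --cluster websocket-service \
  --task <TASK-ID> \
  --container websocket-service \
  --interactive \
  --command "/bin/sh"
```

Note that this requires the Session Manager plugin for the AWS CLI to be installed locally.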
Auto Scaling
const scalableTarget = websocketService.autoScaleTaskCount({
minCapacity: 1,
maxCapacity: 5,
});
scalableTarget.scaleOnRequestCount('RequestScaling', {
requestsPerTarget: 5,
targetGroup: webSocketTargetGroup,
});
The Application Load Balancer tracks active WebSocket connections the same way it tracks HTTP requests. When active connections per target exceed 5 (set low here for demo purposes), the auto scaler spins up additional Fargate tasks. This is the key insight for WebSocket scaling — you're scaling on connection count, not CPU or memory.
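Target tracking aims to keep the RequestCountPerTarget metric near the configured value. The real Application Auto Scaling algorithm involves CloudWatch alarms and cooldowns, but the steady-state capacity it converges toward is roughly connections divided by the target, clamped to the min/max. A sketch of that arithmetic (the function name is illustrative):

```typescript
// Approximate steady-state capacity under target tracking: enough tasks so
// that each one handles at most `requestsPerTarget` connections, clamped to
// the scalable target's min/max capacity.
function desiredTaskCount(
  activeConnections: number,
  requestsPerTarget: number,
  minCapacity: number,
  maxCapacity: number,
): number {
  const raw = Math.ceil(activeConnections / requestsPerTarget);
  return Math.min(maxCapacity, Math.max(minCapacity, raw));
}

// With the demo settings (target 5, min 1, max 5):
// 3 connections  -> 1 task
// 12 connections -> 3 tasks
// 40 connections -> 5 tasks (capped at maxCapacity)
```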
Security Group
websocketService.connections.allowFrom(
props.applicationLoadBalancer,
Port.tcp(8080),
'Allow traffic from ALB on port 8080',
);
Only the Application Load Balancer can reach the Fargate containers on port 8080. CDK's connections API handles the security group rule creation — no need to create and manage explicit security group resources.
Application Load Balancer
this.applicationLoadBalancer = new ApplicationLoadBalancer(
this,
'ApplicationLoadBalancer',
{
vpc: this.vpc,
internetFacing: true,
dropInvalidHeaderFields: true,
},
);
The ALB is internet-facing and configured to drop invalid HTTP header fields — a security best practice that prevents header injection attacks.
Target Group
const webSocketTargetGroup = new ApplicationTargetGroup(
this,
'webSocketTargetGroup',
{
vpc: props.vpc,
port: 8080,
protocol: ApplicationProtocol.HTTP,
targets: [websocketService],
healthCheck: {
path: '/',
protocol: Protocol.HTTP,
port: '8080',
},
},
);
The target group forwards traffic to port 8080 on the Fargate service. The ALB health check hits the root path — this is the load balancer verifying it can reach the container, separate from the container's own internal health check.
Listener
const webSocketListener = props.applicationLoadBalancer.addListener(
'webSocketListener',
{
port: 80,
protocol: ApplicationProtocol.HTTP,
open: true,
defaultAction: ListenerAction.fixedResponse(403),
},
);
webSocketListener.addAction('ForwardFromCloudFront', {
conditions: [
ListenerCondition.httpHeader(props.customHeader, [props.randomString]),
],
action: ListenerAction.forward([webSocketTargetGroup]),
priority: 1,
});
The listener accepts HTTP on port 80 with a default action of 403 Forbidden. Traffic is forwarded to the target group only if the request includes a specific custom header with a matching value. CloudFront injects this header on every origin request, so requests made directly to the ALB are rejected.
The ALB uses HTTP, not HTTPS. TLS is enforced at the CloudFront layer. Since we only allow traffic through CloudFront, the client-facing connection is always encrypted without needing a custom domain or certificate on the ALB.
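Once deployed, you can confirm the header gate from a terminal (the DNS name, header name, and header value are placeholders for your deployment's values):

```shell
# Direct request to the ALB without the header: hits the default 403 action.
curl -i http://<YOUR-ALB-DNS>/health

# The same request with the secret header CloudFront injects is forwarded
# to the target group and reaches the container.
curl -i -H "<CUSTOM-HEADER>: <RANDOM-STRING>" http://<YOUR-ALB-DNS>/health
```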
CloudFront Distribution
const defaultOrigin = new LoadBalancerV2Origin(props.applicationLoadBalancer, {
httpPort: 80,
protocolPolicy: OriginProtocolPolicy.HTTP_ONLY,
originId: 'defaultOrigin',
customHeaders: {
[props.customHeader]: props.randomString,
},
});
this.distribution = new Distribution(this, 'Distribution', {
defaultBehavior: {
origin: defaultOrigin,
viewerProtocolPolicy: ViewerProtocolPolicy.HTTPS_ONLY,
cachePolicy: CachePolicy.CACHING_DISABLED,
allowedMethods: AllowedMethods.ALLOW_ALL,
originRequestPolicy: OriginRequestPolicy.ALL_VIEWER,
},
defaultRootObject: 'index.html',
priceClass: PriceClass.PRICE_CLASS_100,
logBucket: distributionLoggingBucket,
enableLogging: true,
minimumProtocolVersion: SecurityPolicyProtocol.TLS_V1_2_2021,
});
The distribution enforces HTTPS on the viewer side and injects the custom header when forwarding to the ALB origin. Key configuration:
- CachePolicy.CACHING_DISABLED — WebSockets are real-time; caching would break everything
- AllowedMethods.ALLOW_ALL — WebSocket upgrades require more than just GET/HEAD
- OriginRequestPolicy.ALL_VIEWER — forwards all headers, including the Upgrade: websocket header
- customHeaders — the secret header/value pair that the ALB checks before forwarding traffic
- SecurityPolicyProtocol.TLS_V1_2_2021 — enforces modern TLS
The customHeaders property on LoadBalancerV2Origin is the native CDK way to inject origin headers. CloudFront adds this header to every request it sends to the ALB, and the ALB's listener condition checks for it — completing the security chain.
The WebSocket Server
import * as http from 'http';
import express, { Response } from 'express';
import { Server as WebSocketServer, WebSocket } from 'ws';
const serverPort: number = 8080;
const app = express();
const server = http.createServer(app);
const websocketServer = new WebSocketServer({ server, path: '/wss' });
websocketServer.on('connection', (webSocketClient: WebSocket) => {
console.log('New connection');
webSocketClient.send(JSON.stringify({ connection: 'ok' }));
webSocketClient.on('message', (message: Buffer) => {
console.log('New message');
try {
const parsedMessage = JSON.parse(message.toString());
websocketServer.clients.forEach((client: WebSocket) => {
if (client.readyState === WebSocket.OPEN) {
client.send(JSON.stringify({ message: parsedMessage }));
}
});
} catch (e) {
console.error('Invalid JSON message format received');
}
});
});
app.get('/health', (_, res: Response) => {
res.status(200).send('Ok');
});
app.get('/', (_, res: Response) => {
res.status(200).send('Ok');
});
server.listen(serverPort, () => {
console.log(`Server listening on port ${serverPort}`);
});
The server runs Express and the ws library on the same HTTP server. Express handles the health check endpoints (/health and /), while WebSocket connections go through the /wss path. Messages are broadcast to all connected clients.
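The broadcast loop above can be factored into a small helper. A sketch (the interface is narrowed to just the two members broadcasting needs, so it can be exercised without real sockets; the names are illustrative):

```typescript
// Minimal shape of a client for broadcasting purposes.
interface BroadcastClient {
  readyState: number;
  send(data: string): void;
}

const OPEN = 1; // the ws library exposes this as WebSocket.OPEN

// Send a JSON-serialized message to every open client, mirroring the
// server's { message: ... } envelope; returns how many clients received it.
function broadcast(clients: BroadcastClient[], message: unknown): number {
  const payload = JSON.stringify({ message });
  let delivered = 0;
  for (const client of clients) {
    if (client.readyState === OPEN) {
      client.send(payload);
      delivered++;
    }
  }
  return delivered;
}
```

The readyState check matters: clients in CLOSING or CLOSED state would otherwise throw or silently drop the send.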
Graceful Shutdown
const shutdown = () => {
console.log('Shutting down gracefully...');
websocketServer.clients.forEach((client: WebSocket) => {
if (client.readyState === WebSocket.OPEN) {
client.close(1001, 'Server shutting down');
}
});
server.close(() => {
console.log('Closed out remaining HTTP connections.');
process.exit(0);
});
setTimeout(() => {
console.error('Could not close connections in time, forcefully shutting down');
process.exit(1);
}, 10000);
};
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
This is critical for WebSocket servers. When ECS scales down or deploys a new version, the task receives SIGTERM. Without graceful shutdown handling, all active WebSocket connections would drop immediately. This handler notifies each connected client with a 1001 close code, gives the HTTP server time to drain, and force-exits after 10 seconds if something hangs.
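On the client side, a 1001 close code is the cue to reconnect, typically with exponential backoff so a fleet of clients doesn't stampede the new tasks. A sketch of the delay calculation (function name and defaults are illustrative, not part of this project):

```typescript
// Exponential backoff with a cap: 1s, 2s, 4s, 8s, ... up to maxDelayMs.
// `attempt` is the zero-based count of consecutive failed reconnects.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxDelayMs: number = 30000,
): number {
  return Math.min(maxDelayMs, baseMs * 2 ** attempt);
}
```

Production clients often add random jitter to the delay so reconnects spread out over time rather than arriving in synchronized waves.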
Why WebSockets Are Different
If you're used to deploying REST APIs or static web applications, WebSockets require different architectural thinking:
- Persistent connections — HTTP requests are ephemeral. WebSocket connections stay open. You can't scale purely on CPU or memory; connection count drives capacity planning.
- Graceful shutdowns are mandatory — Killing a REST API task mid-request is mostly harmless. Killing a WebSocket task drops every active connection instantly. The SIGTERM handler above is not optional.
- ALB handles the upgrade — The Application Load Balancer natively routes the Upgrade: websocket header. It converts the initial HTTP connection into a persistent TCP stream to the ECS task.
- Caching must be disabled — WebSockets are real-time bidirectional channels. CloudFront must pass everything through without caching, and forward all viewer headers so the upgrade handshake works.
- Health checks need Express — The ws library doesn't serve HTTP routes on its own. We attach Express to the same server solely so the ALB target group health checks get a 200 OK.
Security Testing with cdk-nag
The project includes cdk-nag for automated security checks:
test('No unsuppressed Errors', () => {
const app = new App();
const stack = new WebSocketServer(app, 'test', {});
Aspects.of(stack).add(new AwsSolutionsChecks());
const errors = Annotations.fromStack(stack).findError(
'*',
Match.stringLikeRegexp('AwsSolutions-.*'),
);
expect(errors).toHaveLength(0);
});
This runs the AWS Solutions rule pack against the synthesized stack. Any security findings that aren't explicitly suppressed will fail the test. Suppressions are documented in the stack with reasons — making security trade-offs visible and auditable.
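Suppressions are declared next to the resources they describe using cdk-nag's NagSuppressions helper. A sketch (the rule ID shown, AwsSolutions-ELB2, is the ELB access-logging rule; the resource and reason here are illustrative, and your actual suppressions will differ):

```typescript
import { NagSuppressions } from 'cdk-nag';

// Illustrative suppression: AwsSolutions-ELB2 flags load balancers without
// access logs. The reason string is what makes the trade-off auditable.
NagSuppressions.addResourceSuppressions(this.applicationLoadBalancer, [
  {
    id: 'AwsSolutions-ELB2',
    reason:
      'ALB access logging is omitted in this demo; CloudFront logs all viewer traffic.',
  },
]);
```

Any finding without a suppression like this causes the test above to fail, so every accepted risk is forced through code review.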
Deployment
npm install
npm run deploy
Testing
Once deployed, CDK outputs the WebSocketServer.websocketUrl. Connect with wscat:
npx wscat -c wss://<YOUR-CLOUDFRONT-DOMAIN>.cloudfront.net/wss
Send any JSON payload to test the echo:
{"test": true}
Testing the Auto Scaler
The repo includes a load test script that rapidly opens and closes WebSocket connections:
npm run load-test wss://<YOUR-CLOUDFRONT-DOMAIN>.cloudfront.net/wss
Because the ALB tracks active WebSocket connections as requests, sustained load will trigger the CloudWatch alarm backing the auto scaler. After roughly 3 minutes, ECS will start provisioning additional Fargate tasks. The requestsPerTarget is set to 5 for demo purposes — increase it for production workloads based on how many concurrent connections each task can realistically handle.
Removal
npm run destroy