Using psycopg2 with AWS Lambda and CDK: Binary vs Source Solutions

Getting psycopg2 to work in AWS Lambda can be frustrating. This post shows you two reliable solutions - using pre-compiled binaries for simplicity, and building from source with Docker for production workloads.

If you've ever tried to connect to a PostgreSQL database from an AWS Lambda function using Python, you've probably hit the dreaded ImportError: No module named 'psycopg2._psycopg'. Finding a clean solution can be surprisingly difficult. After working through it myself, I landed on two reliable approaches - one for quick development and another for production workloads.

The Problem

The challenge with psycopg2 in Lambda stems from a fundamental compatibility issue. When you install psycopg2 on your local machine (especially on macOS), you're getting binaries compiled for your specific operating system and architecture. Lambda functions run on Amazon Linux, and those macOS binaries simply won't work there.

The traditional psycopg2 package requires PostgreSQL client libraries to be present on the system. These libraries aren't available in the Lambda runtime environment, which means even if you could get the package installed, it wouldn't run.

Common errors you might encounter include:

[ERROR] Runtime.ImportModuleError: Unable to import module 'index': No module named 'psycopg2._psycopg'

Or when trying to build from source:

Error: pg_config executable not found.
pg_config is required to build psycopg2 from source.
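If you hit the pg_config error while building locally, a quick first check is whether the executable is even on your PATH. A minimal diagnostic sketch (the helper name is mine, not part of any tooling):

```python
import shutil

def has_pg_config() -> bool:
    """Return True if the pg_config executable is discoverable on PATH."""
    return shutil.which("pg_config") is not None

if __name__ == "__main__":
    if has_pg_config():
        print("pg_config found")
    else:
        print("pg_config missing - install postgresql-devel (or libpq-dev)")
```

If it's missing, installing your platform's PostgreSQL development package is the fix - which is exactly what the Docker approach below automates.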

Solution 1: Using Pre-compiled Binaries (Quick & Simple)

For rapid development and simpler use cases, psycopg2-binary provides a quick solution. This approach uses pre-compiled binaries that include all necessary PostgreSQL client libraries.

Here's the CDK code that works:

new lambda.Function(this, "PostgresFunction", {
  runtime: lambda.Runtime.PYTHON_3_11,
  handler: "index.handler",
  architecture: lambda.Architecture.X86_64,
  code: lambda.Code.fromAsset(path.join(__dirname, "../lambdas/postgres"), {
    bundling: {
      image: lambda.Runtime.PYTHON_3_11.bundlingImage,
      command: [
        "bash",
        "-c",
        "pip install --target /asset-output --platform manylinux2014_x86_64 --only-binary=:all: psycopg2-binary && cp -r . /asset-output",
      ],
    },
  }),
  vpc: vpc,
  environment: {
    RDS_ENDPOINT: rdsEndpoint,
    RDS_SECRET_ARN: rdsSecretArn,
    RDS_DATABASE: "mydb",
  },
});

The key is in the pip install command:

  • --target /asset-output: Specifies where to install the packages
  • --platform manylinux2014_x86_64: Forces pip to download Linux-compatible wheels
  • --only-binary=:all:: Ensures pip only uses pre-compiled wheels
  • psycopg2-binary: The binary distribution with included PostgreSQL libraries

Why This Works

The psycopg2-binary package is self-contained: its wheels ship with their own copies of the PostgreSQL client libraries (libpq and its dependencies), so nothing needs to be installed on the host system. This makes it well suited to serverless environments where you can't add system dependencies.
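You can confirm which libpq a given build was compiled against via psycopg2's `__libpq_version__` attribute, an integer such as 160002. For PostgreSQL 10 and later the encoding is major * 10000 + minor, so a tiny decoder (my own helper, assuming a 10+ client library) makes it readable:

```python
def decode_libpq_version(v: int) -> tuple[int, int]:
    """Decode psycopg2.__libpq_version__ for libpq 10+ (major*10000 + minor)."""
    return v // 10000, v % 10000

# Inside a Lambda you could log the linked libpq like so:
# import psycopg2
# print(decode_libpq_version(psycopg2.__libpq_version__))
```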

Solution 2: Building from Source with Docker (Production-Ready)

While psycopg2-binary is convenient, the psycopg2 documentation explicitly recommends building from source for production use. This ensures optimal performance and compatibility. With Lambda container images, we can easily build psycopg2 from source.

The Dockerfile Approach

First, create a Dockerfile that builds psycopg2 from source:

# Use AWS Lambda Python runtime as base
FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.13
 
# Install PostgreSQL development packages and build dependencies
RUN dnf update -y && \
    dnf install -y \
    postgresql-devel \
    gcc \
    python3-devel \
    make \
    && dnf clean all
 
# Install psycopg2 from source (not psycopg2-binary)
RUN pip install --no-cache-dir psycopg2==2.9.10
 
# Copy lambda handler
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}
 
# Set the handler
CMD ["lambda_handler.handler"]

Key points about this Dockerfile:

  • Platform specification: --platform=linux/amd64 ensures x86_64 architecture
  • Development packages: Installing postgresql-devel provides the pg_config and headers needed to build psycopg2
  • Build tools: gcc, python3-devel, and make are required for compilation
  • Clean installation: Using pip install psycopg2 (not psycopg2-binary) builds from source

CDK Implementation for Container Images

Here's how to deploy a container-based Lambda with CDK:

import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
 
// Create Lambda function from container image
const lambdaFunction = new lambda.DockerImageFunction(this, "Function", {
  code: lambda.DockerImageCode.fromImageAsset(
    path.join(__dirname, "../docker"),
  ),
  vpc,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
  },
  securityGroups: [lambdaSecurityGroup],
  timeout: cdk.Duration.seconds(30),
  memorySize: 512,
  environment: {
    DB_HOST: database.instanceEndpoint.hostname,
    DB_PORT: database.instanceEndpoint.port.toString(),
    DB_NAME: "testdb",
    DB_USER: "postgres",
    // Demo only: unsafeUnwrap() exposes the password in plain-text Lambda config;
    // for production, pass the secret ARN and resolve it at runtime instead
    DB_PASSWORD:
      database.secret?.secretValueFromJson("password").unsafeUnwrap() || "",
  },
  architecture: lambda.Architecture.X86_64,
});

Lambda Handler for Both Approaches

Whether using binaries or building from source, the Lambda handler code remains similar:

import json
import os
import sys
import boto3
 
# When running in Lambda, ensure packages bundled alongside the handler
# (e.g. psycopg2 installed into the deployment directory) are importable
if os.environ.get('LAMBDA_TASK_ROOT'):
    CWD = os.path.dirname(os.path.realpath(__file__))
    sys.path.insert(0, CWD)
 
import psycopg2
from psycopg2.extras import RealDictCursor
 
def get_db_connection():
    """Retrieve database credentials and establish connection"""
    # For production, use Secrets Manager
    if os.environ.get('RDS_SECRET_ARN'):
        secrets_client = boto3.client('secretsmanager')
        secret_response = secrets_client.get_secret_value(
            SecretId=os.environ['RDS_SECRET_ARN']
        )
        secret = json.loads(secret_response['SecretString'])
        db_password = secret['password']
        db_user = secret['username']
    else:
        # For demos, use environment variables
        db_password = os.environ['DB_PASSWORD']
        db_user = os.environ['DB_USER']
 
    return psycopg2.connect(
        host=os.environ['DB_HOST'],
        port=os.environ.get('DB_PORT', '5432'),
        database=os.environ['DB_NAME'],
        user=db_user,
        password=db_password,
        cursor_factory=RealDictCursor,
        connect_timeout=5
    )
 
def handler(event, context):
    """Lambda handler function"""
    try:
        conn = get_db_connection()
        cursor = conn.cursor()
 
        # Get PostgreSQL version and psycopg2 info
        cursor.execute("SELECT version();")
        version = cursor.fetchone()
 
        cursor.execute("SELECT current_database(), current_user, now();")
        db_info = cursor.fetchone()
 
        result = {
            'success': True,
            'message': 'Successfully connected to PostgreSQL',
            'psycopg2_version': psycopg2.__version__,
            'postgresql_version': version['version'],
            'database_info': {
                'database': db_info['current_database'],
                'user': db_info['current_user'],
                'timestamp': str(db_info['now'])
            }
        }
 
        cursor.close()
        conn.close()
 
        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps(result, indent=2)
        }
    except Exception as e:
        print(f"Error: {str(e)}")
        return {
            'statusCode': 500,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({
                'success': False,
                'error': str(e),
                'error_type': type(e).__name__
            })
        }
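One production refinement the handler above doesn't show: Lambda freezes the execution environment between invocations, so a module-level connection can be reused on warm starts instead of reconnecting every time. A hedged sketch (the caching helper is mine; it assumes the `get_db_connection()` defined above):

```python
_conn = None  # cached across warm invocations of the same execution environment

def get_cached_connection(connect=None):
    """Return a live cached connection, reconnecting if it was closed.

    `connect` is injectable for testing; it defaults to get_db_connection
    from the handler module above.
    """
    global _conn
    # psycopg2 connections expose .closed (0 while open, nonzero once closed)
    if _conn is None or getattr(_conn, "closed", 1):
        factory = connect or get_db_connection
        _conn = factory()
    return _conn
```

If you adopt this, stop calling `conn.close()` in the handler body and let the execution environment own the connection's lifetime.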

Comparing the Approaches

When to Use Pre-compiled Binaries

Use psycopg2-binary when:

  • You're rapidly prototyping or developing
  • Your Lambda function performs simple, basic database operations
  • Package size isn't a critical concern
  • You want the simplest possible deployment process

Advantages:

  • Quick to set up
  • Smaller CDK code footprint

When to Build from Source

Use Docker container images with source-built psycopg2 when:

  • You're running production workloads that need optimal performance
  • You need full control over the build process
  • You're using other compiled dependencies alongside psycopg2
  • You want to follow psycopg2's official production recommendations

Advantages:

  • Optimized for the Lambda runtime environment
  • Follows psycopg2 best practices for production
  • More control over the build process
  • Can include additional system dependencies

Important Considerations for Both Approaches

Architecture Matching

Always ensure your Lambda function architecture matches your build target:

For x86_64 (most common):

architecture: lambda.Architecture.X86_64,

For ARM64 (Graviton2):

architecture: lambda.Architecture.ARM_64,
// Also update pip platform flag to: manylinux2014_aarch64
// Or Dockerfile platform to: --platform=linux/arm64
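The three settings (CDK architecture, pip platform tag, Docker platform) must stay in sync, which is easy to get wrong when switching architectures. A small lookup helper I use to keep them paired (my own sketch, covering just the two Lambda architectures):

```python
# Maps a CPU architecture to its pip --platform tag and Docker --platform value
ARCH_FLAGS = {
    "x86_64": ("manylinux2014_x86_64", "linux/amd64"),
    "aarch64": ("manylinux2014_aarch64", "linux/arm64"),
}

def flags_for(machine: str) -> tuple[str, str]:
    """Return (pip_platform, docker_platform) for a Lambda architecture."""
    try:
        return ARCH_FLAGS[machine]
    except KeyError:
        raise ValueError(f"Unsupported Lambda architecture: {machine}")
```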

Troubleshooting Common Issues

"exec format error" with Container Images

This typically means architecture mismatch. Ensure your Dockerfile specifies the correct platform:

FROM --platform=linux/amd64 public.ecr.aws/lambda/python:3.13

Conclusion

Getting psycopg2 to work with AWS Lambda doesn't have to be a struggle. You now have two reliable approaches:

  1. For quick development: Use psycopg2-binary with platform-specific pip flags
  2. For production: Build from source using Lambda container images

Both solutions avoid the common pitfalls and provide reliable database connectivity. Choose based on your specific needs:

  • Need something working quickly? Use the binary approach.
  • Building for production? Take the extra step to build from source with Docker.

The key insight is that Lambda's environment requires special handling for compiled dependencies. Whether you use pre-compiled binaries or build from source, explicitly controlling the build process for the Lambda runtime is essential for success.

With these approaches, you can focus on building your application logic instead of fighting with package compatibility issues.