Automating Qualtrics Survey Data Downloads with Lambda

Survey platforms like Qualtrics provide powerful tools for collecting responses, but accessing that data programmatically for analysis or real-time processing often requires building custom integrations. When you need to automatically download survey data, process it, and take actions based on the responses, a serverless approach using AWS Lambda provides an efficient and scalable solution.

This post explores how to build a Lambda function that automatically downloads survey response data from Qualtrics using their export API. We'll cover the three-step export process, secure credential management, and techniques for processing the downloaded CSV data.

The Challenge: Accessing Qualtrics Data Programmatically

Qualtrics offers a comprehensive REST API, but downloading survey responses isn't as straightforward as a simple GET request. The platform uses an asynchronous export process designed to handle large datasets without timing out. This means you need to orchestrate multiple API calls: initiating the export, polling for completion, and finally downloading the generated file.

Additionally, real-world applications often need to process this data immediately upon download, extracting specific information and storing results for other systems to consume. Our Lambda function addresses both challenges by automating the entire workflow from data export to processing.

Understanding the Qualtrics Export Process

Qualtrics exports survey data through a three-step asynchronous process. Because the export file is generated in the background, no single request has to stay open while a large response dataset is assembled.

Step 1: Initiate the Export

The first step creates an export request on Qualtrics' servers. You specify the survey ID and the desired format (CSV in our case), and Qualtrics returns a progress ID that you'll use to track the export status.

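import json
import os
import time
import urllib3

http = urllib3.PoolManager()  # created once, reused across warm invocations
BASE_URL = os.environ["BASE_URL"]
SURVEY_ID = os.environ["SURVEY_ID"]
# headers is built by get_headers(), covered in the authentication section

export_url = f"{BASE_URL}/surveys/{SURVEY_ID}/export-responses"
export_payload = json.dumps({"format": "csv"})
export_response = http.request("POST", export_url, body=export_payload, headers=headers)

if export_response.status != 200:
    raise Exception(f"Failed to initiate export: {export_response.data}")

response_data = json.loads(export_response.data.decode('utf-8'))
progress_id = response_data["result"]["progressId"]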

Step 2: Poll for Completion

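Once the export is initiated, poll the progress endpoint until Qualtrics reports that the export is complete. Each poll is a short request, so no individual HTTP call has to wait for a large export to finish. Polling does not extend the Lambda's own timeout, though, so the function's timeout still needs to cover the total time spent waiting.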

progress_url = f"{BASE_URL}/surveys/{SURVEY_ID}/export-responses/{progress_id}"
 
while True:
    progress_response = http.request("GET", progress_url, headers=headers)
 
    if progress_response.status != 200:
        raise Exception(f"Failed to check progress: {progress_response.data}")
 
    progress_data = json.loads(progress_response.data.decode('utf-8'))
 
    if progress_data["result"]["status"] == "complete":
        file_id = progress_data["result"]["fileId"]
        break
    elif progress_data["result"]["status"] == "failed":
        raise Exception("Export failed on Qualtrics side")
 
    # Wait before polling again to avoid rate limiting
    time.sleep(5)
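
The loop above runs until Qualtrics reports a terminal status, so a stalled export would keep polling until the Lambda itself times out. A minimal sketch of a bounded variant follows. The names `wait_for_export` and `check_status` and the timeout values are hypothetical, chosen for illustration; `check_status` stands in for the GET request above and is assumed to return the parsed `result` object.

```python
import time


def wait_for_export(check_status, timeout_seconds=240, poll_interval=5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll check_status() until the export completes, fails, or times out.

    check_status is a hypothetical callable that performs the progress
    request and returns the parsed "result" dict, for example
    {"status": "inProgress"} or {"status": "complete", "fileId": "..."}.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        result = check_status()
        status = result.get("status")
        if status == "complete":
            return result["fileId"]
        if status == "failed":
            raise RuntimeError("Export failed on Qualtrics side")
        if clock() >= deadline:
            raise TimeoutError(f"Export not complete after {timeout_seconds}s")
        # Wait before polling again to avoid rate limiting
        sleep(poll_interval)
```

Keeping `timeout_seconds` below the function's configured timeout lets the function fail with a clear error message instead of being cut off mid-poll.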

Step 3: Download the File

After the export completes, you can download the actual CSV file using the file ID returned in the previous step.

download_url = f"{BASE_URL}/surveys/{SURVEY_ID}/export-responses/{file_id}/file"
download_response = http.request("GET", download_url, headers=headers)
 
if download_response.status != 200:
    raise Exception(f"Failed to download file: {download_response.data}")
 
# The response contains the ZIP file with the CSV data
zip_data = download_response.data
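
Before handing the bytes to the extraction step, it can help to confirm that the body really is a ZIP archive, since a truncated download or an unexpected response body would otherwise surface later as a less obvious error. A small sketch using only the standard library; `ensure_zip` is a hypothetical helper name, not part of any Qualtrics client.

```python
import io
import zipfile


def ensure_zip(data: bytes) -> bytes:
    """Return data unchanged if it looks like a ZIP archive, else raise."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError(
            f"Expected a ZIP archive, got {len(data)} bytes of other data"
        )
    return data
```

With this in place, the last line of the download step could become `zip_data = ensure_zip(download_response.data)`.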

Authentication and Secrets Management

Qualtrics API authentication uses API tokens that should never be hardcoded in your Lambda function. AWS Secrets Manager provides a secure way to store and retrieve these credentials.

Setting Up Secrets Manager

First, create a secret in AWS Secrets Manager containing your Qualtrics API key:

{
  "api_key": "your-qualtrics-api-token"
}

Retrieving Secrets in Lambda

Your Lambda function can securely retrieve the API key using the AWS Secrets Manager client:

import boto3
import json
import os
 
def get_secret():
    secret_name = os.environ.get("SECRET_NAME")
    region_name = os.environ.get("AWS_REGION")
 
    session = boto3.session.Session()
    client = session.client(
        service_name="secretsmanager",
        region_name=region_name
    )
 
    try:
        get_secret_value_response = client.get_secret_value(SecretId=secret_name)
        secret = get_secret_value_response["SecretString"]
        return json.loads(secret)["api_key"]
    except Exception as e:
        print(f"Error retrieving secret: {e}")
        raise
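
Each call to `get_secret()` makes a request to Secrets Manager. Because a Lambda execution environment is reused across warm invocations, one option is to cache the value in memory for a short period. The sketch below injects the fetch function so it runs without AWS access; `make_cached_getter` and the five-minute TTL are hypothetical choices, and in this post's code `fetch` would be `get_secret`.

```python
import time


def make_cached_getter(fetch, ttl_seconds=300, clock=time.monotonic):
    """Wrap fetch() so its result is reused for ttl_seconds.

    fetch is any zero-argument callable that returns the secret value.
    clock is injectable so expiry can be tested without waiting.
    """
    cache = {}

    def get():
        now = clock()
        expired = "value" in cache and now - cache["fetched_at"] >= ttl_seconds
        if "value" not in cache or expired:
            cache["value"] = fetch()
            cache["fetched_at"] = now
        return cache["value"]

    return get
```

Defining `get_api_key = make_cached_getter(get_secret)` at module level would keep the cache alive between invocations, while the TTL lets a rotated token be picked up without redeploying.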

Building Request Headers

With the API key retrieved, you can construct the authentication headers required for Qualtrics API calls:

def get_headers():
    api_key = get_secret()
    return {
        "X-API-TOKEN": api_key,
        "Content-Type": "application/json"
    }

CSV Processing and Data Extraction

Once you've downloaded the ZIP file containing the CSV data, the next step is extracting and processing the survey responses. The specific processing logic depends on your survey structure and data requirements.

Extracting CSV from ZIP

Qualtrics exports are delivered as ZIP files containing the CSV data:

import zipfile
import io
import csv
 
def extract_csv_from_zip(zip_data):
    with zipfile.ZipFile(io.BytesIO(zip_data), 'r') as zip_file:
        # Find the CSV file in the ZIP
        csv_filename = None
        for filename in zip_file.namelist():
            if filename.endswith('.csv'):
                csv_filename = filename
                break
 
        if not csv_filename:
            raise Exception("No CSV file found in the ZIP")
 
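        # Extract and return CSV content. 'utf-8-sig' drops a leading
        # byte-order mark if one is present, so the first column name stays clean.
        with zip_file.open(csv_filename) as csv_file:
            csv_content = csv_file.read().decode('utf-8-sig')
            return csv_content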
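
To exercise the extraction function without calling Qualtrics, you can build a small archive in memory. The helper below is a test fixture only; `make_zip` and the sample file name are hypothetical.

```python
import io
import zipfile


def make_zip(files):
    """Build an in-memory ZIP from a {filename: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text.encode("utf-8"))
    return buffer.getvalue()


# One CSV member, as the extraction function expects
sample_zip = make_zip({"My Survey.csv": "StartDate,EndDate,ResponseId\n"})
```

Passing `sample_zip` as the `zip_data` argument of `extract_csv_from_zip` should return the header line as a string.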

Processing Survey Responses

The structure of your survey responses will vary based on your Qualtrics survey design. Here's an example of processing responses to extract specific data points:

def process_csv_data(csv_content):
    csv_reader = csv.DictReader(io.StringIO(csv_content))
    processed_data = []
 
    # Qualtrics CSV exports typically include 3 header rows. DictReader
    # consumes the first one as field names, so 2 header rows remain.
    rows = list(csv_reader)
    data_rows = rows[2:]  # Skip the 2 remaining header rows
 
    for row in data_rows:
        # Extract relevant fields based on your survey structure
        response_data = {
            'response_id': row.get('ResponseId', ''),
            'start_date': row.get('StartDate', ''),
            'end_date': row.get('EndDate', ''),
            # Add other fields as needed
        }
 
        # Apply any data transformations or calculations
        processed_data.append(response_data)
 
    return processed_data
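
The header handling is easier to see on a concrete input. The sample below is made up and only mimics the three-header-row layout described above (column IDs, question text, import metadata); the values and the `Q1` column are invented for illustration.

```python
import csv
import io

# Invented sample: row 1 holds column IDs, row 2 the question text,
# row 3 import metadata, and row 4 is the first real response.
SAMPLE_CSV = (
    "StartDate,EndDate,ResponseId,Q1\n"
    "Start Date,End Date,Response ID,How satisfied are you?\n"
    '"{""ImportId"":""startDate""}","{""ImportId"":""endDate""}",'
    '"{""ImportId"":""_recordId""}","{""ImportId"":""QID1""}"\n'
    "2024-01-01 10:00:00,2024-01-01 10:05:00,R_abc123,4\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)      # the first line became reader.fieldnames
data_rows = rows[2:]     # drop the two remaining header rows

response_ids = [row["ResponseId"] for row in data_rows]
```

Here `rows` has three entries (two leftover header rows plus one response), and only the last one survives the slice.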

Handling Complex Survey Logic

Many surveys include complex logic with conditional questions, multiple response types, and calculated fields. Here's an example of processing responses with business logic:

def calculate_totals_from_responses(csv_content):
    csv_reader = csv.reader(io.StringIO(csv_content))
    rows = list(csv_reader)
 
    # Skip all 3 header rows (csv.reader does not consume any of them)
    data_rows = rows[3:]
 
    totals = {"category_a": 0, "category_b": 0}
 
    for row in data_rows:
        if len(row) < 96:  # Columns 94 and 95 must both exist
            continue
 
        # Process different question types based on column positions
        # This logic would be specific to your survey structure
        try:
            # Example: Look for responses in specific columns
            if row[94]:  # Question column 94
                category_selection = int(row[94])
                amount = float(row[95]) if row[95] else 0
 
                if category_selection == 1:
                    totals["category_a"] += amount
                elif category_selection == 2:
                    totals["category_b"] += amount
 
        except (ValueError, IndexError):
            # Handle malformed data gracefully
            continue
 
    return totals
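
Fixed positions such as `row[94]` shift whenever a question is added to or removed from the survey. One alternative is to resolve the positions from the first header row. In this sketch, `Q12` and `Q13` are hypothetical column IDs standing in for the category and amount questions, and `calculate_totals_by_name` is a name chosen for illustration.

```python
import csv
import io


def calculate_totals_by_name(csv_content, category_col="Q12", amount_col="Q13"):
    """Sum amounts per category, locating columns by header name."""
    totals = {"category_a": 0.0, "category_b": 0.0}
    rows = list(csv.reader(io.StringIO(csv_content)))
    if len(rows) <= 3:
        return totals  # nothing beyond the 3 header rows

    header = rows[0]
    # list.index raises ValueError if the column is missing from the export
    cat_idx = header.index(category_col)
    amt_idx = header.index(amount_col)

    for row in rows[3:]:
        try:
            if not row[cat_idx]:
                continue  # question left unanswered
            selection = int(row[cat_idx])
            amount = float(row[amt_idx]) if row[amt_idx] else 0.0
        except (ValueError, IndexError):
            continue  # skip malformed rows, as in the positional version

        if selection == 1:
            totals["category_a"] += amount
        elif selection == 2:
            totals["category_b"] += amount

    return totals
```

A missing column now fails loudly at lookup time instead of silently reading the wrong question.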

Lambda Configuration and Deployment

Deploying this solution requires configuring your Lambda function with the appropriate runtime, permissions, and environment variables.

Lambda Function Configuration

# Environment variables needed
ENVIRONMENT_VARIABLES = {
    'SECRET_NAME': 'your-secrets-manager-secret-name',
    'SURVEY_ID': 'your-qualtrics-survey-id',
    'BASE_URL': 'https://iad1.qualtrics.com/API/v3',
    'RESULT_BUCKET': 'your-s3-bucket-for-results'
}
 
# Required IAM permissions
REQUIRED_PERMISSIONS = [
    'secretsmanager:GetSecretValue',
    's3:PutObject',
    's3:GetObject',
    'logs:CreateLogGroup',
    'logs:CreateLogStream',
    'logs:PutLogEvents'
]
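
Since the function depends on all four environment variables, it can be worth checking them once at startup so that a misconfiguration fails with a clear message. A minimal sketch; `load_config` and `REQUIRED_VARS` are hypothetical names, and trimming the trailing slash from `BASE_URL` is an added convenience, not something the API requires.

```python
import os

REQUIRED_VARS = ("SECRET_NAME", "SURVEY_ID", "BASE_URL", "RESULT_BUCKET")


def load_config(environ=os.environ):
    """Return the required settings as a dict, or raise naming what is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    config = {name: environ[name] for name in REQUIRED_VARS}
    # Avoid a double slash when URLs are built as f"{BASE_URL}/surveys/..."
    config["BASE_URL"] = config["BASE_URL"].rstrip("/")
    return config
```

Calling `load_config()` at module level means a missing variable shows up during initialization, before any request to Qualtrics is made.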

CDK Deployment Example

If you're using AWS CDK, here's how you might deploy this Lambda function:

import { Function, Runtime, Architecture, Code } from "aws-cdk-lib/aws-lambda";
import { Duration } from "aws-cdk-lib";
 
const qualtricsLambda = new Function(this, "QualtricsDataProcessor", {
  runtime: Runtime.PYTHON_3_12,
  architecture: Architecture.ARM_64,
  handler: "index.lambda_handler",
  code: Code.fromAsset("lambda"),
  timeout: Duration.minutes(5),
  memorySize: 1024,
  environment: {
    SECRET_NAME: secret.secretName,
    SURVEY_ID: "SV_123456789",
    BASE_URL: "https://iad1.qualtrics.com/API/v3",
    RESULT_BUCKET: bucket.bucketName,
  },
});
 
// Grant permissions
secret.grantRead(qualtricsLambda);
bucket.grantWrite(qualtricsLambda);

Scheduling Regular Downloads

For automated data processing, you can schedule your Lambda to run at regular intervals using EventBridge:

import { Rule, Schedule } from "aws-cdk-lib/aws-events";
import { LambdaFunction } from "aws-cdk-lib/aws-events-targets";
 
const rule = new Rule(this, "QualtricsDataSchedule", {
  schedule: Schedule.rate(Duration.minutes(30)),
  description: "Download and process Qualtrics data every 30 minutes",
});
 
rule.addTarget(new LambdaFunction(qualtricsLambda));

Error Handling and Resilience

Production Lambda functions need robust error handling to deal with API failures, network issues, and malformed data:

def lambda_handler(event, context):
    try:
        # Download data from Qualtrics
        zip_data = download_qualtrics_data()
 
        # Process the CSV
        csv_content = extract_csv_from_zip(zip_data)
        processed_data = process_csv_data(csv_content)
 
        # Store results
        store_results(processed_data)
 
        return {
            'statusCode': 200,
            'body': json.dumps({'message': 'Data processed successfully'})
        }
 
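    except Exception as e:
        print(f"Error processing Qualtrics data: {str(e)}")

        # Optionally send alerts or store error information.
        # Note: returning normally marks the invocation as successful. For
        # scheduled runs, re-raise here instead if you want the failure to
        # count in Lambda's error metrics and trigger its retry behavior.
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }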
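
The handler calls `store_results`, which this post does not define. One possible shape is sketched below, writing the processed rows to S3 as a single JSON object. The S3 client is passed in so the function can be tested with a stand-in; the extra parameters, the key layout, and the `qualtrics/` prefix are all assumptions made for this sketch, so the one-argument call in the handler would need to be adapted (for example by reading `RESULT_BUCKET` from the environment).

```python
import json
from datetime import datetime, timezone


def store_results(processed_data, s3_client, bucket, prefix="qualtrics/"):
    """Write processed rows to S3 as one JSON object and return its key.

    s3_client is expected to be a boto3 S3 client, i.e. boto3.client("s3").
    The timestamped key under `prefix` is an arbitrary choice for this sketch.
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = f"{prefix}responses-{timestamp}.json"
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(processed_data).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

Writing each run to a new timestamped key keeps earlier results intact, at the cost of leaving cleanup to a lifecycle rule or a downstream consumer.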

Conclusion

Automating Qualtrics survey data downloads with Lambda provides a scalable way to process responses on a schedule, in near real-time. The three-step export process copes with large datasets, while AWS Secrets Manager keeps the API token out of your code and configuration. By processing the CSV data immediately upon download, you can trigger automated workflows, update dashboards, or integrate survey responses with other systems.

This approach is particularly valuable for applications that need to respond to survey data in near real-time, such as research platforms, feedback systems, or data collection workflows that trigger additional actions based on response patterns.