
Scaling Serverless APIs: Lessons from 10M Monthly Requests

Practical techniques and patterns for building serverless APIs that scale gracefully to handle millions of requests.


When I first built a serverless API using AWS Lambda and API Gateway, I was amazed by how quickly I could deploy a working endpoint. But as my user base grew from hundreds to thousands to millions, I encountered scaling challenges that required rethinking my approach. Here’s what I learned while scaling a serverless API to handle over 10 million monthly requests.

Initial Architecture

My application started with a simple architecture:

  1. API Gateway as the entry point
  2. Lambda functions for business logic
  3. DynamoDB for data storage
  4. S3 for file storage

This worked beautifully until we hit about 1 million monthly requests.

Challenge 1: Cold Starts

Problem: As traffic increased, users occasionally experienced delays of 1-2 seconds when their requests hit a “cold” Lambda function.

Solution: I implemented a combination of strategies:

  1. Provisioned concurrency: For critical endpoints, I configured provisioned concurrency to keep functions warm at all times.

  2. Optimized runtime: Switching from a heavy Node.js framework to lightweight handler functions significantly reduced cold start times.

  3. Streamlined dependencies: I audited dependencies ruthlessly, removing anything non-essential and using tools like esbuild to minimize bundle sizes.

Code example:

// Before: Heavy dependencies
import * as lodash from "lodash"; // Imports the entire library
import { parseISO } from "date-fns";
import * as AWS from "aws-sdk"; // Imports the entire SDK

// After: Optimized imports
import pick from "lodash/pick"; // Imports only what's needed
import { parseISO } from "date-fns/fp"; // Smaller functional version
import { DynamoDB } from "@aws-sdk/client-dynamodb"; // Only import the specific service
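The first strategy, provisioned concurrency, can be enabled without code changes. A minimal sketch using the AWS CLI (the function name and alias below are illustrative):

```shell
# Keep five execution environments initialized for the "prod" alias of a
# latency-critical function, so requests never hit a cold start there.
aws lambda put-provisioned-concurrency-config \
  --function-name checkout-handler \
  --qualifier prod \
  --provisioned-concurrent-executions 5
```

Note that provisioned concurrency is billed for the time it is configured, so it pays off only on endpoints where cold-start latency genuinely matters.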

Challenge 2: Database Scaling

Problem: As data volume grew, DynamoDB began throttling requests during peak traffic.

Solution: I refined our data access patterns with these techniques:

  1. On-demand capacity: Switched from provisioned to on-demand capacity to handle unpredictable traffic spikes.

  2. Caching strategy: Implemented a multi-level caching approach:

    • In-memory cache within Lambda functions for hot data
    • DAX (DynamoDB Accelerator) for frequently accessed items
    • ElastiCache for shared caching needs
  3. Data partitioning: Redesigned partition keys to distribute data more evenly and avoid hot partitions.

Before:

// Problematic schema with potential hot partition
const params = {
  TableName: "Users",
  Key: {
    userId: "123", // Many operations on the same user create a hot key
  },
};

After:

// Improved schema with composite keys for better distribution
const params = {
  TableName: "UserActions",
  Key: {
    userId: "123",
    actionId: `${timestamp}#${uuid()}`, // Ensures even distribution
  },
};
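The in-memory layer of the caching strategy above can be sketched as a small TTL cache held at module scope, so it survives across warm invocations of the same Lambda execution environment. This is a minimal illustration; the names (`TtlCache`, `userCache`) are hypothetical.

```typescript
// Minimal in-memory TTL cache. Declared at module scope so the same
// instance is reused while the Lambda container stays warm.
type Entry<T> = { value: T; expiresAt: number };

class TtlCache<T> {
  private store = new Map<string, Entry<T>>();
  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // evict stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Module-level instance: a 30-second TTL keeps hot data close while
// bounding staleness.
const userCache = new TtlCache<Record<string, unknown>>(30_000);
```

The trade-off is that each concurrent execution environment has its own cache, which is why DAX and ElastiCache still matter for data that must be shared or kept consistent across instances.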

Challenge 3: Cost Optimization

Problem: As scale increased, costs grew faster than expected, particularly with API Gateway and Lambda.

Solution: I implemented these optimizations:

  1. Batch processing: Instead of processing events one by one, I implemented batching where appropriate.

  2. Lambda power tuning: Used the AWS Lambda Power Tuning tool to find the optimal memory/CPU configuration for each function, sometimes finding that higher memory settings actually reduced costs by completing faster.

  3. API Gateway caching: Enabled caching at the API Gateway level for frequently accessed, relatively static data.

  4. GraphQL consolidation: Replaced multiple REST endpoints with a single GraphQL endpoint, reducing the total number of Lambda invocations.
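The batching idea from step 1 mostly comes down to grouping work before touching downstream services. As one hedged sketch: DynamoDB's BatchWriteItem accepts at most 25 items per call, so a small chunking helper lets one invocation replace dozens of single-item writes (the table and client names in the comment are illustrative):

```typescript
// Split an array into chunks of at most `size` items -- e.g. the 25-item
// limit of DynamoDB's BatchWriteItem.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical usage with the AWS SDK v3 (client setup omitted):
// for (const batch of chunk(writeRequests, 25)) {
//   await client.send(
//     new BatchWriteItemCommand({ RequestItems: { MyTable: batch } })
//   );
// }
```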

Challenge 4: Monitoring and Observability

Problem: As complexity increased, it became difficult to identify performance bottlenecks and errors.

Solution: I built a comprehensive observability system:

  1. Structured logging: Standardized logging format across all functions with correlation IDs to track requests.

  2. Custom metrics: Created custom CloudWatch metrics for business-level monitoring.

  3. Tracing: Implemented AWS X-Ray tracing to identify latency issues across services.

  4. Alerting: Set up automated alerting based on error rates and latency thresholds.

Example structured logging:

import { v4 as uuidv4 } from "uuid";
import type { APIGatewayProxyEvent } from "aws-lambda";

const logger = (correlationId: string) => ({
  info: (message: string, data?: Record<string, unknown>) =>
    console.log(
      JSON.stringify({
        level: "INFO",
        timestamp: new Date().toISOString(),
        correlationId,
        message,
        ...data,
      })
    ),
  error: (message: string, error?: Error, data?: Record<string, unknown>) =>
    console.error(
      JSON.stringify({
        level: "ERROR",
        timestamp: new Date().toISOString(),
        correlationId,
        message,
        errorName: error?.name,
        errorMessage: error?.message,
        stackTrace: error?.stack,
        ...data,
      })
    ),
});

// Usage in a Lambda handler
export const handler = async (event: APIGatewayProxyEvent) => {
  const correlationId = event.headers["x-correlation-id"] || uuidv4();
  const log = logger(correlationId);
  // ...
};
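The custom metrics from step 2 can be published without extra API calls by using the CloudWatch Embedded Metric Format (EMF): CloudWatch extracts metrics from specially structured JSON log lines, so emitting a metric costs one `console.log` instead of a `PutMetricData` round trip. A minimal sketch, where the namespace, dimension, and metric names are illustrative:

```typescript
// Build a CloudWatch Embedded Metric Format (EMF) record. The `_aws`
// envelope tells CloudWatch which top-level fields are metrics.
function buildEmfRecord(name: string, value: number, unit = "Count"): string {
  return JSON.stringify({
    _aws: {
      Timestamp: Date.now(),
      CloudWatchMetrics: [
        {
          Namespace: "MyApi",
          Dimensions: [["Service"]],
          Metrics: [{ Name: name, Unit: unit }],
        },
      ],
    },
    Service: "orders", // dimension value
    [name]: value, // the metric itself
  });
}

// Inside Lambda, writing the record to stdout is all that's needed:
console.log(buildEmfRecord("CheckoutCompleted", 1));
```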

Conclusion

Scaling serverless applications requires a shift in mindset from traditional server-based architectures. By understanding the nuances of cold starts, database access patterns, and cost structures, you can build highly scalable systems that handle millions of requests without breaking the bank.

What challenges have you faced when scaling serverless apps? Let me know in the comments!


Written by Anh Dojo

Backend developer passionate about building scalable systems and sharing knowledge with the community.