Designing REST APIs That Don't Break at Scale

I've designed REST APIs that power Android, iOS, and web clients — some serving 100K+ daily active users. I've also inherited APIs that should never have reached production, and spent months fixing what a week of proper design would have prevented.

These are the patterns that actually matter at scale, drawn from EmpSuite, Nova Cabs, and half a dozen other production systems.

📖 What You'll Learn

Versioning strategies that don't break clients, rate limiting that doesn't annoy legitimate users, pagination that handles 10M rows, and documentation habits that eliminate integration meetings.

Versioning Strategy That Actually Works

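URL versioning (/api/v1/users) wins for REST APIs because it's explicit, cacheable, and doesn't require header inspection. Header versioning is cleaner architecturally, but it confuses client developers and complicates CDN caching: unless the cache key varies on the version header, a CDN can serve one version's response to a client that asked for another.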

My rule: never remove a field, only add (renaming a field or changing its type counts as removing it). Breaking changes require a new version. Non-breaking additions go in the existing version. We ran v1 and v2 of the EmpSuite API side by side for 8 months during client migration, with zero downtime.
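As a rough illustration of the add-only rule, here is a minimal sketch of two response serializers served from the same data. The record shape, the field names, and the `toV1User` / `toV2User` / `serializeUser` helpers are all hypothetical and are not taken from EmpSuite; the point is only that an added field stays in v1 while a reshaped field forces v2.

```javascript
// Hypothetical internal record; field names are illustrative only.
const userRecord = {
  id: 42,
  firstName: 'Asha',
  lastName: 'Rao',
  email: 'asha@example.com',
  createdAt: '2024-01-15T09:30:00Z',
};

// v1 contract: fields may be added here, never removed or renamed.
function toV1User(u) {
  return {
    id: u.id,
    name: `${u.firstName} ${u.lastName}`,
    email: u.email,
    created_at: u.createdAt, // additive change: older clients simply ignore it
  };
}

// v2 contract: splitting `name` into two fields would break v1 clients,
// so the new shape lives under a new version.
function toV2User(u) {
  return {
    id: u.id,
    first_name: u.firstName,
    last_name: u.lastName,
    email: u.email,
    created_at: u.createdAt,
  };
}

// Both versions stay available while clients migrate.
const serializers = new Map([
  ['v1', toV1User],
  ['v2', toV2User],
]);

function serializeUser(version, u) {
  const serialize = serializers.get(version);
  if (!serialize) throw new Error(`unsupported API version: ${version}`);
  return serialize(u);
}
```

In a real service the version string would come from the URL prefix (/api/v1/..., /api/v2/...), and each route would call the matching serializer.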

Rate Limiting That Doesn't Annoy Legitimate Users

Flat rate limits are blunt and frustrating. Token bucket algorithms with burst allowances are better: each client's bucket refills at a steady rate, so a legitimate client can spend saved-up tokens on a short traffic spike without hitting the limit.

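const rateLimit = require('express-rate-limit');

// Baseline limiter: 100 requests per client per minute.
// Note: express-rate-limit counts requests per window; it is not a token bucket.
const apiLimiter = rateLimit({
  windowMs: 60 * 1000,   // 1-minute window
  max: 100,              // requests allowed per window (named `limit` in v7+)
  standardHeaders: true, // send RateLimit-* response headers
  // Assumes auth middleware has already populated req.user.
  skip: (req) => req.user?.tier === 'enterprise',
  handler: (req, res) => res.status(429).json({
    error: 'rate_limit_exceeded',
    // resetTime is a Date; report seconds remaining, not an epoch timestamp.
    retry_after: Math.max(0, Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000))
  })
});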
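
The express-rate-limit configuration above counts requests per window, so it does not give the burst behavior described earlier. For comparison, here is a minimal in-memory token bucket sketch. The `TokenBucket` class, its method names, and the capacity and refill values are assumptions for illustration, not part of any library.

```javascript
// Minimal in-memory token bucket. `capacity` is the burst allowance and
// `refillPerSecond` is the sustained rate; both are illustrative values.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so a new client can burst immediately
    this.lastRefill = now;  // timestamp in milliseconds
  }

  // Add the tokens earned since the last call, capped at capacity.
  refill(now) {
    const elapsedSeconds = Math.max(0, (now - this.lastRefill) / 1000);
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;
  }

  // Try to spend one token. Returns whether the request is allowed and,
  // if not, how many whole seconds the client should wait before retrying.
  take(now = Date.now()) {
    this.refill(now);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, retryAfter: 0 };
    }
    const retryAfter = Math.ceil((1 - this.tokens) / this.refillPerSecond);
    return { allowed: false, retryAfter };
  }
}
```

A server would typically keep one bucket per client key (for example, per API key) in a Map, and return 429 with `retryAfter` when `take()` is refused. With more than one server instance, the bucket state would need to live in a shared store; that part is outside this sketch.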

Pagination Patterns for Large Datasets

Offset pagination (?page=2&limit=20) is simple but degrades at scale: a LIMIT 20 OFFSET 10000 SQL query reads 10,020 rows and throws away the first 10,000, and the cost grows with every page. For large datasets, cursor-based pagination is the correct choice.

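Return an opaque next_cursor token (a base64-encoded timestamp or ID), decode it on the server, and use an indexed query of the form WHERE created_at < :cursor ORDER BY created_at DESC LIMIT :limit. The database seeks straight to the cursor position in the index, so the cost depends on the page size rather than on how deep the client has paged (strictly an O(log n) index lookup rather than O(1), but effectively flat in practice). If created_at is not unique, include the ID in the cursor as a tie-breaker so rows sharing a timestamp are not skipped. EmpSuite's 5M-row attendance table went from 8-second API responses to 40ms with this change.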
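
The cursor round trip can be sketched as follows. This is a minimal illustration under stated assumptions: the cursor payload shape, the `encodeCursor` / `decodeCursor` / `fetchPage` helpers, and the in-memory array standing in for the database are all hypothetical. It also assumes timestamps are ISO 8601 UTC strings in one consistent format, so they compare correctly as plain strings.

```javascript
// Encode the last row's sort key as an opaque token. Clients should treat
// it as a black box; the JSON shape is an assumption for illustration.
function encodeCursor(row) {
  const payload = JSON.stringify({ createdAt: row.createdAt, id: row.id });
  return Buffer.from(payload, 'utf8').toString('base64');
}

// Decode and validate a cursor received from the client.
function decodeCursor(token) {
  const parsed = JSON.parse(Buffer.from(token, 'base64').toString('utf8'));
  if (!parsed || typeof parsed.createdAt !== 'string' || typeof parsed.id !== 'number') {
    throw new Error('invalid cursor');
  }
  return parsed;
}

// True when `row` comes after the cursor in (createdAt DESC, id DESC) order.
// The id comparison breaks ties between rows with the same timestamp.
function isAfterCursor(row, cursor) {
  if (row.createdAt !== cursor.createdAt) return row.createdAt < cursor.createdAt;
  return row.id < cursor.id;
}

// In-memory stand-in for the database. Against a database that supports
// row-value comparison (PostgreSQL, for example), the filter and sort
// would be one indexed query along the lines of:
//   WHERE (created_at, id) < (:createdAt, :id)
//   ORDER BY created_at DESC, id DESC LIMIT :limit
function fetchPage(rows, limit, cursorToken) {
  const cursor = cursorToken ? decodeCursor(cursorToken) : null;
  const sorted = [...rows].sort((a, b) => {
    if (a.createdAt === b.createdAt) return b.id - a.id;
    return a.createdAt < b.createdAt ? 1 : -1;
  });
  const remaining = cursor ? sorted.filter((r) => isAfterCursor(r, cursor)) : sorted;
  const items = remaining.slice(0, limit);
  const hasMore = remaining.length > limit;
  return {
    items,
    // null tells the client there are no further pages
    next_cursor: hasMore ? encodeCursor(items[items.length - 1]) : null,
  };
}
```

A client requests the first page with no cursor, then passes each response's next_cursor back unchanged until it receives null. Because the cursor names a position in the sort order rather than a row count, rows inserted while the client is paging do not shift the pages it has yet to fetch.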

Documentation Habits That Eliminate Integration Meetings

Every API endpoint I ship has: a request/response example in the exact format clients will use, a list of all possible error codes with descriptions, and the authentication requirements. I use OpenAPI 3.0 spec files committed alongside the code — not a separate wiki that rots.

When an Android developer on my team can open Swagger UI and test an endpoint without asking me anything, the API is documented properly. Until then, it isn't.