I hit Cloudflare Workers’ 30-second CPU time limit while processing just 10 users.
Each user took ~3 seconds to process (GitHub API calls + notifications). 10 users × 3 seconds = 30 seconds. Add any overhead and I’d get Time Limit Exceeded errors. The math was simple: I couldn’t scale sequentially.
That’s when I discovered Service Bindings, a feature that lets you spawn multiple Worker instances, each with its own fresh CPU budget. The result? I went from processing 10 users in 30+ seconds (with failures) to processing 1000+ users in parallel, all on Cloudflare’s free tier.
The Problem: CPU Time Limits Kill Sequential Processing
I was building Streaky, a GitHub streak reminder app. Every day at noon, it checks users’ GitHub contributions and sends notifications if they haven’t committed yet.
The workflow:
- Query active users from the D1 database
- For each user:
  - Fetch GitHub contributions via the API (~1.5 seconds)
  - Calculate the current streak (~0.5 seconds)
  - Send a Discord/Telegram notification (~1 second)
  - Log results to the database
The constraint: Cloudflare Workers have a 30-second CPU time limit per request. With 10 users taking 3 seconds each, I was right at the edge. Any network latency or API slowdown would trigger TLE errors.
What I tried first:
// Sequential processing - DOESN'T SCALE
export default {
  async scheduled(event, env, ctx) {
    const users = await getActiveUsers(env);
    for (const user of users) {
      await processUser(env, user); // ~3 seconds per user
    }
    // Total: 10 users × 3 seconds = 30 seconds (TLE!)
  },
};
Why it failed:
- 10 users = 30 seconds (at the limit)
- 11 users = 33 seconds (TLE error)
- No room for growth
- Network latency pushes it over the edge
I needed a way to process users in parallel, not sequentially.
The Solution: Service Bindings + Distributed Queue
The core insight: instead of one Worker processing N users, spawn N Workers each processing 1 user.
Architecture:
Scheduler Worker (Main)
|
|-- Worker Instance 1 (User A) - Fresh 30s CPU budget
|-- Worker Instance 2 (User B) - Fresh 30s CPU budget
|-- Worker Instance 3 (User C) - Fresh 30s CPU budget
|-- ...
|-- Worker Instance N (User N) - Fresh 30s CPU budget
Result:
- 10 users processed in ~10 seconds (parallel)
- Each Worker uses <5 seconds CPU time
- No TLE errors
- Scales to 1000+ users
The key: Service Bindings allow a Worker to call itself, creating new Worker instances. Each env.SELF.fetch() spawns a fresh Worker with its own CPU budget.
The Architecture: Queue + Service Bindings
Component 1: Queue Table (D1 SQLite)
The queue tracks which users need processing and prevents duplicate work.
CREATE TABLE cron_queue (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL,
  batch_id TEXT NOT NULL,
  status TEXT NOT NULL CHECK(status IN ('pending', 'processing', 'completed', 'failed')),
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  started_at TEXT,
  completed_at TEXT,
  error_message TEXT,
  retry_count INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_cron_queue_status ON cron_queue(status);
CREATE INDEX idx_cron_queue_batch ON cron_queue(batch_id);
Why D1?
- Already part of the stack (no external dependencies)
- Fast enough for job queues (< 10ms queries)
- Supports atomic operations (prevents race conditions)
- Free tier: 100,000 writes/day (plenty for this use case)
Component 2: Atomic Queue Claiming
The critical part: prevent race conditions when multiple Workers try to claim the same user.
export async function claimNextPendingUserAtomic(
  env: Env
): Promise<QueueItem | null> {
  const result = await env.DB.prepare(`
    WITH next AS (
      SELECT id FROM cron_queue
      WHERE status = 'pending'
      ORDER BY created_at ASC
      LIMIT 1
    )
    UPDATE cron_queue
    SET status = 'processing', started_at = datetime('now')
    WHERE id IN (SELECT id FROM next)
    RETURNING id, user_id, batch_id
  `).all<QueueItem>();
  return result.results[0] ?? null;
}
Why atomic?
- The CTE (WITH) + UPDATE + RETURNING run as a single statement
- No gap between the SELECT and the UPDATE
- D1 (SQLite) executes each statement atomically
- Prevents duplicate processing
Without atomic claiming:
Worker 1: SELECT id WHERE status='pending' → Gets user A
Worker 2: SELECT id WHERE status='pending' → Gets user A (race!)
Both workers process user A (duplicate notifications!)
With atomic claiming:
Worker 1: CTE + UPDATE + RETURNING → Gets user A, marks processing
Worker 2: CTE + UPDATE + RETURNING → Gets user B, marks processing
No duplicates, each worker gets unique user
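For contrast, here is roughly what the naive two-step claim looks like (a sketch of the anti-pattern; the window between the SELECT and the UPDATE is exactly where two Workers can grab the same row):

// DON'T DO THIS: the gap between SELECT and UPDATE is a race window
export async function claimNextPendingUserNaive(
  env: Env
): Promise<QueueItem | null> {
  const row = await env.DB.prepare(
    `SELECT id, user_id, batch_id FROM cron_queue
     WHERE status = 'pending'
     ORDER BY created_at ASC
     LIMIT 1`
  ).first<QueueItem>();
  if (!row) return null;
  // Another Worker can SELECT the same row before this UPDATE lands
  await env.DB.prepare(
    `UPDATE cron_queue
     SET status = 'processing', started_at = datetime('now')
     WHERE id = ?`
  )
    .bind(row.id)
    .run();
  return row;
}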
Component 3: Service Bindings Configuration
Service Bindings let a Worker call itself, creating new instances.
wrangler.toml:
[[services]]
binding = "SELF"
service = "streaky"
Usage:
// Each fetch creates a NEW Worker instance
env.SELF.fetch('http://internal/api/cron/process-user', {
  method: 'POST',
  headers: {
    'X-Cron-Secret': env.SERVER_SECRET,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    queueId: queueItem.id,
    userId: queueItem.user_id,
  }),
});
Why Service Bindings?
- Each env.SELF.fetch() = a new Worker instance
- Fresh CPU budget per instance (30 seconds each)
- Automatic load balancing by Cloudflare
- No external queue service needed (Redis, SQS, etc.)
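One wiring detail worth making explicit: for env.SELF.fetch() to land anywhere, the same Worker must export both the scheduled handler and a fetch handler serving the internal route. A minimal sketch with Hono (the Env shape mirrors the bindings used in this post, and the /api/cron prefix is an assumption to match the dispatch URL; the hostname itself is arbitrary because Service Binding requests never leave Cloudflare's network):

import { Hono } from 'hono';

// Bindings available as `env` in handlers and `c.env` in Hono routes
type Env = {
  DB: D1Database;
  SELF: Fetcher; // Service Binding pointing back at this same Worker
  SERVER_SECRET: string;
};

const app = new Hono<{ Bindings: Env }>().basePath('/api/cron');

app.post('/process-user', async (c) => {
  // Single-user processing, shown in Step 3 below
  return c.json({ success: true });
});

export default {
  fetch: app.fetch, // serves the env.SELF.fetch() calls
  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext) {
    // Batch init + dispatch, shown in Step 2 below
  },
};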
Implementation: Step-by-Step
Step 1: Initialize Batch
When the cron trigger fires, create a batch of queue items.
export async function initializeBatch(
  env: Env,
  userIds: string[]
): Promise<string> {
  const batchId = crypto.randomUUID();
  // Insert one queue row per user (env.DB.batch() can collapse this into a single round trip)
  for (const userId of userIds) {
    const queueId = crypto.randomUUID();
    await env.DB.prepare(
      `INSERT INTO cron_queue (id, user_id, batch_id, status)
       VALUES (?, ?, ?, 'pending')`
    )
      .bind(queueId, userId, batchId)
      .run();
  }
  return batchId;
}
Step 2: Scheduler (Main Worker)
The scheduler initializes the batch and dispatches Workers.
export default {
  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext) {
    // Query active users
    const usersResult = await env.DB.prepare(
      `SELECT id FROM users WHERE is_active = 1 AND github_pat IS NOT NULL`
    ).all();
    const userIds = usersResult.results.map((row: any) => row.id as string);
    if (userIds.length === 0) {
      console.log('[Scheduled] No active users to process');
      return;
    }
    // Initialize batch
    const batchId = await initializeBatch(env, userIds);
    console.log(`[Scheduled] Batch ${batchId} initialized with ${userIds.length} users`);
    // Dispatch Workers via Service Bindings
    for (let i = 0; i < userIds.length; i++) {
      const queueItem = await claimNextPendingUserAtomic(env);
      if (!queueItem) break;
      // Spawn a new Worker instance for this user
      ctx.waitUntil(
        env.SELF.fetch('http://internal/api/cron/process-user', {
          method: 'POST',
          headers: {
            'X-Cron-Secret': env.SERVER_SECRET,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            queueId: queueItem.id,
            userId: queueItem.user_id,
          }),
        })
          .then((res) => {
            console.log(`[Scheduled] User ${queueItem.user_id} dispatched: ${res.status}`);
          })
          .catch((error: Error) => {
            console.error(`[Scheduled] User ${queueItem.user_id} dispatch failed:`, error);
          })
      );
    }
    console.log(`[Scheduled] All ${userIds.length} users dispatched for batch ${batchId}`);
  },
};
Key points:
- ctx.waitUntil() ensures the async dispatches complete after the handler returns
- Each env.SELF.fetch() creates a new Worker instance
- Errors in one Worker don't affect the others
Step 3: Worker Instance (Process Single User)
Each Worker instance processes one user.
app.post('/process-user', async (c) => {
  // Auth check
  const secret = c.req.header('X-Cron-Secret');
  if (!c.env.SERVER_SECRET || secret !== c.env.SERVER_SECRET) {
    return c.json({ error: 'Unauthorized' }, 401);
  }
  const body = await c.req.json<{ queueId: string; userId: string }>();
  const { queueId, userId } = body;
  // Idempotency check
  const status = await getQueueItemStatus(c.env, queueId);
  if (status === 'completed') {
    return c.json({
      success: true,
      queueId,
      userId,
      skipped: true,
      reason: 'Already completed',
    });
  }
  // Process user
  try {
    await processSingleUser(c.env, userId);
    await markCompleted(c.env, queueId);
    return c.json({ success: true, queueId, userId });
  } catch (error) {
    const errorMessage = error instanceof Error ? error.message : 'Unknown error';
    await markFailed(c.env, queueId, errorMessage);
    // Return 200 (not 500) so the scheduler continues with other users
    return c.json({ success: false, queueId, userId, error: errorMessage });
  }
});
Key points:
- Idempotency protection (check status before processing)
- Return 200 even on failure (don’t block other Workers)
- Mark completed/failed in the queue (helper sketches below)
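The queue helpers the handler relies on (getQueueItemStatus, markCompleted, markFailed) aren't shown above; here is a minimal sketch of how they can look against the cron_queue table (the names match the calls above, the implementations are assumed):

export async function getQueueItemStatus(
  env: Env,
  queueId: string
): Promise<string | null> {
  const row = await env.DB.prepare(
    `SELECT status FROM cron_queue WHERE id = ?`
  )
    .bind(queueId)
    .first<{ status: string }>();
  return row?.status ?? null;
}

export async function markCompleted(env: Env, queueId: string): Promise<void> {
  await env.DB.prepare(
    `UPDATE cron_queue
     SET status = 'completed', completed_at = datetime('now')
     WHERE id = ?`
  )
    .bind(queueId)
    .run();
}

export async function markFailed(
  env: Env,
  queueId: string,
  errorMessage: string
): Promise<void> {
  await env.DB.prepare(
    `UPDATE cron_queue
     SET status = 'failed', completed_at = datetime('now'),
         error_message = ?, retry_count = retry_count + 1
     WHERE id = ?`
  )
    .bind(errorMessage, queueId)
    .run();
}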
Show Me the Numbers
I’m skeptical by nature, so I needed concrete data.
Performance Comparison
| Approach | Users | Processing Time | CPU Time/Worker | Success Rate |
|---|---|---|---|---|
| Sequential | 10 | 30+ seconds | 30 seconds | 0% (TLE) |
| Distributed | 10 | ~10 seconds | 3 seconds | 100% |
| Distributed | 100 | ~15 seconds | 3 seconds | 100% |
| Distributed | 1000 | ~30 seconds | 3 seconds | 100% |
Source: Cloudflare Workers Analytics, October 2025
Real-World Impact
Before (Sequential):
- 10 users × 3 seconds = 30 seconds
- CPU time: 30 seconds (at limit!)
- Wall time: 30 seconds
- Success rate: 0% (TLE errors)
After (Distributed):
- 10 users / 10 Workers = 1 user per Worker
- CPU time per Worker: 3 seconds
- Wall time: ~10 seconds (parallel)
- Success rate: 100%
Scalability:
- Current load: 10 users/day
- Theoretical capacity: ~25,000 users/day (D1 write limit at ~4 writes per user)
- Headroom: 2500x current load
Advanced Features
1. Stale Item Requeuing
What if a Worker crashes? Items stuck in “processing” need to be requeued.
export async function requeueStaleProcessing(
  env: Env,
  minutes: number = 10
): Promise<number> {
  const result = await env.DB.prepare(`
    UPDATE cron_queue
    SET status = 'pending', started_at = NULL
    WHERE status = 'processing'
      AND started_at < datetime('now', '-' || ? || ' minutes')
  `)
    .bind(minutes)
    .run();
  return result.meta.changes;
}
Usage in scheduler:
// Reaper for stale processing items (10+ minutes)
ctx.waitUntil(
  requeueStaleProcessing(env, 10).then((requeued) => {
    if (requeued > 0) {
      console.log(`[Scheduled] Requeued ${requeued} stale processing items`);
    }
  })
);
2. Batch Progress Tracking
Monitor batch progress in real-time.
export interface BatchProgress {
  pending: number;
  processing: number;
  completed: number;
  failed: number;
  total: number;
}

export async function getBatchProgress(
  env: Env,
  batchId: string
): Promise<BatchProgress> {
  const results = await env.DB.prepare(`
    SELECT status, COUNT(*) as count
    FROM cron_queue
    WHERE batch_id = ?
    GROUP BY status
  `)
    .bind(batchId)
    .all();
  const progress: BatchProgress = {
    pending: 0,
    processing: 0,
    completed: 0,
    failed: 0,
    total: 0,
  };
  for (const row of results.results as Array<{ status: string; count: number }>) {
    const status = row.status as keyof Omit<BatchProgress, 'total'>;
    progress[status] = row.count;
    progress.total += row.count;
  }
  return progress;
}
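To watch a batch from outside, the function can be exposed on a small read-only endpoint; a sketch (the route is an assumption, reusing the same shared-secret auth as the worker endpoint):

app.get('/batch/:batchId/progress', async (c) => {
  // Same shared-secret check as /process-user
  if (c.req.header('X-Cron-Secret') !== c.env.SERVER_SECRET) {
    return c.json({ error: 'Unauthorized' }, 401);
  }
  const progress = await getBatchProgress(c.env, c.req.param('batchId'));
  return c.json(progress);
});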
Getting Your Hands Dirty
Prerequisites
- Cloudflare account (free tier)
- Node.js 18+ (for Wrangler CLI)
- Basic TypeScript knowledge
Setup
# Install Wrangler CLI
npm install -g wrangler
# Create new project
npm create cloudflare@latest my-distributed-cron
# Install dependencies
cd my-distributed-cron
npm install hono
Quick Start
1. Configure wrangler.toml:
name = "my-distributed-cron"
main = "src/index.ts"
compatibility_date = "2025-10-11"
# D1 Database
[[d1_databases]]
binding = "DB"
database_name = "my-queue-db"
database_id = "your-database-id"
# Service Bindings
[[services]]
binding = "SELF"
service = "my-distributed-cron"
# Cron Trigger
[triggers]
crons = ["0 12 * * *"]
2. Create D1 database:
npx wrangler d1 create my-queue-db
npx wrangler d1 execute my-queue-db --file=schema.sql
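(Here schema.sql is the cron_queue DDL and indexes from Component 1 above.)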
3. Deploy:
npx wrangler deploy
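4. Test the cron locally (Wrangler can simulate scheduled events without waiting for noon):
# Run the Worker locally with scheduled-event testing enabled
npx wrangler dev --test-scheduled
# In another terminal, fire the cron handler by hand
curl "http://localhost:8787/__scheduled?cron=0+12+*+*+*"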
Production Considerations
Rate Limiting:
- Cloudflare Workers: 100,000 requests/day (free tier)
- D1 writes: 100,000/day (free tier)
- Bottleneck: D1 writes (~4 writes per user = ~25,000 users/day)
Error Handling:
- Idempotency checks (prevent duplicate processing)
- Stale item requeuing (handle Worker crashes)
- Return 200 on failure (don’t block other Workers)
Monitoring:
- Cloudflare Analytics (built-in)
- Custom logging (Analytics Engine)
- Batch progress tracking (API endpoint)
What Surprised Me: The Trade-offs
The Good
1. Scales Beyond Single-Worker Limits
- Sequential: 10 users max (30s CPU limit)
- Distributed: 1000+ users (parallel processing)
- Each Worker gets fresh 30s CPU budget
2. Zero External Dependencies
- No Redis, SQS, or RabbitMQ needed
- D1 SQLite handles queue perfectly
- Service Bindings built into Workers
3. Cost-Effective
- Free tier: 100k requests/day
- Current usage: ~50 requests/day
- Headroom: 2000x capacity
The Not-So-Good
1. D1 Write Limits
- Free tier: 100k writes/day
- ~4 writes per user = ~25k users/day max
- Workaround: Batch writes, cleanup old data
2. Cold Start Latency
- First Worker: ~100ms cold start
- Subsequent Workers: ~10ms warm
- Impact: Minimal (parallel processing)
3. Debugging Complexity
- Multiple Workers = multiple logs
- Need batch tracking to correlate
- Solution: Batch ID + structured logging (see the sketch below)
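A tiny helper makes that correlation concrete; a sketch (the field names are arbitrary):

// Emit one JSON line per event so a whole batch can be grepped
// across many Worker invocations by its batchId
function logEvent(
  batchId: string,
  queueId: string,
  message: string,
  extra: Record<string, unknown> = {}
): void {
  console.log(JSON.stringify({ batchId, queueId, message, ...extra }));
}

// Usage inside the worker endpoint:
// logEvent(item.batch_id, item.id, 'processing started');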
When to Use This
- Processing N independent tasks (users, jobs, etc.)
- Each task takes significant CPU time (>1 second)
- Need to scale beyond single-Worker limits
- Want to stay on free tier
When NOT to Use This
- Tasks are fast (<100ms each)
- Need strict ordering (parallel Workers complete in no particular order)
- Require transactional guarantees across tasks
- Need more than 100k writes/day (D1 limit)
The Cost Calculation
Free tier limits:
- Cloudflare Workers: 100,000 requests/day
- D1 database: 100,000 writes/day
- Bottleneck: D1 writes (~4 writes per user)
Current usage (10 users/day):
- Workers: ~20 requests/day (10 users × 2 endpoints)
- D1 writes: ~40 writes/day (queue + notifications)
- Cost: $0/month
Projected usage (1000 users/day):
- Workers: ~2,000 requests/day
- D1 writes: ~4,000 writes/day
- Cost: Still $0/month (25x headroom on D1 writes)
When would I need to pay?
- At ~25,000 users/day, when the D1 free-tier write limit runs out
- Paid tier: $5/month (D1)
- Still cheaper than Redis/SQS
Resources:
- Streaky Live App
- Streaky GitHub Repository
- Cloudflare Workers Documentation
- D1 Database Documentation
- Service Bindings Guide
Connect
- GitHub: @0xReLogic
- LinkedIn: Allen Elzayn