
Quick Summary ⚡️
In the rush to adopt "web-scale" architectures, many engineering teams prematurely fracture their infrastructure, introducing Redis, Kafka, or RabbitMQ before their traffic demands it. This article argues for the "Boring Stack." We explore how to push PostgreSQL to its limits by utilizing SKIP LOCKED for reliable job queues and UNLOGGED tables for high-speed caching. We will dissect the architectural trade-offs, configuration requirements, and the specific failure modes (vacuum bloat, connection exhaustion) that eventually signal it's time to decouple.
Table of Contents
- The Trap of Premature Polyglot Persistence
- Architecture: The Transactional Job Queue
- Architecture: The Ephemeral Cache Layer
- The Breaking Point: When to Decouple
- Comparison: Monolith vs. Specialized Infra
- Final Thoughts
The Trap of Premature Polyglot Persistence
There is a pervasive myth in modern backend engineering: if you are building a system, you immediately need a primary database, a distinct caching layer (usually Redis), and a distributed event bus (Kafka or RabbitMQ). While this triad is the standard for mature, high-traffic distributed systems, introducing it at the inception of a project—or even for mid-sized production workloads—often incurs a "complexity tax" that outweighs the performance benefits.
Every new piece of infrastructure adds:
- Operational Overhead: Upgrades, security patches, and monitoring.
- Consistency Challenges: Keeping the cache in sync with the DB (cache invalidation is one of the two hardest problems in CS for a reason).
- Network Hops: Additional latency and points of failure between services.
For 95% of applications, PostgreSQL is fast enough to handle the workload of the database, the queue, and the cache simultaneously. By consolidating these concerns, we achieve transactional integrity and drastically simplify the system model. The goal is not to use Postgres forever, but to use it until the cost of the database's limitations exceeds the cost of managing specialized infrastructure.
Let's look at how to implement this without shooting ourselves in the foot.

Architecture: The Transactional Job Queue
Historically, using a relational database as a queue was considered an anti-pattern. Polling caused massive load, and locking entire tables killed concurrency. However, since PostgreSQL 9.5, the FOR UPDATE SKIP LOCKED clause has changed the physics of this problem.
This feature allows workers to select rows that are not currently locked by other transactions, skipping over the locked ones immediately rather than waiting. This turns Postgres into a highly concurrent, atomic message broker.
The Implementation Strategy
To make this production-grade, we cannot simply SELECT and DELETE. We need state management (pending, processing, failed, completed) and dead-letter handling.
-- 1. The Queue Table (optimized for high churn)
CREATE TABLE background_jobs (
    id            BIGSERIAL PRIMARY KEY,
    queue_name    TEXT NOT NULL,
    payload       JSONB NOT NULL,
    status        TEXT NOT NULL DEFAULT 'queued',
    attempt_count INT NOT NULL DEFAULT 0,
    run_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Index for finding work efficiently
CREATE INDEX idx_jobs_fetch
ON background_jobs (queue_name, run_at)
WHERE status = 'queued';
-- 2. The Atomic Fetch (Pseudocode logic)
/* Inside a transaction:
1. Lock the rows so no other worker grabs them.
2. Update status to 'processing' to remove them from the queue view.
3. Return the payload to the application.
*/
WITH next_jobs AS (
    SELECT id
    FROM background_jobs
    WHERE status = 'queued'
      AND queue_name = 'email_notifications'
      AND run_at <= NOW()
    ORDER BY run_at ASC
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
UPDATE background_jobs
SET status = 'processing',
    attempt_count = attempt_count + 1,
    updated_at = NOW()
FROM next_jobs
WHERE background_jobs.id = next_jobs.id
RETURNING background_jobs.id, background_jobs.payload;
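The fetch above moves jobs into 'processing'; the worker must then close the loop by either acknowledging success or scheduling a retry. A minimal sketch, assuming a cap of 5 attempts and a 30-second exponential backoff base (both values are illustrative, not from the article), with `$1` standing for the job id returned by the fetch:

```sql
-- On success: remove the row (or set status = 'completed' if you need an audit trail)
DELETE FROM background_jobs WHERE id = $1;

-- On failure: re-queue with exponential backoff, or dead-letter after 5 attempts
UPDATE background_jobs
SET status     = CASE WHEN attempt_count >= 5 THEN 'failed' ELSE 'queued' END,
    run_at     = NOW() + (INTERVAL '30 seconds' * POWER(2, attempt_count)),
    updated_at = NOW()
WHERE id = $1;
```

Deleting completed jobs (rather than keeping them as 'completed' rows) keeps the table small, which matters for the vacuum behavior discussed later.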
The Benefits of ACID Queues
The superpower here is Transactional Enqueueing. In a distributed system with Redis or SQS, you often face the dual-write problem: you commit a transaction to your primary DB (e.g., "User Created") but fail to publish the event to the queue. This leads to data inconsistency.
With Postgres, the business data and the job are inserted in the same transaction. If the user creation rolls back, the welcome email job never exists. You get exactly-once enqueueing for free; delivery to workers is still at-least-once (a worker can crash mid-job and the job will be retried), so handlers should be idempotent.
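Concretely, assuming a hypothetical `users` table alongside the `background_jobs` table defined earlier, transactional enqueueing is just two inserts in one transaction:

```sql
BEGIN;

INSERT INTO users (email) VALUES ('new@example.com');

INSERT INTO background_jobs (queue_name, payload)
VALUES ('email_notifications',
        '{"template": "welcome", "email": "new@example.com"}');

COMMIT;  -- both rows become visible atomically; a ROLLBACK discards both
```

There is no window in which the user exists but the job does not, which is exactly the guarantee the dual-write pattern cannot give you.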
Architecture: The Ephemeral Cache Layer
Can Postgres replace Redis? For many read-through caching scenarios, yes. The primary objection is usually write overhead (WAL logs) and eviction policies. We can mitigate the write overhead using UNLOGGED tables.
UNLOGGED tables are not written to the Write-Ahead Log (WAL). This means:
- Speed: Writes are significantly faster (closer to memory speeds, limited by buffer pool).
- Volatility: If the database crashes, these tables are automatically truncated (emptied) upon recovery.
- Replication: These tables are NOT replicated to standbys. (Note: This is a major architectural consideration).
Implementing a KV Store
We can simulate a key-value store with JSONB for flexibility and a scheduled cleanup job for expiration (TTL).
-- Create an unlogged table for cache
CREATE UNLOGGED TABLE app_cache (
    key        TEXT PRIMARY KEY,
    value      JSONB NOT NULL,
    expires_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX idx_cache_expires ON app_cache (expires_at);
-- Upsert (Set Key)
INSERT INTO app_cache (key, value, expires_at)
VALUES ('user_session_123', '{"role": "admin"}', NOW() + INTERVAL '1 hour')
ON CONFLICT (key)
DO UPDATE SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at;
-- Retrieval (Get Key)
SELECT value FROM app_cache
WHERE key = 'user_session_123' AND expires_at > NOW();
To handle TTL (Time To Live), note that you don't get Redis's automatic key expiration. Expired rows are merely filtered out by the `expires_at > NOW()` predicate; to actually reclaim space, you must run a periodic cleanup job (e.g., via pg_cron or an external worker) that executes DELETE FROM app_cache WHERE expires_at < NOW().
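A sketch of that cleanup using the pg_cron extension (the job name and the one-minute schedule are illustrative choices, not requirements):

```sql
-- Requires the pg_cron extension: CREATE EXTENSION pg_cron;
SELECT cron.schedule(
    'purge-expired-cache',
    '* * * * *',  -- every minute
    $$DELETE FROM app_cache WHERE expires_at < NOW()$$
);
```

Because the table is UNLOGGED, these deletes are cheap; the index on `expires_at` keeps the sweep from scanning live entries.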

The Breaking Point: When to Decouple
Using the "Boring Stack" is a game of resource management. Since you are sharing CPU, RAM, and I/O between OLTP (business logic), Queues, and Caching, you introduce the risk of the "Noisy Neighbor" effect within your own database. Here are the specific failure modes to watch for.
1. Dead Tuples and Vacuum Bloat
Queues are high-churn tables. You insert rows and delete them seconds later. In Postgres, a DELETE does not physically remove data; it marks the row as dead, and the space is only reclaimed when vacuum runs. If your auto-vacuum settings are not aggressive enough, your queue table will bloat, causing index lookups to slow to a crawl.
Fix: You must tune auto-vacuum specifically for the queue table to run much more frequently than the default.
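Postgres supports per-table autovacuum storage parameters, so you can make only the queue table aggressive without touching global settings. A hedged example (the thresholds are illustrative starting points, not tuned values):

```sql
-- Trigger vacuum after ~5,000 dead rows instead of 20% of the table,
-- and remove the cost-based throttling for this table only
ALTER TABLE background_jobs SET (
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_vacuum_threshold    = 5000,
    autovacuum_vacuum_cost_delay   = 0
);
```

Setting the scale factor to zero makes the fixed threshold the only trigger, which is what you want for a table whose live size stays small while its churn is enormous.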
2. Connection Exhaustion
Redis can handle tens of thousands of concurrent connections easily. Postgres cannot. If you have hundreds of workers polling the queue and thousands of web requests querying the cache, you will hit the max_connections limit.
Fix: Use PgBouncer in transaction pooling mode. Do not expose Postgres directly to high-concurrency serverless functions without a proxy.
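An illustrative pgbouncer.ini fragment (host, database name, and pool sizes are placeholders); be aware that transaction pooling can break session-level features such as session-scoped prepared statements and LISTEN/NOTIFY:

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
pool_mode = transaction
max_client_conn = 5000    ; clients PgBouncer will accept
default_pool_size = 40    ; actual Postgres connections per db/user pair
```

The point is the fan-in: thousands of short-lived clients share a few dozen real server connections, which is the shape Postgres is happiest with.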
3. Replication Lag
If you use UNLOGGED tables for caching, remember they don't replicate. If you have a read-replica setup, your cache is local to the primary. If you failover, your cache is empty (cold start). For queues, the high write volume produces massive WAL files, which can saturate the replication stream network bandwidth, causing replicas to fall behind.
Comparison: Monolith vs. Specialized Infra
When deciding between the consolidated approach and a split stack, use the following heuristic breakdown:
| Feature | Postgres (Queue/Cache) | Redis + Kafka/RabbitMQ |
|---|---|---|
| Transactional Safety | ✅ Excellent (ACID integrated) | ❌ Difficult (Dual-write problem) |
| Operational Complexity | ✅ Low (Single node/cluster) | ❌ High (3+ distinct systems) |
| Throughput | ⚠️ Moderate (~2k-5k jobs/sec) | ✅ Extreme (100k+ msg/sec) |
| Latency | ⚠️ 1ms - 10ms | ✅ Sub-millisecond (Redis) |
| Persistence | ✅ Durable on disk | ⚠️ Configurable (AOF/RDB) |
Final Thoughts
As system designers, we often confuse "scalable" with "complex." A PostgreSQL instance on modern hardware (e.g., AWS RDS r6g.4xlarge) can handle a surprising amount of abuse. By utilizing SKIP LOCKED for queues and UNLOGGED tables for caching, you can defer the operational complexity of distributed systems until you have the dedicated DevOps resources to manage them.
The transition point usually arrives when your queue throughput exceeds 2,000 jobs per second, or when cache eviction logic becomes complex enough to require Redis's specialized data structures (like Sorted Sets or HyperLogLogs). Until then, embrace the monolith.
Next Step: Audit your current infrastructure. Are you paying for a managed Redis instance that is only storing simple session data? Consider moving that into an unlogged Postgres table and eliminating a point of failure this week.