Reliable Event Processing (B2B SaaS)

2026 · Architecture + Backend Engineering

A reliable webhook & integration layer for SaaS: process external events safely, avoid duplicates, and keep operations observable.

Node.js · PostgreSQL · NestJS · Queues · Observability

What it is

A small, production-ready service that receives webhooks, persists events, processes them asynchronously, and applies each effect exactly once, even when providers retry, duplicate, or send bad payloads.

Context

External events (webhooks, payments, integrations) are unreliable input. Processing them directly turns retries and duplicates into inconsistent state and financial risk.

Failure modes

Providers retry aggressively, payloads arrive malformed, and delivery order is not guaranteed. The system must stay correct despite noisy input.

What you get

Predictable behavior with retries, duplicates, and malformed input. Full visibility to operate the system. When something fails: see the reason, decide the action, maintain consistent state.

Key guarantees

No duplicate effects

Retries and duplicate webhooks won't double-charge or double-activate.

Idempotency keys + effects ledger.
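
One way to picture this guarantee is a ledger keyed by idempotency key, where an effect runs only if its key has not been recorded yet. The sketch below is an in-memory illustration under stated assumptions, not the service's actual code: `EffectsLedger`, `applyOnce`, and `keyFor` are hypothetical names, and the `Map` stands in for what would presumably be a PostgreSQL table with a unique constraint on the key. The key format mirrors the `activate_subscription:sub_123` value that appears in the admin output further down.

```typescript
// Hypothetical in-memory effects ledger. A real service would more likely
// rely on a unique index in PostgreSQL than on a Map.
type EffectRecord = { idempotencyKey: string; status: "succeeded"; createdAt: Date };

class EffectsLedger {
  private readonly records = new Map<string, EffectRecord>();

  // Runs `effect` only when no record exists for `key`.
  // Returns true if the effect ran, false if the key was already recorded.
  applyOnce(key: string, effect: () => void): boolean {
    if (this.records.has(key)) return false;
    effect(); // if this throws, nothing is recorded and the key stays free
    this.records.set(key, { idempotencyKey: key, status: "succeeded", createdAt: new Date() });
    return true;
  }

  size(): number {
    return this.records.size;
  }
}

// Assumed key format: "<effect name>:<target id>"
const keyFor = (effect: string, subscriptionId: string): string => `${effect}:${subscriptionId}`;

const ledger = new EffectsLedger();
let activations = 0;
const key = keyFor("activate_subscription", "sub_123");

const first = ledger.applyOnce(key, () => { activations += 1; });
const second = ledger.applyOnce(key, () => { activations += 1; });

console.log(first, second, activations); // true false 1
```

Because the key is derived from the effect and its target rather than from the webhook's `event_id`, two different events that request the same activation would also collapse into one effect in this sketch.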

Safe failures, no partial state

Malformed events fail safely and the reason is recorded.

Deterministic validation + persisted failure reason.
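
As a rough illustration, deterministic validation can be a pure function that returns either the parsed value or a human-readable reason, so the same payload always fails the same way and the reason can be stored with the job. `validateSubscriptionPaid` and `ValidationResult` are hypothetical names. The "missing subscription_id" message matches the `last_error` shown in the malformed payload example; the "not an object" message is an assumption.

```typescript
// Hypothetical validator for the payload of a `subscription.paid` event.
type ValidationResult =
  | { ok: true; subscriptionId: string }
  | { ok: false; reason: string };

function validateSubscriptionPaid(payload: unknown): ValidationResult {
  if (typeof payload !== "object" || payload === null) {
    // Assumed message; the source only shows the missing-field case.
    return { ok: false, reason: "Malformed payload: not an object" };
  }
  const id = (payload as Record<string, unknown>)["subscription_id"];
  if (typeof id !== "string" || id.length === 0) {
    return { ok: false, reason: "Malformed payload: missing subscription_id" };
  }
  return { ok: true, subscriptionId: id };
}

const good = validateSubscriptionPaid({ subscription_id: "sub_123" });
const bad = validateSubscriptionPaid({});

console.log(JSON.stringify(good)); // {"ok":true,"subscriptionId":"sub_123"}
console.log(JSON.stringify(bad));  // {"ok":false,"reason":"Malformed payload: missing subscription_id"}
```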

Operational control

Retries are deliberate and auditable, so humans stay in control.

Manual requeue with actor/reason/timestamp.
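
A minimal sketch of what such a requeue might look like: the job only moves back to `queued` together with an audit entry that names the actor, the reason, and the time. `requeueJob`, `Job`, and `AuditEntry` are hypothetical names, and the rules that only `failed` jobs can be requeued and that actor and reason must be non-empty are assumptions made for the example.

```typescript
type Status = "queued" | "in_progress" | "done" | "failed";

interface Job { id: string; status: Status; available_at: string | null }

interface AuditEntry {
  job_id: string;
  action: "manual_requeue";
  actor: string;
  reason: string;
  created_at: string;
}

// Returns the requeued job and its audit entry as one unit, so a caller
// cannot persist one without the other.
function requeueJob(job: Job, actor: string, reason: string, now: Date): { job: Job; audit: AuditEntry } {
  if (job.status !== "failed") {
    // Assumption: only failed jobs are eligible for manual requeue.
    throw new Error(`job ${job.id} is ${job.status}; only failed jobs can be requeued`);
  }
  if (actor.trim() === "" || reason.trim() === "") {
    throw new Error("actor and reason are required");
  }
  const timestamp = now.toISOString();
  const audit: AuditEntry = { job_id: job.id, action: "manual_requeue", actor, reason, created_at: timestamp };
  return { job: { ...job, status: "queued", available_at: timestamp }, audit };
}

const failedJob: Job = { id: "1", status: "failed", available_at: null };
const out = requeueJob(
  failedJob,
  "admin@example.com",
  "manual retry to requeue job",
  new Date("2026-02-09T10:51:45.416Z"),
);

console.log(out.job.status, out.audit.created_at); // queued 2026-02-09T10:51:45.416Z
```

In a database-backed version, the status update and the audit insert would presumably share one transaction so that neither can exist without the other.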

Architecture details

Event flow

Ingest

Validates and persists the event. Responds 202 immediately.

Ledger + Job

Appends the event to the immutable ledger and creates a job.

Worker

Processes jobs asynchronously with idempotency guarantees.

Effect + Admin Loop

Applies the effect once or moves the job to failed for auditable manual intervention.
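
The first two steps of this flow can be sketched as a single function: check the envelope, append to the ledger, create a job, and answer 202 without doing any processing. This is an in-memory approximation, not the service's NestJS controller. `ingest`, `eventLedger`, and `jobQueue` are hypothetical names, and the 400 response for a broken envelope is an assumption, since the source only shows accepted requests. Payload contents are deliberately not inspected here, which matches the malformed payload example where the event is accepted and fails later in the worker.

```typescript
interface IncomingEvent { event_id: string; event_type: string; payload: unknown }
interface LedgerEntry { id: number; event: IncomingEvent; received_at: string }
interface Job { id: number; event_ledger_id: number; status: "queued" }
interface IngestResponse { status: number; body: { accepted: boolean } }

const eventLedger: LedgerEntry[] = []; // append-only in this sketch
const jobQueue: Job[] = [];

function ingest(raw: unknown): IngestResponse {
  const e = raw as Partial<IncomingEvent> | null | undefined;
  // Envelope check only: the payload is validated later, by the worker.
  if (!e || typeof e.event_id !== "string" || typeof e.event_type !== "string") {
    return { status: 400, body: { accepted: false } }; // assumed response
  }
  const entry: LedgerEntry = {
    id: eventLedger.length + 1,
    event: { event_id: e.event_id, event_type: e.event_type, payload: e.payload },
    received_at: new Date().toISOString(),
  };
  eventLedger.push(entry); // duplicates are recorded too, for audit
  jobQueue.push({ id: jobQueue.length + 1, event_ledger_id: entry.id, status: "queued" });
  return { status: 202, body: { accepted: true } };
}

const event = { event_id: "evt_duplicate_demo_1", event_type: "subscription.paid", payload: { subscription_id: "sub_123" } };
const r1 = ingest(event);
const r2 = ingest(event); // same event again
const r3 = ingest({ payload: {} }); // no event_id or event_type

console.log(r1.status, r2.status, r3.status, eventLedger.length, jobQueue.length);
// 202 202 400 2 2
```

Deduplication is left to the worker and the effects ledger in this sketch, which is why both copies of the event get a ledger entry and a job.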

Job states

queued: Persisted, waiting to be processed.
in_progress: Executing on a worker.
done: Effect applied successfully. Will not be repeated.
failed: Permanent error. Requires manual operational intervention.
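
These four states suggest a small transition table. The source does not list the allowed transitions, so the table below is an assumption derived from the state descriptions: `done` is terminal, and `failed` can only return to `queued`, which corresponds to a manual requeue. `allowedTransitions` and `canTransition` are hypothetical names.

```typescript
type JobState = "queued" | "in_progress" | "done" | "failed";

// Assumed transition table, inferred from the state descriptions.
const allowedTransitions: Record<JobState, readonly JobState[]> = {
  queued: ["in_progress"],        // a worker picks the job up
  in_progress: ["done", "failed"], // the effect is applied, or the job fails
  done: [],                        // terminal: the effect is never repeated
  failed: ["queued"],              // only through a manual requeue
};

function canTransition(from: JobState, to: JobState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

console.log(canTransition("queued", "in_progress")); // true
console.log(canTransition("done", "queued"));        // false
console.log(canTransition("failed", "queued"));      // true
```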

Trade-offs

Intentionally minimal system: no aggressive automatic retries, no complex scheduling, no automatic recovery. Features are added only when necessary, without compromising correctness and auditability.

Failure stories

Duplicate Event

Scenario: The provider sends the same event multiple times.

bash
curl -s -X POST http://localhost:3000/events/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "event_id": "evt_duplicate_demo_1",
    "event_type": "subscription.paid",
    "payload": { "subscription_id": "sub_123" }
  }'
# => {"accepted": true}

curl -s -X POST http://localhost:3000/events/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "event_id": "evt_duplicate_demo_1",
    "event_type": "subscription.paid",
    "payload": { "subscription_id": "sub_123" }
  }'
# => {"accepted": true}

Behavior: Idempotency keys prevent duplicate processing. Only the first event produces an effect.

Outcome: State changes once. Duplicates remain visible in the ledger for audit.

bash
curl -s http://localhost:3000/admin/effects | jq

json
{
  "items": [
    {
      "id": "1",
      "idempotency_key": "activate_subscription:sub_123",
      "subscription_id": "sub_123",
      "status": "succeeded",
      "error_message": null,
      "created_at": "2026-02-09T10:32:29.301Z",
      "updated_at": "2026-02-09T10:32:29.303Z"
    }
  ],
  "limit": 50
}

Malformed Payload

Scenario: A webhook arrives with missing required fields.

bash
curl -s -X POST http://localhost:3000/events/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "event_id": "evt_malformed_demo_1",
    "event_type": "subscription.paid",
    "payload": {}
  }'
# => {"accepted": true}

Behavior: Ingest accepts the event, but processing fails deterministically on the missing field. The failure reason is persisted.

bash
curl -s http://localhost:3000/admin/jobs | jq

json
{
  "items": [
    {
      "id": "1",
      "status": "failed",
      "event_ledger_id": "1",
      "event_type": "subscription.paid",
      "external_event_id": "evt_malformed_demo_1",
      "attempts": 1,
      "max_attempts": 3,
      "failure_type": "permanent",
      "last_error": "Malformed payload: missing subscription_id",
      "created_at": "2026-02-09T10:42:36.035Z"
    }
  ],
  "limit": 50
}

Outcome: No partial effects. The job stays visible in the failed state, where an operator can inspect and requeue it.
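
To make the failure path concrete, here is a hedged sketch of a worker step that records why a job failed instead of letting the error escape. Field names mirror the admin output above, but `runJob`, `PermanentError`, and `isPermanent` are hypothetical, and the "transient" label for unclassified errors is an assumption, since the source only shows `"permanent"`. In this sketch every failure lands in `failed` and waits for an operator; nothing is retried automatically.

```typescript
// Hypothetical marker for errors that retrying cannot fix.
class PermanentError extends Error {
  readonly permanent = true;
}

function isPermanent(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { permanent?: unknown }).permanent === true;
}

interface JobRecord {
  id: string;
  status: "queued" | "in_progress" | "done" | "failed";
  attempts: number;
  failure_type: "permanent" | "transient" | null; // "transient" is assumed
  last_error: string | null;
}

// Runs one attempt. The handler is expected to validate first and apply
// the effect last, so a validation failure leaves no partial state.
function runJob(job: JobRecord, handler: () => void): JobRecord {
  const started: JobRecord = { ...job, status: "in_progress", attempts: job.attempts + 1 };
  try {
    handler();
    return { ...started, status: "done", failure_type: null, last_error: null };
  } catch (err) {
    return {
      ...started,
      status: "failed",
      failure_type: isPermanent(err) ? "permanent" : "transient",
      last_error: err instanceof Error ? err.message : String(err),
    };
  }
}

const queuedJob: JobRecord = { id: "1", status: "queued", attempts: 0, failure_type: null, last_error: null };
const result = runJob(queuedJob, () => {
  throw new PermanentError("Malformed payload: missing subscription_id");
});

console.log(result.status, result.attempts, result.failure_type);
// failed 1 permanent
```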

Manual Requeue

Scenario: An operator requeues a failed job after investigation.

bash
curl -s http://localhost:3000/admin/jobs | jq

json
{
  "items": [
    {
      "id": "1",
      "status": "failed",
      "attempts": 1,
      "failure_type": "permanent",
      "last_error": "Malformed payload: missing subscription_id"
    }
  ]
}

Behavior: The requeue creates an audit record (actor + reason + timestamp) and reinserts the job.

bash
curl -s -X POST http://localhost:3000/admin/jobs/1/requeue \
  -H "Content-Type: application/json" \
  -d '{
    "actor": "admin@example.com",
    "reason": "manual retry to requeue job"
  }' | jq

json
{
  "ok": true,
  "id": "1",
  "status": "queued",
  "available_at": "2026-02-09T10:51:45.416Z",
  "audit": {
    "id": "1",
    "action": "manual_requeue",
    "actor": "admin@example.com",
    "reason": "manual retry to requeue job",
    "created_at": "2026-02-09T10:51:45.416Z"
  }
}

Outcome: Manual intervention is fully traceable. The system remains explainable.

json
{
  "items": [
    {
      "audit": {
        "id": "1",
        "job_id": "1",
        "action": "manual_requeue",
        "actor": "admin@example.com",
        "reason": "manual retry to requeue job",
        "created_at": "2026-02-09T11:04:31.714Z"
      }
    }
  ],
  "limit": 50
}

Get in touch

If your product depends on external events and requires operational correctness with retries, duplicates, and out-of-order delivery, I can help you design it.
