Reliable Event Processing (B2B SaaS)
A reliable webhook & integration layer for SaaS: process external events safely, avoid duplicates, and keep operations observable.
What it is
A small, production-ready service that receives webhooks, persists events, processes them asynchronously, and applies effects exactly once — even when providers retry, duplicate, or send bad payloads.
Context
External events (webhooks, payments, integrations) are unreliable input. Processing them directly turns retries and duplicates into inconsistent state and financial risk.
Failure modes
Providers retry aggressively, payloads arrive malformed, and delivery order is not guaranteed. The system must remain correct even with noisy input.
What you get
Predictable behavior with retries, duplicates, and malformed input. Full visibility to operate the system. When something fails: see the reason, decide the action, maintain consistent state.
Key guarantees
No duplicate effects
Retries and duplicate webhooks won't double-charge or double-activate.
Idempotency keys + effects ledger.
Safe failures, no partial state
Malformed events fail safely and the reason is recorded.
Deterministic validation + persisted failure reason.
Operational control
Retries are deliberate and auditable — humans stay in control.
Manual requeue with actor/reason/timestamp.
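The first guarantee (idempotency keys plus an effects ledger) could be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the table layout, the function names open_effects_ledger and apply_effect_once, and the use of SQLite are assumptions. Only the key format (activate_subscription:<subscription_id>) is taken from the sample output on this page.

```python
import sqlite3


def open_effects_ledger() -> sqlite3.Connection:
    """Create an in-memory effects ledger (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE effects (
            id INTEGER PRIMARY KEY,
            idempotency_key TEXT NOT NULL UNIQUE,  -- one row per logical effect
            subscription_id TEXT NOT NULL,
            status TEXT NOT NULL
        )
        """
    )
    return conn


def apply_effect_once(conn: sqlite3.Connection, subscription_id: str) -> bool:
    """Return True if the effect was applied now, False if it already existed."""
    key = f"activate_subscription:{subscription_id}"
    with conn:  # one transaction: ledger row and state change commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO effects (idempotency_key, subscription_id, status) "
            "VALUES (?, ?, 'succeeded')",
            (key, subscription_id),
        )
        if cur.rowcount == 0:
            return False  # key already recorded: a retry or duplicate delivery
        # The real state change (e.g. activating the subscription) would run
        # here, inside the same transaction as the ledger insert.
    return True
```

The unique constraint does the work: a second call with the same subscription inserts nothing and returns False, so the caller can skip the side effect.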
Architecture details
Event flow
Ingest
Validates and persists the event. Responds 202 immediately.
Ledger + Job
Appends the event to the immutable ledger and creates a job.
Worker
Processes jobs asynchronously with idempotency guarantees.
Effect + Admin Loop
Applies the effect once or moves the job to failed for auditable manual intervention.
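The flow above (ingest, ledger + job, worker) might look like this in miniature. It is a sketch under stated assumptions: the schema, the function names (open_store, ingest, work_one), the "done" status, and raising ValueError for a bad envelope are all illustrative and not taken from the real service. It also assumes every delivery, duplicates included, gets its own ledger row and job, with deduplication left to the effect step.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE event_ledger (
    id INTEGER PRIMARY KEY,
    external_event_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    event_ledger_id INTEGER NOT NULL REFERENCES event_ledger(id),
    status TEXT NOT NULL DEFAULT 'queued'
);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def ingest(conn: sqlite3.Connection, body: dict) -> int:
    """Check the envelope, append to the ledger, enqueue a job, return 202."""
    ok = (
        isinstance(body.get("event_id"), str)
        and isinstance(body.get("event_type"), str)
        and isinstance(body.get("payload"), dict)
    )
    if not ok:
        raise ValueError("invalid event envelope")
    with conn:  # ledger row and job are written atomically
        cur = conn.execute(
            "INSERT INTO event_ledger (external_event_id, event_type, payload) "
            "VALUES (?, ?, ?)",
            (body["event_id"], body["event_type"], json.dumps(body["payload"])),
        )
        conn.execute("INSERT INTO jobs (event_ledger_id) VALUES (?)", (cur.lastrowid,))
    return 202  # accepted; processing happens later in the worker


def work_one(conn: sqlite3.Connection, handler) -> bool:
    """Process the oldest queued job; return False when the queue is empty."""
    row = conn.execute(
        "SELECT jobs.id, event_type, payload FROM jobs "
        "JOIN event_ledger ON event_ledger.id = jobs.event_ledger_id "
        "WHERE jobs.status = 'queued' ORDER BY jobs.id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, event_type, payload = row
    try:
        handler(event_type, json.loads(payload))
        status = "done"
    except Exception:
        status = "failed"  # left for manual inspection, no automatic retry
    with conn:
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    return True
```

Keeping ingest limited to a cheap envelope check and two inserts is what lets it respond immediately; anything that can fail for business reasons happens in the worker.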
Job states
Trade-offs
Intentionally minimal system: no aggressive automatic retries, no complex scheduling, no automatic recovery. Features are added only when necessary, without compromising correctness and auditability.
Failure stories
Duplicate Event
Scenario: The provider sends the same event multiple times.
curl -s -X POST http://localhost:3000/events/ingest \
-H "Content-Type: application/json" \
-d '{
"event_id": "evt_duplicate_demo_1",
"event_type": "subscription.paid",
"payload": { "subscription_id": "sub_123" }
}'
# => {"accepted": true}
curl -s -X POST http://localhost:3000/events/ingest \
-H "Content-Type: application/json" \
-d '{
"event_id": "evt_duplicate_demo_1",
"event_type": "subscription.paid",
"payload": { "subscription_id": "sub_123" }
}'
# => {"accepted": true}
Behavior: Idempotency keys prevent duplicate processing. Only the first event produces an effect.
Outcome: State changes once. Duplicates remain visible in the ledger for audit.
curl -s http://localhost:3000/admin/effects | jq
# => output
{
"items": [
{
"id": "1",
"idempotency_key": "activate_subscription:sub_123",
"subscription_id": "sub_123",
"status": "succeeded",
"error_message": null,
"created_at": "2026-02-09T10:32:29.301Z",
"updated_at": "2026-02-09T10:32:29.303Z"
}
],
"limit": 50
}
Malformed Payload
Scenario: A webhook arrives with missing required fields.
curl -s -X POST http://localhost:3000/events/ingest \
-H "Content-Type: application/json" \
-d '{
"event_id": "evt_malformed_demo_1",
"event_type": "subscription.paid",
"payload": {}
}'
# => {"accepted": true}
Behavior: Processing fails deterministically. The failure reason is persisted.
curl -s http://localhost:3000/admin/jobs | jq
# => output
{
"items": [
{
"id": "1",
"status": "failed",
"event_ledger_id": "1",
"event_type": "subscription.paid",
"external_event_id": "evt_malformed_demo_1",
"attempts": 1,
"max_attempts": 3,
"failure_type": "permanent",
"last_error": "Malformed payload: missing subscription_id",
"created_at": "2026-02-09T10:42:36.035Z"
}
],
"limit": 50
}
Outcome: No partial effects. The job is visible in failed state and operable.
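The malformed-payload story can be sketched as validation that runs before any side effect, with the reason stored on the job. This is an illustrative sketch only: the Job dataclass, PermanentFailure, validate, process, and the "succeeded" job status are assumed names. The field names and the error message mirror the sample output above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class PermanentFailure(Exception):
    """Input that cannot succeed no matter how often it is retried."""


@dataclass
class Job:
    event_type: str
    payload: dict
    status: str = "queued"
    attempts: int = 0
    failure_type: Optional[str] = None
    last_error: Optional[str] = None


def validate(job: Job) -> None:
    """Deterministic check: the same payload always gives the same verdict."""
    if job.event_type == "subscription.paid" and not job.payload.get("subscription_id"):
        raise PermanentFailure("Malformed payload: missing subscription_id")


def process(job: Job, apply_effect: Callable[[str], None]) -> Job:
    job.attempts += 1
    try:
        validate(job)  # runs before any side effect, so failure leaves no partial state
    except PermanentFailure as exc:
        job.status = "failed"
        job.failure_type = "permanent"
        job.last_error = str(exc)  # persisted reason, visible to operators
        return job
    apply_effect(job.payload["subscription_id"])
    job.status = "succeeded"
    return job
```

Because validation depends only on the payload, replaying the same event yields the same failure, which is what makes the recorded reason trustworthy.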
Manual Requeue
Scenario: An operator requeues a failed job after investigation.
curl -s http://localhost:3000/admin/jobs | jq
# => output
{
"items": [
{
"id": "1",
"status": "failed",
"attempts": 1,
"failure_type": "permanent",
"last_error": "Malformed payload: missing subscription_id"
}
]
}
Behavior: The requeue creates an audit record (actor + reason + timestamp) and reinserts the job.
curl -s -X POST http://localhost:3000/admin/jobs/1/requeue \
-H "Content-Type: application/json" \
-d '{
"actor": "admin@example.com",
"reason": "manual retry to requeue job"
}' | jq
# => output
{
"ok": true,
"id": "1",
"status": "queued",
"available_at": "2026-02-09T10:51:45.416Z",
"audit": {
"id": "1",
"action": "manual_requeue",
"actor": "admin@example.com",
"reason": "manual retry to requeue job",
"created_at": "2026-02-09T10:51:45.416Z"
}
}
Outcome: Manual intervention is fully traceable. The system remains explainable.
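An audited requeue along these lines could be sketched as below. The names (Job, AuditEntry, requeue) and the rule that only failed jobs may be requeued are assumptions made for illustration. The recorded fields (action, actor, reason, created_at) follow the response shown above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class AuditEntry:
    job_id: str
    action: str
    actor: str
    reason: str
    created_at: datetime


@dataclass
class Job:
    id: str
    status: str
    available_at: Optional[datetime] = None


def requeue(job: Job, audit_log: List[AuditEntry], actor: str, reason: str) -> AuditEntry:
    """Move a failed job back to the queue and record who did it and why."""
    if job.status != "failed":
        raise ValueError("only failed jobs can be requeued")  # assumed rule
    if not actor.strip() or not reason.strip():
        raise ValueError("actor and reason are required")
    now = datetime.now(timezone.utc)
    entry = AuditEntry(job.id, "manual_requeue", actor, reason, now)
    audit_log.append(entry)  # write the audit record together with the state change
    job.status = "queued"
    job.available_at = now  # eligible for the worker immediately
    return entry
```

Refusing a requeue without an actor and a reason keeps every manual retry explainable after the fact.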
{
"items": [
{
"audit": {
"id": "1",
"job_id": "1",
"action": "manual_requeue",
"actor": "admin@example.com",
"reason": "manual retry to requeue job",
"created_at": "2026-02-09T11:04:31.714Z"
}
}
],
"limit": 50
}
Get in touch
If your product depends on external events and requires operational correctness with retries, duplicates, and out-of-order delivery, I can help you design it.