The 2026 production checklist
Reliability
- Request timeouts and bounded retries with backoff
- Circuit breakers for fragile dependencies
- Graceful shutdown and connection draining
- Rate limiting and backpressure
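As one illustration of the list above, here is a minimal circuit breaker sketch. The class name, the thresholds, and the `callThrough` helper are hypothetical and not taken from any particular library. After a run of consecutive failures it fails fast, then allows trial calls once a cooldown has passed.

```typescript
// Minimal circuit breaker sketch. All names here are illustrative assumptions.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now(), // injectable clock, handy in tests
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Should the caller attempt the dependency right now?
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: allow trial calls until one result is recorded.
      this.state = "half-open";
      return true;
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    // A failed trial call, or too many consecutive failures, opens the circuit.
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

// Wrap a call to a fragile dependency: fail fast while the circuit is open.
async function callThrough<T>(breaker: CircuitBreaker, fn: () => Promise<T>): Promise<T> {
  if (!breaker.allowRequest()) {
    throw new Error("circuit open: failing fast");
  }
  try {
    const result = await fn();
    breaker.recordSuccess();
    return result;
  } catch (err) {
    breaker.recordFailure();
    throw err;
  }
}
```

In practice a breaker like this usually sits alongside a per-call timeout, so that a hung dependency counts as a failure instead of holding a connection open.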
Data safety
- Idempotency for write APIs
- Clear transaction boundaries
- Audit logs for critical actions
- Backups and restore drills
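Idempotency for write APIs is often implemented with a client-supplied key. The sketch below assumes an in-memory `Map` and a hypothetical `runIdempotent` helper. A real service would more likely keep the entries in a shared store with an expiry, which is not shown here.

```typescript
// Idempotency sketch: the same key plus the same payload replays the first result.
// `runIdempotent` and the in-memory map are illustrative assumptions.
type WriteResult = { status: number; body: unknown };

type Entry = { fingerprint: string; result: Promise<WriteResult> };

const entries = new Map<string, Entry>();

function runIdempotent(
  key: string,
  payload: unknown,
  perform: () => Promise<WriteResult>,
): Promise<WriteResult> {
  // JSON.stringify is sensitive to property order, so this assumes a stable
  // serialization of the request body.
  const fingerprint = JSON.stringify(payload);
  const existing = entries.get(key);

  if (existing) {
    if (existing.fingerprint !== fingerprint) {
      // Same key, different request: refuse rather than guess.
      return Promise.resolve({
        status: 422,
        body: { error: "idempotency key reused with a different payload" },
      });
    }
    // Replay: concurrent and repeated requests share the first attempt's result.
    return existing.result;
  }

  const result = perform();
  entries.set(key, { fingerprint, result });
  // Forget failed attempts so the client can retry with the same key.
  result.catch(() => entries.delete(key));
  return result;
}
```

Storing the promise instead of the resolved value means two requests that arrive at the same moment with the same key trigger only one write.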
Your goal is not “zero errors”. Your goal is “errors are fast to detect, fast to diagnose, and limited in blast radius”.
Observability: logs, metrics, traces
- Structured logs with request IDs and user/org IDs
- Error monitoring with grouping and release tracking
- Latency percentiles (p50/p95/p99) for key endpoints
- Distributed tracing across services and queues
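To make the p50/p95/p99 item concrete, here is a small sketch that computes latency percentiles from raw samples using the nearest-rank method. The function names are hypothetical. Metrics systems typically approximate percentiles from histograms instead of sorting every sample.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Summarize one endpoint's latencies in the three percentiles named above.
function summarizeLatency(samplesMs: readonly number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}
```

Averages hide slow outliers. The gap between p50 and p99 is often the more useful signal for a key endpoint.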
Practical rule
If you can't answer “what changed?” within two minutes during an incident, add release tags and a deploy timeline to your dashboards.
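One way to get release tags into your data is to stamp every structured log line with the running release. The sketch below is a minimal example. The field names, the `formatLogLine` helper, and the sample release string are all assumptions, not a standard.

```typescript
// Structured log sketch: every line carries the service, the release, and
// whatever request-scoped IDs the caller passes in.
type LogLevel = "info" | "warn" | "error";
type BaseContext = { service: string; release: string };

function formatLogLine(
  base: BaseContext,
  level: LogLevel,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  // `base` is spread last so request fields cannot overwrite the release tag.
  return JSON.stringify({ ts: now.toISOString(), level, message, ...fields, ...base });
}

// Hypothetical values; in a real service the release would come from the build.
const base: BaseContext = { service: "billing-api", release: "2026.01.15-abc123" };

console.log(
  formatLogLine(base, "error", "payment failed", {
    requestId: "req-123",
    orgId: "org-9",
  }),
);
```

With the release on every line, filtering errors by release during an incident shows quickly whether a deploy lines up with the spike.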
Security hardening that prevents real incidents
API security
- RBAC checks on every protected action
- Input validation at the edge
- CSRF protection for cookie-based auth
- Secrets never logged
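The RBAC item above can be sketched as a single check that every protected handler calls before doing any work. The roles, the action names, and the `authorize` helper are hypothetical. The tenant check assumes each resource belongs to one organization.

```typescript
// RBAC sketch: a role-to-permission table plus a tenant check.
// Roles and actions are illustrative, not a recommended permission model.
type Role = "viewer" | "editor" | "admin";
type Action = "invoice:read" | "invoice:write" | "member:manage";

const PERMISSIONS: Record<Role, ReadonlySet<Action>> = {
  viewer: new Set<Action>(["invoice:read"]),
  editor: new Set<Action>(["invoice:read", "invoice:write"]),
  admin: new Set<Action>(["invoice:read", "invoice:write", "member:manage"]),
};

type User = { id: string; orgId: string; role: Role };

class ForbiddenError extends Error {}

// Throws unless the user's role grants the action AND the resource belongs
// to the user's own organization.
function authorize(user: User, action: Action, resourceOrgId: string): void {
  const sameOrg = user.orgId === resourceOrgId;
  if (!sameOrg || !PERMISSIONS[user.role].has(action)) {
    throw new ForbiddenError(`user ${user.id} may not perform ${action}`);
  }
}
```

Calling `authorize` at the top of each handler, instead of relying on the UI to hide buttons, keeps the check on the server where it cannot be bypassed.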
Infrastructure
- Least-privilege service accounts
- Network restrictions and private subnets
- WAF/rate limits for public endpoints
- Dependency scanning
Performance: avoid the common bottlenecks
- N+1 queries and missing indexes
- Unbounded concurrency (DB pool exhaustion)
- Large JSON payloads without compression
- Heavy CPU work on the main event loop
To address them, work in this order:
- Measure first: endpoint latency percentiles and DB query time
- Fix the biggest costs first: slow queries, payload size, missing caching
- Protect the system: timeouts, limits, queues
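Unbounded concurrency is usually the easiest of these to fix in code. The sketch below caps the number of in-flight tasks with a hypothetical `mapWithLimit` helper. In a real service the limit would be chosen relative to the DB pool size, which is an assumption here.

```typescript
// Bounded concurrency sketch: process all items, but never run more than
// `limit` workers at once. Results keep the order of the input.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  // If any worker rejects, this rejects too; the other runners are not
  // cancelled and will finish the items they have already claimed.
  await Promise.all(runners);
  return results;
}
```

Compared with `Promise.all(items.map(worker))`, which starts everything at once, this keeps a large batch from claiming every connection in the pool.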
Deployment and CI/CD
- Automated tests for critical flows
- Blue/green or canary deployments for risky releases
- Feature flags for controlled rollout
- Runbooks and on-call rotation basics
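A controlled rollout behind a feature flag can be sketched as a stable percentage bucket per user. The flag name and both helper functions are hypothetical. Flag services typically add targeting rules and kill switches on top of this idea.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units. Used only to spread
// users across buckets; not suitable for security purposes.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// A user lands in a fixed bucket (0..99) per flag, so raising the percentage
// only ever adds users and never flips existing ones off.
function isEnabled(flag: string, userId: string, rolloutPercent: number): boolean {
  const bucket = fnv1a(`${flag}:${userId}`) % 100;
  return bucket < rolloutPercent;
}

// Example: gate a hypothetical "new-checkout" flag at 10% of users.
const showNewCheckout = isEnabled("new-checkout", "user-42", 10);
```

Including the flag name in the hash input means the same users are not always first in line for every experiment.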
Want a production-ready Node.js backend?
Share your stack and current pain points. We'll suggest a concrete checklist and upgrades that reduce incidents and improve performance.