Production Considerations for Backend Systems

Developing software locally is only one part of the job. Systems must also operate reliably in production, handle failures, and remain maintainable as requirements change. A backend should be understandable by the people responsible for supporting it, both during deployment and months later when issues need to be investigated or new features introduced.

1. Configuration Management

Configuration should be separated from application code. Database credentials, API keys, service endpoints, and other environment-specific values should be provided at runtime rather than embedded in source code.

The Twelve-Factor App model describes configuration as external to the application process. In practice, this means values such as database credentials, API keys, and service endpoints are provided at runtime rather than embedded in source code. See https://12factor.net/ for the original reference.

Common problematic patterns include:

Hardcoded database credentials
API keys committed in source control
Version-controlling .env files
Environment branching inside application logic (for example, if env == "prod")

More stable approaches:

Inject configuration through environment variables at runtime
Use dedicated secret management systems such as AWS Secrets Manager or HashiCorp Vault for sensitive values
Build a single artifact that is deployed across all environments without modification

The implementation differs across ecosystems. FastAPI applications often use pydantic-settings, Node.js relies on process.env or libraries like dotenv. The mechanism changes, but the model stays the same: configuration is provided externally at runtime.

Example (Secret Management):

For secrets, the application typically retrieves credentials from a secret manager during startup:

// Request to AWS Secrets Manager
{
  "SecretId": "prod/database/credentials"
}

The credential is kept in memory for the lifetime of the process and is not stored in source code or committed files.

2. API Design and Idempotency

API changes affect every consumer of that API. Versioning, backward compatibility, and idempotent operations help reduce operational risk.

Idempotency ensures that repeating the same request doesn't change the system state beyond the first successful execution. This is important in distributed systems where retries are common due to timeouts, network failures, or client-side retry logic.

Without idempotency, retry behaviour can lead to duplicate writes and inconsistent state.

Payment processing with a uniqueness constraint

Each request carries a transaction identifier.

Client submits a payment with transaction_id = 98765
The backend stores the transaction with a UNIQUE constraint on transaction_id
If the insert succeeds, the payment is processed
If the insert fails due to a duplicate key, the system treats it as already processed and returns success

The database becomes the enforcement point for preventing duplicate execution.

Idempotency keys with external services

For operations that cannot rely on database constraints, an idempotency key can be used.

Client sends Idempotency-Key: 123e4567-e89b-12d3-a456-426614174000
Backend checks Redis for the key
If the key exists, the cached response is returned
If the key is missing, the operation is executed and the response is stored against the key

This pattern is commonly used for interactions with external services such as email providers or payment gateways.

3. Separation of Responsibilities

Backend systems tend to be more maintainable when HTTP handling, business logic, and data access are kept distinct, as this reduces coupling between concerns and improves clarity around system boundaries. This aligns with the Single Responsibility Principle (https://en.wikipedia.org/wiki/Single-responsibility_principle).

A typical structure looks like this:

Routing layer: Handles HTTP requests, request parsing, validation, and response formatting
Service layer: Contains business logic and application rules
Data access layer: Handles persistence and database queries

Each layer has a clear boundary in terms of responsibility and avoids embedding knowledge of other layers’ implementation details.

With this separation in place, changes tend to stay local. For example, a database migration or a change in persistence strategy doesn't require rewriting HTTP handlers or core business logic, as long as the data access contract remains stable.

4. Observability

When production systems fail, engineers rely on logs, metrics, and traces to understand what changed and where the failure occurred. The SRE framework describes this through four primary signals (https://sre.google/sre-book/monitoring-distributed-systems/).

Core signals:

Latency: Time taken to process a request
Traffic: Request volume over time
Errors: Rate of failed requests (4xx and 5xx responses)
Saturation: Resource usage such as CPU, memory, and connection pools

These signals are typically used together rather than in isolation when assessing system health.

Structured logging

Logs are more useful when they are machine-readable rather than plain text. JSON-formatted logs allow consistent parsing across aggregation systems such as Elasticsearch or Datadog.

{
  "level": "INFO",
  "event": "order_processed",
  "order_id": 8831,
  "duration_ms": 142,
  "timestamp": "2025-02-03T10:15:00Z"
}

Distributed tracing

In distributed systems, a single request may pass through multiple services before completing. Tracing systems such as OpenTelemetry propagate a trace_id across each service boundary.

Example flow:

API Gateway generates trace_id: abc-123
User Service receives and forwards the same identifier
Database interactions are recorded under the same trace

All logs associated with the request include the same identifier, allowing engineers to reconstruct the full request path during incident analysis.

Key Takeaways

Resilient backend systems rely heavily on structural constraints:

Externalised configuration via environment variables or secret managers.
Contract-driven APIs with enforced idempotency.
Strict architectural boundaries separating routing, business logic, and persistence.
Comprehensive observability utilizing structured logs and distributed tracing.

The implementation details will vary between teams and tech stacks. The goal remains the same: build systems that can be operated, debugged, and changed without unnecessary risk.

Production Considerations for Backend Systems

Comments

More from this blog

AI is a Tool, You Are the Owner

Local Dev Tools and Security Implications

Part 1: Ubuntu System Hardening and Execution Baseline

Understanding the Software Development Life Cycle (SDLC)

1. Configuration Management

2. API Design and Idempotency

3. Separation of Responsibilities

4. Observability

Key Takeaways

Command Palette

Comments

More from this blog

1. Configuration Management

2. API Design and Idempotency

3. Separation of Responsibilities

4. Observability

Key Takeaways