Why OpenTelemetry now
A year ago, recommending OpenTelemetry for production was hedged advice. The specification was still stabilising in places, the collector had rough edges, and several language SDKs were marked as beta for signals that teams actually needed.
That has changed. The tracing specification is stable across all major languages. Metrics are stable in most. Logs are generally available. The collector is production-hardened. The ecosystem of backends — both open-source and commercial — that natively ingest OTel data has grown substantially.
The practical case for OpenTelemetry is now straightforward: instrument once, choose your backend independently, and avoid re-instrumentation when you switch tools. Given how often observability tooling changes in growing organisations, that portability has real value.
Here's the setup we've converged on, and why.
The architecture in one paragraph
Applications are instrumented with OTel SDKs and export telemetry (traces, metrics, logs) to the OpenTelemetry Collector, which runs as a DaemonSet or Deployment in your cluster. The Collector processes, batches, and routes the data to your storage backends: Grafana Tempo for traces, Prometheus for metrics, and Grafana Loki for logs. Grafana is the unified query and visualisation layer across all three. This stack is fully open-source and self-hostable, with no licence fees at any scale (you still pay for the compute and object storage it runs on). Alternatively, you can replace the storage backends with managed services (Grafana Cloud, Datadog, Honeycomb) without changing your instrumentation.
Instrumenting your applications
For most languages, auto-instrumentation handles the majority of what you need without writing any observability code. The OpenTelemetry Operator for Kubernetes can inject auto-instrumentation into your pods. Once an `Instrumentation` resource exists in the cluster, you opt a workload in with an annotation on its pod template:
    annotations:
      instrumentation.opentelemetry.io/inject-python: "true"
The same mechanism, with a language-specific annotation in place of `inject-python`, works for Java, Python, Node.js, .NET, and Go (Go requires a different approach due to its compilation model). Auto-instrumentation gives you incoming/outgoing HTTP spans, database query spans, and basic metrics for most frameworks automatically.
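The annotation only takes effect when the Operator can find an `Instrumentation` resource to apply. A minimal sketch of one is below; the resource name, namespace, and Collector address are assumptions made for illustration, not values from our setup.

```yaml
# Sketch of an Instrumentation resource for the OpenTelemetry Operator.
# Name, namespace, and endpoint are hypothetical placeholders.
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default-instrumentation
  namespace: payments
spec:
  exporter:
    # Assumed address of the Collector Service. Port 4318 is OTLP over HTTP,
    # which is what the Python auto-instrumentation uses to export.
    endpoint: http://otel-collector.observability:4318
  propagators:
    - tracecontext
    - baggage
```

With `"true"` as the annotation value, the Operator looks for an `Instrumentation` resource in the pod's own namespace, so this sketch would cover pods in `payments` only.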
For custom business logic — things like "time spent in payment processing" or "number of cache misses for a specific key pattern" — you'll add manual instrumentation. Keep this minimal and purposeful. The goal is signal, not coverage percentage.
The critical things to set up early are the resource attributes: `service.name`, `service.version`, `deployment.environment`. These propagate to every piece of telemetry your service emits and are what allow you to filter traces, metrics, and logs by service and environment in Grafana. Set them once at SDK initialisation and you're done.
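One way to set these without touching application code is through the standard SDK environment variables `OTEL_SERVICE_NAME` and `OTEL_RESOURCE_ATTRIBUTES`. The Deployment below is a sketch; the service name, image, and version are placeholders.

```yaml
# Sketch: resource attributes supplied via standard OTel SDK environment variables.
# The "payments" service, image path, and version number are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
        - name: payments
          image: registry.example.com/payments:1.4.2
          env:
            # Becomes the service.name resource attribute.
            - name: OTEL_SERVICE_NAME
              value: payments
            # Comma-separated key=value pairs added to the resource.
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: service.version=1.4.2,deployment.environment=production
```

Keeping the values in the manifest means the same image can be promoted between environments with only the `deployment.environment` value changing.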
The Collector configuration
The Collector is the most powerful and most misconfigured piece of the stack. Its pipeline model — receivers → processors → exporters — gives you a lot of flexibility, and it's easy to end up with a configuration that's doing more than it should.
Our baseline Collector configuration for a production cluster:
Receivers: `otlp` (for applications sending to the Collector), `kubeletstats` (for node and pod metrics).
Processors: `memory_limiter` (first in the pipeline, so the Collector sheds load before it gets OOMKilled), `batch` (groups telemetry before export to reduce network calls), `k8sattributes` (enriches spans, metrics, and logs with Kubernetes metadata such as pod labels, namespace, and node name), `resource` (sets or overrides resource attributes).
Exporters: `otlp/tempo` for traces, `prometheusremotewrite` for metrics, `loki` for logs.
The `memory_limiter` is not optional. A Collector without it can keep buffering during a traffic spike until it exhausts its memory limit and is killed, which is the worst time for your observability to stop working.
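Put together, that baseline might look like the sketch below. The component names are the ones listed above; every endpoint, threshold, and interval is an assumed placeholder to tune for your cluster. `batch` is placed last in this sketch so that enrichment happens before batching, which is a choice made here rather than a requirement.

```yaml
# Sketch of a Collector config (contrib distribution). Endpoints and limits are assumptions.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  kubeletstats:
    collection_interval: 30s
    auth_type: serviceAccount
    # Assumes K8S_NODE_NAME is injected into the Collector pod via the downward API.
    endpoint: https://${env:K8S_NODE_NAME}:10250
    insecure_skip_verify: true

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  k8sattributes: {}
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
  batch:
    timeout: 5s

exporters:
  otlp/tempo:
    endpoint: tempo.observability:4317
    tls:
      insecure: true
  prometheusremotewrite:
    # Prometheus must be started with --web.enable-remote-write-receiver to accept this.
    endpoint: http://prometheus.observability:9090/api/v1/write
  loki:
    endpoint: http://loki.observability:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, resource, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp, kubeletstats]
      processors: [memory_limiter, k8sattributes, resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, resource, batch]
      exporters: [loki]
```

The `k8sattributes` processor also needs RBAC permission to read pod metadata from the Kubernetes API; that manifest is omitted here.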
Tempo for traces
Tempo is Grafana's distributed tracing backend. Its design is deliberately simple: it stores traces as objects in object storage (S3, GCS, or Azure Blob) and indexes only the trace ID and a minimal set of metadata. This makes it extremely cost-effective at scale — you're paying object storage prices for trace data rather than the much higher costs of a time-series database.
The trade-off is that search is less rich than in a fully indexed store. Tempo looks traces up by ID directly; searching by attributes goes through TraceQL, Tempo's query language, which reads the stored trace data rather than consulting a tag index. For most teams, filtering by service name, operation name, and duration in Grafana is sufficient. If you also want aggregate views, enable Tempo's metrics-generator, which derives span metrics (RED metrics per route) from your trace data. These are stored in Prometheus and queryable there.
In Grafana, once Tempo and Prometheus are both configured as data sources and linked to each other, you get trace-to-metrics and metrics-to-trace correlation. Click a spike on a latency graph, jump to the traces that were running at that moment. This is the workflow that makes distributed systems debuggable.
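Enabling the metrics-generator is a small change to Tempo's own configuration. The fragment below is a sketch that assumes a recent Tempo release and a Prometheus instance accepting remote write at the URL shown; the overrides layout has changed between Tempo versions (older releases used a flat `metrics_generator_processors` key), so check it against the version you run.

```yaml
# Sketch of a Tempo config fragment. The WAL path and Prometheus URL are assumptions.
metrics_generator:
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus.observability:9090/api/v1/write
        # Attach trace IDs to the generated metrics as exemplars.
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      # span-metrics produces the RED metrics; service-graphs adds
      # request metrics between pairs of services.
      processors: [span-metrics, service-graphs]
```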
Loki for logs
Loki's model is similar to Tempo's: it stores log streams as compressed chunks in object storage, with a lightweight index over labels (not log content). This makes it far cheaper to operate than Elasticsearch at high log volumes, though the query model is different: you filter by labels first, then search within log lines, rather than full-text searching across all logs.
The key to making Loki useful is getting your labels right. Labels are how you filter: `{app="payments", env="production"}` gives you all log streams for the payments service in production. Labels should be low-cardinality: `app`, `env`, and `namespace` are good. `pod` is borderline, because pod names carry generated suffixes and change with every rollout. Request IDs and trace IDs are bad labels (keep those in the log line content). High-cardinality labels cause Loki's index to balloon and queries to slow down.
Set up Promtail or the OTel Collector's `filelog` receiver to ship logs from your pods. If you're already using the OTel Collector for traces and metrics, using it for logs too simplifies your architecture and ensures consistent resource attribute enrichment.
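If you go the Collector route, a logs pipeline might look like the sketch below. It assumes the Collector runs as a DaemonSet with the node's `/var/log/pods` directory mounted into the pod, a recent contrib release that includes the `container` parsing operator, and a Loki push endpoint at the address shown.

```yaml
# Sketch of a Collector logs pipeline. Paths and endpoints are assumptions.
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    # Only read lines written after the Collector starts.
    start_at: end
    include_file_path: true
    operators:
      # Parses the container runtime log format and extracts
      # pod, namespace, and container names from the file path.
      - type: container

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  k8sattributes: {}
  batch: {}

exporters:
  loki:
    endpoint: http://loki.observability:3100/loki/api/v1/push

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [loki]
```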
Connecting the signals in Grafana
The value of this stack isn't in any individual component — it's in the correlation between them.
Set up exemplars in Prometheus: these are trace IDs attached to metric data points that let you jump from a specific data point on a metrics graph directly to a trace from that moment. Prometheus only stores exemplars when it is started with `--enable-feature=exemplar-storage`. For Go, the Prometheus client library supports exemplars natively. For other languages, the OTel SDK can attach exemplars when both metrics and traces are set up, though support varies by SDK, so check yours.
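On the Grafana side, the Prometheus data source has to be told which exemplar label holds the trace ID and which data source to open it in. A provisioning sketch follows; the data source UIDs and URLs are assumptions, and `trace_id` is assumed to be the label name your exemplars carry.

```yaml
# Sketch of Grafana data source provisioning. UIDs and URLs are hypothetical.
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    url: http://tempo.observability:3200
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    url: http://prometheus.observability:9090
    jsonData:
      exemplarTraceIdDestinations:
        # Must match the label name on the exemplars stored in Prometheus.
        - name: trace_id
          datasourceUid: tempo
```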
Set up derived fields in Grafana's Loki data source: this tells Grafana that when a log line contains a string matching `traceID=([a-f0-9]+)`, it should render that as a clickable link to the corresponding trace in Tempo. One configuration in Grafana, and every log line with a trace ID becomes navigable.
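The same derived field can be provisioned from a file rather than clicked together in the UI. This sketch assumes a Tempo data source already exists with the UID `tempo`; the Loki URL is a placeholder.

```yaml
# Sketch of Grafana provisioning for a Loki data source with a derived field.
# The UID "tempo" and the Loki URL are assumptions.
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    uid: loki
    access: proxy
    url: http://loki.observability:3100
    jsonData:
      derivedFields:
        - name: TraceID
          # The capture group becomes the value of the field.
          matcherRegex: 'traceID=([a-f0-9]+)'
          # Internal link: open the captured ID in the Tempo data source.
          datasourceUid: tempo
          # $$ escapes the dollar sign so Grafana does not treat it
          # as an environment variable during provisioning.
          url: '$${__value.raw}'
```

The regex has to match how your services actually print the ID. If they log `trace_id=...` instead, change `matcherRegex` accordingly.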
With these in place, a debugging session looks like: alert fires on p99 latency → click through to trace → trace shows slow database span → click log link from span → logs show the exact query and its parameters. The full context of an incident, navigable from a single starting point.
Getting started without the full stack
If you're not ready to run the full stack immediately, the migration path is incremental:
Start with the OTel Collector deployed as a DaemonSet, receiving traces from your applications and exporting to whatever backend you already have (Jaeger, Zipkin, Datadog — the Collector can export to all of them). Your instrumentation doesn't change when you swap backends later.
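A first-step configuration can be very small. The sketch below handles traces only and forwards them over OTLP to an existing backend; the Jaeger address is an assumption, and it relies on Jaeger accepting OTLP natively.

```yaml
# Sketch of a traces-only starter Collector config. The backend address is hypothetical.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  batch: {}

exporters:
  # Swapping backends later means changing this block, not the applications.
  otlp/existing:
    endpoint: jaeger-collector.tracing:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/existing]
```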
Add Tempo when you're ready to own your trace storage. Add Loki when you want to correlate logs. Add the Prometheus integration when you want RED metrics per service automatically derived from traces.
Each step is independently useful. You don't need to do it all at once.