Observability
DocTranslater exposes structured logs, optional Prometheus metrics, and optional OpenTelemetry tracing across the CLI, HTTP API, ARQ workers, PDF pipeline, and translator routing.
Profiles and environment
Settings are read from process environment variables (shared by doctranslate serve, doctranslate worker, and CLI translate runs).
| Variable | Default | Meaning |
|---|---|---|
| DOCTRANSLATE_OBS_PROFILE | minimal | One of minimal, logs_only, prometheus, or otlp (otlp enables tracing export) |
| DOCTRANSLATE_LOG_FORMAT | json | json or console (human-friendly) |
| DOCTRANSLATE_LOG_LEVEL | INFO | Root log level |
| DOCTRANSLATE_LOG_REDACT_USER_TEXT | true | Truncate/redact sensitive-looking log payloads |
| DOCTRANSLATE_REQUEST_ID_HEADER | X-Request-ID | Incoming correlation header; response echoes the resolved id |
| DOCTRANSLATE_METRICS_ENABLED | true | When false, Prometheus instruments are not registered |
| DOCTRANSLATE_METRICS_PATH | /metrics | Path for the Prometheus scrape endpoint (HTTP API only) |
| DOCTRANSLATE_METRICS_NAMESPACE | doctranslate | Metric name prefix |
| DOCTRANSLATE_OTEL_ENABLED | false | Set true or use profile otlp to configure tracing |
| DOCTRANSLATE_OTEL_SERVICE_NAME | doctranslate | service.name resource attribute |
| DOCTRANSLATE_OTEL_RESOURCE_ATTRIBUTES | (empty) | Comma-separated key=value pairs merged into the resource |

Standard OTLP exporter environment variables (for example OTEL_EXPORTER_OTLP_ENDPOINT) are honored by the SDK when tracing is enabled.
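
To make the defaults in the table concrete, here is a minimal sketch of how a process could resolve these variables. It is an illustration only: ObsSettings, load_settings, and the accepted truthy spellings are assumptions made for this example, not DocTranslater's actual settings code.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional

# Assumption: which spellings count as "true" is a guess for this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean variable, falling back to the default when unset or empty."""
    raw = env.get(name, "").strip().lower()
    return default if raw == "" else raw in _TRUTHY


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Turn 'k1=v1,k2=v2' into a dict, skipping malformed pairs."""
    attrs: Dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs


@dataclass(frozen=True)
class ObsSettings:  # hypothetical container, named for this example only
    profile: str
    log_format: str
    log_level: str
    request_id_header: str
    metrics_enabled: bool
    metrics_path: str
    otel_enabled: bool
    resource_attributes: Dict[str, str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> ObsSettings:
    """Resolve observability settings using the defaults from the table."""
    env = os.environ if env is None else env
    profile = env.get("DOCTRANSLATE_OBS_PROFILE", "minimal")
    return ObsSettings(
        profile=profile,
        log_format=env.get("DOCTRANSLATE_LOG_FORMAT", "json"),
        log_level=env.get("DOCTRANSLATE_LOG_LEVEL", "INFO"),
        request_id_header=env.get("DOCTRANSLATE_REQUEST_ID_HEADER", "X-Request-ID"),
        metrics_enabled=_env_bool(env, "DOCTRANSLATE_METRICS_ENABLED", True),
        metrics_path=env.get("DOCTRANSLATE_METRICS_PATH", "/metrics"),
        # Tracing is on when explicitly enabled or when the otlp profile is selected.
        otel_enabled=_env_bool(env, "DOCTRANSLATE_OTEL_ENABLED", False) or profile == "otlp",
        resource_attributes=parse_resource_attributes(
            env.get("DOCTRANSLATE_OTEL_RESOURCE_ATTRIBUTES", "")
        ),
    )
```

Passing a plain dict instead of os.environ, for example load_settings({"DOCTRANSLATE_OBS_PROFILE": "otlp"}), is a convenient way to check how a given combination of variables would resolve.
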
HTTP API
- Request IDs: Every response includes X-Request-ID. Error bodies (ApiErrorEnvelope) include the same request_id for log correlation.
- Prometheus: GET /metrics exposes RED-style HTTP metrics, job queue depth (updated on readiness checks), job lifecycle histograms, pipeline stage timings (when the PDF stack runs), translator router counters, and asset warmup counters when metrics are enabled. These histograms are the primary runtime regression signals in production; they complement the OSS microbenchmarks in Benchmarks.
- Tracing: With DOCTRANSLATE_OTEL_ENABLED=true and a reachable OTLP collector, spans cover FastAPI requests (via auto-instrumentation), ARQ job execution (job.execute, job.warmup), and PDF pipeline.translate_sync. For split API + worker deployments, the API stores a W3C traceparent on queued jobs so the worker can continue the trace.
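
The request-id behaviour described above can be sketched as follows. This is a hedged illustration, not the project's middleware: resolve_request_id, response_headers, and error_body are hypothetical helpers, generating a UUID when the header is absent is an assumption, and only the request_id field of the error body is taken from this page (the rest of the envelope shape is made up).

```python
import uuid
from typing import Dict, Mapping

DEFAULT_HEADER = "X-Request-ID"  # default value of DOCTRANSLATE_REQUEST_ID_HEADER


def resolve_request_id(headers: Mapping[str, str], header_name: str = DEFAULT_HEADER) -> str:
    """Reuse the caller's id when present; otherwise mint a new one (assumption)."""
    wanted = header_name.lower()
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() == wanted and value.strip():
            return value.strip()
    return uuid.uuid4().hex


def response_headers(request_id: str, header_name: str = DEFAULT_HEADER) -> Dict[str, str]:
    """Echo the resolved id back to the client."""
    return {header_name: request_id}


def error_body(request_id: str, code: str, message: str) -> dict:
    """Hypothetical error envelope; only request_id is documented on this page."""
    return {"error": {"code": code, "message": message}, "request_id": request_id}
```

With this shape, a client that sends its own X-Request-ID can search the logs for the same value that appears in the response header and in any error body.
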
Workers (ARQ)
Run the worker with the same DOCTRANSLATE_* observability variables as the API. Metrics use the same registry naming; scrape each process that should be monitored (API and workers are separate processes).
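
Because the API and workers are separate processes, trace context has to travel with the queued job. The sketch below shows the idea under stated assumptions: the traceparent layout (version, 32-hex trace id, 16-hex parent span id, flags) follows the W3C Trace Context specification, while enqueue_payload, worker_trace_context, and the payload shape are hypothetical names for illustration, not DocTranslater's API. In a real deployment the OpenTelemetry SDK's propagator would normally handle the parsing.

```python
import re
from typing import NamedTuple, Optional

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceContext]:
    """Parse a traceparent value; return None when it is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero ids are invalid per the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def enqueue_payload(job_id: str, traceparent: Optional[str]) -> dict:
    """API side (hypothetical): store the caller's trace context on the job."""
    return {"job_id": job_id, "traceparent": traceparent}


def worker_trace_context(payload: dict) -> Optional[TraceContext]:
    """Worker side (hypothetical): recover the context, or None to start a new trace."""
    raw = payload.get("traceparent")
    return parse_traceparent(raw) if raw else None
```

A worker that gets None back would simply start a fresh trace, so a missing or malformed traceparent degrades to unlinked spans rather than a failed job.
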
CLI
doctranslate translate configures structured logging and optional Prometheus metrics using the same environment variables. Each CLI invocation binds a fresh cli_run_id in log context.
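
Binding a per-run id into log context can be sketched with the standard library alone. This is an assumption-laden illustration: the project may use a different logging library, and bind_cli_run_id, RunIdFilter, JsonFormatter, and build_logger are names invented for this example. Only the cli_run_id field name comes from this page.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the id for the current CLI invocation; None until bound.
_cli_run_id = contextvars.ContextVar("cli_run_id", default=None)


def bind_cli_run_id() -> str:
    """Generate a fresh id for this invocation and store it in context."""
    run_id = uuid.uuid4().hex
    _cli_run_id.set(run_id)
    return run_id


class RunIdFilter(logging.Filter):
    """Copy the bound id onto every record that passes through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.cli_run_id = _cli_run_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "event": record.getMessage(),
            "cli_run_id": getattr(record, "cli_run_id", None),
        })


def build_logger(stream) -> logging.Logger:
    """Create an isolated logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("doctranslate.cli.sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(RunIdFilter())
    logger.addHandler(handler)
    return logger
```

Calling bind_cli_run_id() once at startup and then logging through build_logger(sys.stderr) would tag every line of that run with the same id, which makes it easy to separate interleaved output from concurrent runs.
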
Router metrics vs service metrics
TOML/CLI metrics_output / metrics_json_path still control end-of-run router summaries (per-provider tokens, cost, latency averages). Service-level Prometheus metrics complement those with labeled counters/histograms suitable for dashboards.
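
To illustrate what an end-of-run router summary aggregates, here is a small sketch. The call-record fields (provider, tokens, cost_usd, latency_s) and the summarize function are hypothetical and chosen only for this example; the real summary format is whatever metrics_output / metrics_json_path produce.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarize(calls: Iterable[Mapping]) -> Dict[str, dict]:
    """Aggregate per-provider totals and average latency over one run."""
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost_usd": 0.0, "latency_s": 0.0})
    for call in calls:
        entry = totals[call["provider"]]
        entry["calls"] += 1
        entry["tokens"] += call["tokens"]
        entry["cost_usd"] += call["cost_usd"]
        entry["latency_s"] += call["latency_s"]
    return {
        provider: {
            "calls": entry["calls"],
            "tokens": entry["tokens"],
            "cost_usd": entry["cost_usd"],
            # The summary reports an average; the raw total is not kept.
            "avg_latency_s": entry["latency_s"] / entry["calls"],
        }
        for provider, entry in totals.items()
    }
```

A summary like this is computed once and describes a single run, whereas the Prometheus counters and histograms accumulate across requests and are scraped over time, which is why the two are complementary rather than redundant.
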
Docker and serverless
- Default OSS / Docker profile: JSON logs + /metrics on the API container.
- Ship stdout/stderr to your platform log sink; correlate with job_id and X-Request-ID.
- For multi-replica setups, see HTTP API queue modes and HTTP API workers.
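
Once JSON logs land in a sink, correlation amounts to grouping lines by a shared key. The sketch below assumes each JSON line carries job_id and request_id as top-level fields; that field layout and the group_by_key helper are assumptions for illustration, so adjust the key names to whatever your log lines actually contain.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_key(lines: Iterable[str], key: str) -> Dict[str, List[dict]]:
    """Group JSON log lines by a correlation field, preserving line order."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Console-format or other non-JSON output is skipped.
            continue
        if isinstance(record, dict) and key in record:
            groups[record[key]].append(record)
    return dict(groups)
```

Running group_by_key over the combined API and worker output with key="job_id" gives one ordered list of events per job, regardless of which container emitted them.
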
Security
- Logs apply redaction for common secret keys and long strings when DOCTRANSLATE_LOG_REDACT_USER_TEXT is true.
- Do not enable verbose logging of raw document text in production.
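
The kind of redaction described above can be sketched as a recursive pass over a log payload. Everything specific here is an assumption for illustration: the key hints, the 200-character limit, the placeholder text, and the redact function itself are not taken from DocTranslater's implementation.

```python
from typing import Any

# Assumptions: these hints and this limit are illustrative, not the project's values.
SECRET_KEY_HINTS = ("api_key", "token", "secret", "password", "authorization")
MAX_STRING_LENGTH = 200
REDACTED = "[REDACTED]"


def _is_secret_key(key: str) -> bool:
    lowered = key.lower()
    return any(hint in lowered for hint in SECRET_KEY_HINTS)


def redact(value: Any, key: str = "") -> Any:
    """Return a copy of the payload with secrets masked and long strings truncated."""
    if _is_secret_key(key):
        # Mask the whole value, whatever its type, when the key looks sensitive.
        return REDACTED
    if isinstance(value, dict):
        return {k: redact(v, str(k)) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact(item, key) for item in value]
    if isinstance(value, str) and len(value) > MAX_STRING_LENGTH:
        dropped = len(value) - MAX_STRING_LENGTH
        return value[:MAX_STRING_LENGTH] + f"...[truncated {dropped} chars]"
    return value
```

Truncating long strings matters as much as masking keys here, because document text is user content that can end up in log payloads even when no field name looks sensitive.
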
Phased rollout (contributors)
Suggested PR sequence to keep risk bounded:
- Foundation: doctranslate.observability package, settings, structured logging, request-id middleware.
- HTTP metrics: /metrics, HTTP + job queue gauges, error envelope correlation.
- Worker tracing: persist traceparent, job.execute / warmup spans, terminal job metrics.
- Translator: router Prometheus instruments while preserving metrics_output / metrics_json_path.
- PDF stages: ProgressMonitor stage timings and pipeline spans.
- Docs/tests: expand coverage and deployment guides (this page, HTTP API, Docker/serverless).