Deploy on Google Cloud Run

This guide deploys the runtime-api image (HTTP API on port 8000) as the primary OSS reference for serverless containers. For background and other platforms, see Serverless containers.

Prerequisites

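A Google Cloud project with the Cloud Run API enabled and an Artifact Registry repository (Container Registry is deprecated).
A built image: either a prebuilt GHCR image copied into (or proxied through) Artifact Registry, since managed Cloud Run does not pull from ghcr.io directly, or a local build with docker build --target runtime-api -t REGION-docker.pkg.dev/PROJECT/REPO/doctranslate-api:TAG .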
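
The build step can be wrapped in a small helper so the image reference is composed in one place. This is a minimal sketch, not part of the repository: image_ref and build_and_push are hypothetical names, and it assumes docker and gcloud are installed and that the Artifact Registry repository already exists.

```shell
# Compose the Artifact Registry image reference used by the docker build
# command above. All four arguments are placeholders you supply.
image_ref() {
  local region="$1" project="$2" repo="$3" tag="$4"
  printf '%s-docker.pkg.dev/%s/%s/doctranslate-api:%s\n' \
    "$region" "$project" "$repo" "$tag"
}

# Build the runtime-api target and push it (hypothetical helper).
build_and_push() {
  local region="$1" project="$2" repo="$3" tag="$4"
  local image
  image="$(image_ref "$region" "$project" "$repo" "$tag")"
  # Register gcloud as a docker credential helper for the regional host,
  # then build and push; each step runs only if the previous one succeeded.
  gcloud auth configure-docker "${region}-docker.pkg.dev" --quiet &&
    docker build --target runtime-api -t "$image" . &&
    docker push "$image"
}
```

Run it from the repository root, for example build_and_push REGION PROJECT REPO TAG, and pass the resulting reference as IMAGE when deploying.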

Quick deploy (`gcloud`)

Replace placeholders (PROJECT, REGION, IMAGE).

gcloud run deploy doctranslater-api \
  --project=PROJECT \
  --region=REGION \
  --image=IMAGE \
  --port=8000 \
  --cpu=2 \
  --memory=4Gi \
  --min-instances=0 \
  --max-instances=10 \
  --timeout=3600 \
  --session-affinity \
  --set-env-vars=DOCTRANSLATE_API_MAX_CONCURRENT_JOBS=1

Notes:

--port=8000 must match the container listen port (Dockerfile CMD for runtime-api).
--timeout: maximum duration of a single request in seconds, including long-lived HTTP connections that the client holds open. 3600 is the upper limit for Cloud Run services, so for very large PDFs prefer a worker pattern (see Serverless containers) over relying on one long request.
--session-affinity: helps a client stay on the same instance while it polls in-process jobs (202 + GET /v1/jobs/{id}). Affinity is cookie-based and best effort, so clients must keep cookies between requests, and it is not a substitute for an external job store at high scale; see HTTP API – Serverless and multi-instance behavior.
Secrets: pass LLM keys with --set-secrets=OPENAI_API_KEY=openai-api-key:latest (after creating the secret) or Secret Manager volume mounts; do not bake keys into images (Docker – Security).
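
Creating the secret referenced by --set-secrets could look like the sketch below. create_llm_secret is a hypothetical helper; it assumes the Secret Manager API is enabled, that OPENAI_API_KEY is exported in the calling shell, and that SA_EMAIL is the service account the Cloud Run service runs as.

```shell
# Store the LLM key in Secret Manager and allow the Cloud Run runtime
# service account to read it. PROJECT and SA_EMAIL are placeholders.
create_llm_secret() {
  local project="$1" sa_email="$2"

  gcloud secrets create openai-api-key \
    --project="$project" \
    --replication-policy=automatic || return 1

  # Pipe the key on stdin so it does not appear in any process arguments.
  printf '%s' "$OPENAI_API_KEY" |
    gcloud secrets versions add openai-api-key \
      --project="$project" \
      --data-file=- || return 1

  # Grant read access to the identity the service runs as.
  gcloud secrets add-iam-policy-binding openai-api-key \
    --project="$project" \
    --member="serviceAccount:${sa_email}" \
    --role=roles/secretmanager.secretAccessor
}
```

After create_llm_secret PROJECT SA_EMAIL succeeds, the --set-secrets=OPENAI_API_KEY=openai-api-key:latest flag resolves the key at instance start.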

Environment variables (Cloud Run)

Set the same variables as in HTTP API – Environment variables. Common Cloud Run additions:

Variable	Recommendation
`DOCTRANSLATE_API_DATA_ROOT`	`/tmp/doctranslate-api` or a mounted volume path if using Cloud Run volumes
`DOCTRANSLATE_API_TMP_ROOT`	Optional; defaults under `data_root/tmp`
`DOCTRANSLATE_API_WARMUP_ON_STARTUP`	`eager` to reduce first-request latency (downloads/fonts/models; slower startup)
`DOCTRANSLATE_API_REQUIRE_ASSETS_READY`	`true` if `/v1/health/ready` must block until assets exist
`DOCTRANSLATE_API_JOB_TIMEOUT_SECONDS`	Set to bound wall-clock per job (e.g. `7200`)
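
The variables in the table can be applied to an existing service without a full redeploy. A minimal sketch, assuming the service name from the quick deploy command; set_runtime_env is a hypothetical helper and the values are the examples from the table, not required settings.

```shell
# Apply the Cloud Run specific variables from the table to a deployed
# service. --update-env-vars changes only the listed keys and leaves
# other variables on the service untouched.
set_runtime_env() {
  local project="$1" region="$2"
  local vars="DOCTRANSLATE_API_DATA_ROOT=/tmp/doctranslate-api"
  vars="${vars},DOCTRANSLATE_API_WARMUP_ON_STARTUP=eager"
  vars="${vars},DOCTRANSLATE_API_REQUIRE_ASSETS_READY=true"
  vars="${vars},DOCTRANSLATE_API_JOB_TIMEOUT_SECONDS=7200"

  gcloud run services update doctranslater-api \
    --project="$project" \
    --region="$region" \
    --update-env-vars="$vars"
}
```

Each update creates a new revision. Note that on Cloud Run the container filesystem, including /tmp, is held in memory, so data written under /tmp/doctranslate-api counts toward the --memory limit.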

Mount a volume (or use an init sidecar pattern) for /home/doctranslater/.cache/doctranslate if you want persistent ONNX/font caches across instances.

Health checks

Liveness: GET /v1/health/live
Readiness: GET /v1/health/ready (writable dirs, optional assets, job capacity)

Point the Cloud Run startup probe at /v1/health/ready when DOCTRANSLATE_API_REQUIRE_ASSETS_READY=true or when warm-up runs at startup, so that traffic arrives only after assets are in place. Use /v1/health/live for the liveness probe.

Example manifest (Knative / Cloud Run YAML)

A minimal Knative Service shape (exact schema depends on your Cloud Run / Anthos setup):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: doctranslater-api
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containerConcurrency: 8
      timeoutSeconds: 3600
      containers:
        - image: ghcr.io/OWNER/doctranslater-api:main
          ports:
            - containerPort: 8000
          env:
            - name: DOCTRANSLATE_API_MAX_CONCURRENT_JOBS
              value: "1"

Repository copy: docs/deploy-samples/cloud-run-service.sample.yaml.
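
On managed Cloud Run, a manifest of this shape is typically applied with gcloud run services replace. A minimal sketch: apply_manifest is a hypothetical helper, and it assumes the image field in the manifest has been changed to a registry that Cloud Run can pull from.

```shell
# Create or update the service from a Knative-style YAML manifest.
# The manifest path is whatever copy of the sample you have edited.
apply_manifest() {
  local project="$1" region="$2" manifest="$3"
  gcloud run services replace "$manifest" \
    --project="$project" \
    --region="$region"
}
```

For example: apply_manifest PROJECT REGION docs/deploy-samples/cloud-run-service.sample.yaml. The command replaces the whole service spec, so settings that are absent from the file fall back to defaults.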

Cold start and cost

Setting --min-instances above 0 keeps instances warm and reduces cold-start latency for interactive use, at the cost of paying for idle instances.
Warm images (runtime-api built from warm builder stages) are not published by the default CI; either build a custom warm image or run POST /v1/assets/warmup after deploy.
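
Both options can be scripted after a deploy. This is a sketch under assumptions: keep_warm and trigger_warmup are hypothetical helpers, and the warm-up endpoint is assumed to accept a POST with an empty body and no authentication, which the HTTP API documentation should confirm.

```shell
# Keep one instance running so interactive requests avoid a cold start.
keep_warm() {
  local project="$1" region="$2"
  gcloud run services update doctranslater-api \
    --project="$project" \
    --region="$region" \
    --min-instances=1
}

# Ask a running instance to download its assets now.
# --fail makes curl exit non-zero on an HTTP 4xx/5xx response.
trigger_warmup() {
  local base_url="$1"
  curl -sS --fail -X POST "${base_url}/v1/assets/warmup"
}
```

A warm-up request reaches only the instance that serves it, so unless the cache directory is on a shared volume, instances created later by autoscaling start cold.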

Verify

curl -sS "https://YOUR-SERVICE-URL/v1/health/live"
curl -sS "https://YOUR-SERVICE-URL/v1/health/ready"
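
With eager warm-up, readiness can lag behind the deploy, so a script may prefer to poll. A minimal sketch, assuming /v1/health/ready returns HTTP 200 once ready and a non-200 status otherwise, and that the service allows unauthenticated requests; wait_ready is a hypothetical helper.

```shell
# Poll the readiness endpoint until it returns 200 or attempts run out.
# Usage: wait_ready BASE_URL [ATTEMPTS] [DELAY_SECONDS]
wait_ready() {
  local base_url="$1" attempts="${2:-30}" delay="${3:-5}"
  local i=1 code=""
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the status code; the response body is discarded.
    code="$(curl -sS -o /dev/null -w '%{http_code}' \
      "${base_url}/v1/health/ready" || true)"
    if [ "$code" = "200" ]; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after ${attempts} attempts (last status: ${code})" >&2
  return 1
}
```

For example, wait_ready "https://YOUR-SERVICE-URL" 60 5 waits up to about five minutes and returns non-zero on failure, which suits a CI step.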

Then smoke-test the pipeline without LLM calls by using skip_translation (see HTTP API – Docker). OPENAI_API_KEY must still be set, because the translator is constructed even when translation is skipped; a placeholder value is acceptable for this smoke test only, if your policy allows it.