Benchmarks and performance
DocTranslater performance depends on PDF complexity, layout model cost, LLM latency and pricing, and hardware. The project does not publish CI-enforced throughput SLAs; numbers here describe how to measure and what to expect at a high level.
What affects runtime
- Pages and text volume — more IL paragraphs mean more LLM round-trips.
- Router / model — hosted APIs vs local servers change latency profiles entirely.
- OCR —
--ocr-modeother thanoffadds RapidOCR and image work (optional extraocr). - Translation memory — SQLite and fuzzy tiers add lookup cost but can skip LLM calls.
- Assets — first run may download layout models; use
doctranslate assets warmupand optional offline packs (see Configuration).
Manual end-to-end timing (local LLM)
Use the helper script (requires a running local server and a sample PDF):
uv run python scripts/bench_local_translation.py \
--pdf examples/ci/test.pdf \
--lang-in en --lang-out zh \
--backend ollama --model qwen2.5:7b
The script prints JSON with elapsed_seconds for the full translate subprocess. Compare runs on the same machine only.
Hosted API runs
For OpenAI or router mode, wall time is dominated by tokens × model speed × rate limits. Rough indicative tables for GPT-class models may appear in the root README; treat them as examples, not guarantees.
Profiling (contributors)
- Memray — optional extra
DocTranslater[memray]; see Contributing — Profile memory usage. - py-spy — listed in the dev dependency group for sampling profiles of running processes.
Automation in CI
The test workflow runs a warn-only import/parser timing check so large accidental regressions in CLI startup are visible in logs. It does not fail the build on threshold alone (see scripts/check_cli_import_time.py).
Layered harness (micro, meso, HTTP, Docker)
Install perf tooling:
| Layer | What | Command / workflow |
|---|---|---|
| Micro | pytest-benchmark on parser, mocked router, HTTP inspect, parse-only PDF path |
uv run pytest tests/perf/ -m perf --benchmark-only (default pytest tests/ excludes -m perf; use an explicit tests/perf/ run) |
| Meso | Subprocess CLI timings (help, inspect JSON, assets warmup, translate --skip-translation) |
uv run python scripts/perf_meso.py → JSON on stdout |
| HTTP load | Locust against a running API | uv run locust -f benchmarks/load/locustfile.py --headless -u 4 -r 1 -t 20s --host http://127.0.0.1:8999 (set PERF_INSPECT_PDF to an absolute path to include /v1/inspect in the mix) |
| Docker | Image size + doctranslate --version in a fresh container + optional API health |
uv run python scripts/perf_docker_metrics.py (builds runtime-cpu / runtime-api locally; can take a long time) |
| Memory | Memray on short CLI paths | uv sync --extra memray then see Contributing — Profile memory usage; nightly workflow uploads a sample flamegraph when examples/ci/test.pdf exists |
Workflows: .github/workflows/perf-pr.yml (path-filtered PR microbenchmarks + JSON artifact), .github/workflows/perf-nightly.yml (weekly schedule: microbench, meso JSON, Locust, Memray, Docker metrics with continue-on-error where builds may be slow).
Corpus plan: benchmarks/corpus/README.md — extend beyond examples/ci/test.pdf with licensed fixtures; keep PR jobs small.
Regression policy: PR jobs are informational (artifacts + logs), not throughput SLAs. To tighten later, use pytest-benchmark’s --benchmark-compare-fail only after collecting a stable baseline on one OS/Python tuple.
Runtime metrics (operators)
When the HTTP API runs with Prometheus enabled, histograms such as doctranslate_pipeline_stage_duration_seconds, doctranslate_translator_latency_seconds, and doctranslate_job_duration_seconds complement microbenchmarks for deployment-level drift; see Observability.