Package layers and install profiles
DocTranslater ships as one PyPI distribution (DocTranslater) with optional extras so downstream apps can install only what they need.
Public import surfaces
| Module | Purpose | Typical extras |
|---|---|---|
| `doctranslate.schemas` | Pydantic models: router/TOML, TranslationRequest, results, events | none (base install) |
| `doctranslate.api` | Stable `translate` / `async_translate` / `validate_request` / `inspect_input` / `build_translators` | `full` |
| `doctranslate.http_api` | Optional ASGI app (`create_app`, `serve`); not imported by default CLI | `api` (+ same extras as your translate install) |
| `doctranslate.engine` | Deprecated shim: pipeline entrypoints + init; prefer `doctranslate.api` | `full` |
| `doctranslate.pdf` | Deprecated shim: re-exports PDF/IL pipeline; prefer `doctranslate.api` | `full` (or `pdf` + peers) |
| `doctranslate.vision` | Layout model types | `vision` or `full` |
| `doctranslate.experimental` | Unstable experiments; not semver | varies |
Deep imports under doctranslate.format.pdf remain valid but are not semver-guaranteed; prefer doctranslate.api for embedding.
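
To see which of these surfaces actually import in a given environment, one option is to try each in turn and record the failure. The sketch below is a minimal illustration using only the standard library; the helper name `probe_surfaces` is hypothetical and not part of DocTranslater.

```python
import importlib


def probe_surfaces(names):
    """Try to import each module and report 'ok' or the missing dependency."""
    report = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # exc.name holds the module that could not be found, when known.
            report[name] = f"unavailable: {exc.name or exc}"
        else:
            report[name] = "ok"
    return report
```

Calling `probe_surfaces(["doctranslate.schemas", "doctranslate.api"])` on a base install would be expected to report the first as `ok` and name a missing dependency for the second, assuming the extras split described above.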
Install matrix (quick)
| Goal | Typical command |
|---|---|
| Types / router TOML models only | `pip install DocTranslater` or `uv sync --locked --group dev` |
| Explicit “schemas” label (same deps as base) | `pip install "DocTranslater[schemas]"`. The `schemas` extra is intentionally empty so docs and scripts can request a named slice without adding packages beyond the core dependency set. |
| PDF + CLI, no hosted LLM (combine with `llm` as needed) | `pip install "DocTranslater[pdf,cli,llm]"` |
| Default CLI translate path (matches CI) | `pip install "DocTranslater[full]"` or `uv sync --locked --group dev --extra full` |
Python 3.10–3.13 are supported (requires-python = ">=3.10,<3.14" in pyproject.toml).
Optional extras
| Extra | Role |
|---|---|
| `schemas` | No extra packages; reserved alias so install lines can say `DocTranslater[schemas]` when embedding only `doctranslate.schemas` (same as base). |
| `pdf` | PyMuPDF, xsdata/IL, fonts, spatial indexes, scientific helpers |
| `llm` | OpenAI client, httpx, LiteLLM, tiktoken, tenacity |
| `vision` | ONNXRuntime, OpenCV, Hugging Face hub (doclayout assets) |
| `ocr` | RapidOCR ONNX runtime adapter |
| `tm` | SQLite cache / fuzzy TM (peewee, rapidfuzz, Levenshtein) |
| `glossary` | Hyperscan-backed glossary scanning |
| `cli` | Rich, tqdm, psutil (CLI UX); `main.cli()` falls back to stdlib logging if Rich is missing |
| `full` | Meta-extra listing everything needed for the default CLI translate path |
| `api` | FastAPI, Uvicorn, arq, redis, python-multipart, pydantic-settings, fsspec for the optional HTTP service (`doctranslate serve`, `doctranslate worker`) |
| `api-s3` | s3fs, boto3: S3-compatible blob mirror + presigned downloads |
| `api-gcs` | gcsfs: GCS blob mirror + signed URLs |
| `tm_semantic` | sentence-transformers + torch (semantic TM tier) |
| `cuda` / `directml` | Alternate ONNXRuntime wheels |
Base dependencies are intentionally small: charset detection, Pydantic, TOML, and regex (glossary normalization).
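
To confirm which extras an installed distribution declares, the standard library's `importlib.metadata` exposes the `Provides-Extra` metadata fields. The following is a small sketch; the helper name `declared_extras` is hypothetical, and the exact spelling of extra names in the metadata (for example `tm_semantic` versus `tm-semantic`) may be normalized by the build backend.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name: str) -> list[str]:
    """Return the extras an installed distribution declares, or [] if absent."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    # get_all returns None when the field is missing entirely.
    return sorted(meta.get_all("Provides-Extra") or [])
```

With DocTranslater installed, `declared_extras("DocTranslater")` should list the extras from the table above.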
Which extra do I need?
- Embed only router/TOML types in another repo: base install or `pip install "DocTranslater"` and `import doctranslate.schemas`.
- Run the translate CLI or call `doctranslate.api.translate`: `pip install "DocTranslater[full]"` (matches CI).
- Custom subset: combine extras (for example `pdf`, `llm`, `cli`); resolve import errors by adding the missing slice.
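
When working with a custom subset, the `name` attribute of an `ImportError` can point to the slice that is missing. The sketch below is illustrative only: `suggest_extra` and the mapping are hypothetical, inferred from the extras table rather than from `pyproject.toml`, and some packages (ONNXRuntime, for instance) may be pulled in by more than one extra.

```python
from typing import Optional

# Assumed mapping from top-level import name to the extra that provides it.
# Derived from the extras table in this page; not read from pyproject.toml.
_MODULE_TO_EXTRA = {
    "fitz": "pdf", "pymupdf": "pdf",
    "openai": "llm", "httpx": "llm", "litellm": "llm",
    "tiktoken": "llm", "tenacity": "llm",
    "onnxruntime": "vision", "cv2": "vision", "huggingface_hub": "vision",
    "peewee": "tm", "rapidfuzz": "tm", "Levenshtein": "tm",
    "hyperscan": "glossary",
    "rich": "cli", "tqdm": "cli", "psutil": "cli",
    "fastapi": "api", "uvicorn": "api",
}


def suggest_extra(exc: ImportError) -> Optional[str]:
    """Guess which extra would satisfy a failed import, or None if unknown."""
    # exc.name may be dotted (e.g. "rich.console"); keep the top-level package.
    top_level = (exc.name or "").split(".")[0]
    return _MODULE_TO_EXTRA.get(top_level)
```

A caller could wrap `import doctranslate.api` in `try`/`except ImportError as exc` and print `suggest_extra(exc)` as an install hint.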
Import boundaries (OSS)
- `doctranslate.schemas` must not require PyMuPDF, ONNX, or LLM HTTP clients.
- `doctranslate.translator` package `__init__` stays lightweight; heavy symbols load via `__getattr__`.
- CLI subcommands that need the PDF stack import their implementations lazily in `doctranslate/cli/dispatch.py`; cache dirs are created via `doctranslate/bootstrap.py` without importing the full IL graph.
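
The lazy-loading rule for package `__init__` relies on module-level `__getattr__` (PEP 562). The sketch below shows the general pattern on a synthetic module, not the actual `doctranslate.translator` code; `make_lazy_module` and the symbol map are hypothetical, with `math.sqrt` standing in for a heavy symbol.

```python
import importlib
import types


def make_lazy_module(name, lazy_map):
    """Build a module whose listed attributes are imported on first access.

    lazy_map maps an attribute name to the module that defines it.
    """
    mod = types.ModuleType(name)

    def __getattr__(attr):
        # Only called when normal attribute lookup on the module fails.
        try:
            target = lazy_map[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(target), attr)
        setattr(mod, attr, value)  # cache so later lookups skip __getattr__
        return value

    mod.__getattr__ = __getattr__
    return mod


lazy = make_lazy_module("demo_translator", {"sqrt": "math"})
```

In a real package the same `__getattr__` function would sit at the top level of `__init__.py`, so that importing the package alone never touches the heavy dependency.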
CI
- Minimal lane: `uv sync --locked --group dev` (no extras) + `pytest tests/test_install_profiles.py::test_minimal_schemas_import`.
- Fast lane: `uv sync --locked --group dev --extra full` + `pytest tests/ -m "not requires_pdf and not perf"` + MkDocs strict + wheel smoke + `scripts/check_cli_import_time.py`.
- Slim lane: `uv sync --locked --group dev --extra pdf --extra cli` + `pytest tests/test_install_profiles.py::test_pdf_stack_opens_ci_fixture`.
- Full matrix: `uv sync --locked --group dev --extra full` + full `pytest`, assets warmup, offline pack/restore.
- Docs (PR): Zensical build when `docs/**` or `mkdocs.yml` changes (`.github/workflows/docs-pr.yml`).
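
The body of `test_minimal_schemas_import` is not reproduced here. As a rough idea of how an import-boundary check can work, the sketch below imports a module in a fresh interpreter and reports which forbidden modules ended up in `sys.modules`. The helper name `leaked_imports` is hypothetical, and the forbidden list in the usage note is an assumption based on the usual import names for PyMuPDF, ONNXRuntime, and the OpenAI client.

```python
import json
import subprocess
import sys


def leaked_imports(module_name, forbidden):
    """Import module_name in a clean subprocess; return forbidden modules it loaded."""
    probe = (
        "import importlib, json, sys\n"
        f"importlib.import_module({module_name!r})\n"
        f"forbidden = {list(forbidden)!r}\n"
        "print(json.dumps(sorted(m for m in forbidden if m in sys.modules)))\n"
    )
    done = subprocess.run(
        [sys.executable, "-c", probe],
        check=True,
        capture_output=True,
        text=True,
    )
    # Use the last line in case the imported module prints during import.
    return json.loads(done.stdout.strip().splitlines()[-1])
```

A test in this style might assert `leaked_imports("doctranslate.schemas", ["fitz", "onnxruntime", "openai"]) == []`. Running the probe in a subprocess keeps modules already imported by the test session from masking a leak.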
See Verification for day-to-day commands.
For OCI images and which extras each target installs, see Docker and Docker image profiles.