Stable library API
DocTranslater exposes a versioned Python API for embedding translation in other applications. Prefer these entry points over deep imports under doctranslate.format.pdf.
Install
Canonical imports
| Purpose | Module |
|---|---|
| Translate, async progress, inspect, validate | doctranslate.api |
| Request/result/event Pydantic models | doctranslate.schemas (also re-exported from doctranslate.api where noted) |
Translation request (JSON-serializable)
Build a TranslationRequest (see doctranslate.schemas and doctranslate/schemas/public_api.py in the repo). Minimal OpenAI example:
from pathlib import Path
from doctranslate.api import translate, validate_request
req = validate_request(
{
"input_pdf": str(Path("input.pdf").resolve()),
"lang_in": "en",
"lang_out": "zh",
"translator": {
"mode": "openai",
"ignore_cache": False,
"openai": {
"model": "gpt-4o-mini",
"api_key": None, # use OPENAI_API_KEY if omitted
},
},
"options": {
"output_dir": str(Path("./out").resolve()),
"qps": 4,
},
}
)
result = translate(req)
print(result.summary.original_pdf_path)
for a in result.artifacts.items:
print(a.kind, a.path, a.sha256)
Async progress
import asyncio
from pathlib import Path
from doctranslate.api import async_translate, validate_request
async def main() -> None:
req = validate_request(
{
"input_pdf": str(Path("input.pdf").resolve()),
"lang_in": "en",
"lang_out": "zh",
"translator": {"mode": "openai", "openai": {"model": "gpt-4o-mini"}},
"options": {"output_dir": str(Path("./out").resolve())},
}
)
async for event in async_translate(req):
if event["type"] == "progress_update":
print(event["overall_progress"])
elif event["type"] == "finish":
print(event["translation_result"]["summary"])
elif event["type"] == "error":
print(event["error"])
break
asyncio.run(main())
Each event dict includes schema_version and event_version for forward compatibility.
Validate configuration only
Inspect PDFs (no LLM)
from doctranslate.api import inspect_input
info = inspect_input(["./doc.pdf"])
for f in info.files:
print(f.path, f.page_count, f.prior_translated_marker)
CLI: JSON request file
For subprocess integration, write a TranslationRequest JSON file and run:
doctranslate --output-format json translate \
--request-json request.json \
--emit-progress-json \
doc.pdf
- With
--emit-progress-json, stdout receives NDJSON lines withstream: "progress"followed by the usual final JSON envelope from--output-format json. - You can also place
--output-format jsonimmediately aftertranslate(seedoctranslate translate --help).
Legacy TranslationConfig
Existing code may still pass TranslationConfig to translate / async_translate. That path returns the legacy runtime result type and is supported for compatibility; new integrations should use TranslationRequest and TranslationResult.
Further reading
- HTTP API — optional FastAPI service for containers (
DocTranslater[api]). - Public API policy — semver boundaries and breaking-change rules.
- Package layers — optional extras and minimal installs.