Identify and Translate PDF
Warning
After version 0.3.1, output goes through logging, which by default only shows messages at the WARNING level and above. To see processing progress, set the logging level to INFO:
import logging
httpx_logger = logging.getLogger("httpx")
httpx_logger.setLevel(logging.WARNING)
logging.basicConfig(level=logging.INFO)
For demonstration purposes, the following code examples all set the logging level to INFO.
Client.pdf_translate
Caution
Please note that this interface is not officially supported and is not guaranteed to be available.
Translate one or more PDF files into text files in the specified language.
Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
pdf_file | str or list | Yes | - | Path to the PDF file or list of PDF file paths |
output_path | str | No | "./Output" | Path to the output folder |
convert | bool | No | False | Whether to convert [ to $, and [[ to $$ |
language | str | No | "zh" | Target language, supported languages: "en", "zh", "ja", "fr", "ru", "pt", "es", "de", "ko", "ar" |
model | str | No | "deepseek" | Translation model, supported models: "deepseek", "glm4" |
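Because language and model accept only the values listed in the table, it can help to check them locally before sending a request. The following is a minimal sketch: check_translate_args is a hypothetical helper that is not part of pdfdeal, and the allowed values are copied from the table above, so they may differ in other versions.

```python
# Allowed values copied from the parameter table; they may change between versions.
SUPPORTED_LANGUAGES = {"en", "zh", "ja", "fr", "ru", "pt", "es", "de", "ko", "ar"}
SUPPORTED_MODELS = {"deepseek", "glm4"}


def check_translate_args(pdf_file, language="zh", model="deepseek"):
    """Hypothetical helper: normalize pdf_file to a list and validate the options."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {model!r}")
    # Accept either a single path or a list of paths, as the table describes.
    files = [pdf_file] if isinstance(pdf_file, str) else list(pdf_file)
    return {"pdf_file": files, "language": language, "model": model}


kwargs = check_translate_args("tests/pdf/sample.pdf", language="en", model="glm4")
print(kwargs)
```

The returned dictionary could then be passed on as Client.pdf_translate(**kwargs).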
Return Values
Returns a tuple (list1, list2, status) containing three elements. Both lists are in the same order as the input files:
- list1 (list): List of successfully translated files
  - Elements hold the translated text and text location for each file (see the detailed explanation below)
  - Empty string if processing failed
- list2 (list): List of files that failed processing
  - Elements are dictionaries containing two keys:
    - 'error': Error message (string)
    - 'path': Path of the file that failed processing (string)
  - Both keys' values are empty strings if processing succeeded
- status (bool): Processing status
  - True: At least one file failed processing
  - False: All files processed successfully
Notes
- The lengths of list1 and list2 are the same.
- If the API key does not have translation permissions, a RuntimeError exception will be raised.
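Since list1 and list2 have the same length and follow the order of the input files, results can be paired by index. The following is a minimal sketch, assuming the return shape described above. split_results is a hypothetical helper, and the sample values are made up for illustration rather than produced by a real call.

```python
def split_results(pdf_files, list1, list2):
    """Hypothetical helper: pair each input file with its result or its error."""
    succeeded, failed = {}, {}
    for path, result, failure in zip(pdf_files, list1, list2):
        if failure["error"]:
            # A non-empty 'error' marks a file that failed processing.
            failed[path] = failure["error"]
        else:
            succeeded[path] = result
    return succeeded, failed


# Illustrative data shaped like the documented return value (not real output).
files = ["a.pdf", "b.pdf"]
list1 = [{"texts": ["## 测试"], "location": [{"page_idx": 0}]}, ""]
list2 = [{"error": "", "path": ""}, {"error": "upload failed", "path": "b.pdf"}]

succeeded, failed = split_results(files, list1, list2)
print(sorted(succeeded))
print(failed)
```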
Warning
The return value of this function's list1 is different from other functions; see below for details.
Detailed Explanation of Return Values
Each element of the returned list1 (called text here) is a dictionary containing two lists:
- text["texts"] (list): List of translated texts
  - Elements are translated texts (strings)
  - An empty string indicates that the current text block was not translated (e.g., it is table text)
- text["location"] (list): List of text location information
  - Elements are text location information (dictionaries in the example output below)
  - Each element corresponds to the translated text at the same index in text["texts"], indicating its position in the original PDF
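Because the texts and location lists correspond element by element, they can be walked together while skipping blocks whose translation is an empty string. The following is a minimal sketch: translated_blocks is a hypothetical helper, and the sample dictionary is illustrative data shaped like the example output in this section.

```python
def translated_blocks(file_result):
    """Hypothetical helper: yield (translation, location) pairs for translated blocks only."""
    for translation, location in zip(file_result["texts"], file_result["location"]):
        if translation == "":
            # An empty string means the block was not translated (e.g. table text).
            continue
        yield translation, location


# Illustrative data (not real output): the second block is left untranslated.
sample = {
    "texts": ["## 测试", ""],
    "location": [
        {"raw_text": "## Test", "page_idx": 0, "x": 867, "y": 418},
        {"raw_text": "| a | b |", "page_idx": 1, "x": 869, "y": 412},
    ],
}

blocks = list(translated_blocks(sample))
for translation, location in blocks:
    print(location["page_idx"], location["raw_text"], "->", translation)
```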
Example
Warning
Please ensure you have configured your API key in environment variables as described in the Initialization section.
from pdfdeal import Doc2X
Client = Doc2X()
translate, fail, flag = Client.pdf_translate(
    pdf_file="tests/pdf/sample.pdf", language="zh", model="deepseek"
)
# Each element of translate holds the results for one input file
for text in translate:
    print(text["texts"])
    print(text["location"])
print(fail)
print(flag)
Expected output (the first two lines are log messages; the remaining lines are the printed variable values):
Processing file: 6% -- uuid: 655947fa-277c-4f05-8edc-b92f0eca3a63
TRANSLATE Progress: 1/1 files successfully processed.
['## 测试', '\n\n## 测试']
[{'raw_text': '## Test', 'page_idx': 0, 'page_width': 2040, 'page_height': 1148, 'x': 867, 'y': 418}, {'raw_text': '\n\n## 测试', 'page_idx': 1, 'page_width': 2040, 'page_height': 1148, 'x': 869, 'y': 412}]
[{'error': '', 'path': ''}]
False
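The location entries in the output above carry a page_idx field, so translated blocks can be grouped per page. The following is a minimal sketch, assuming page_idx is a zero-based page index as the example output suggests. group_by_page is a hypothetical helper, not part of pdfdeal.

```python
from collections import defaultdict


def group_by_page(file_result):
    """Hypothetical helper: map each page_idx to the translated texts found on that page."""
    pages = defaultdict(list)
    for translation, location in zip(file_result["texts"], file_result["location"]):
        pages[location["page_idx"]].append(translation)
    return dict(pages)


# Values taken from the expected output above.
result = {
    "texts": ["## 测试", "\n\n## 测试"],
    "location": [
        {"raw_text": "## Test", "page_idx": 0, "page_width": 2040,
         "page_height": 1148, "x": 867, "y": 418},
        {"raw_text": "\n\n## 测试", "page_idx": 1, "page_width": 2040,
         "page_height": 1148, "x": 869, "y": 412},
    ],
}

pages = group_by_page(result)
print(pages)
```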