Translate documents

Cloud Translation - Advanced provides a Document Translation API for directly translating formatted documents such as PDF and DOCX. Compared to plain text translations, Document Translation preserves the original formatting and layout in your translated documents, helping you retain much of the original context, such as paragraph breaks.
The following sections describe how to translate documents and use Document Translation with other Cloud Translation - Advanced features like glossaries and AutoML Translation models. Document Translation supports both online and batch translation requests. For plain text and HTML translations, see Translating text.

Supported file formats

Document Translation supports the following input file types and their associated output file types.

Input | Document MIME type | Output
DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document | DOCX
PDF* | application/pdf | PDF, DOCX
PPTX | application/vnd.openxmlformats-officedocument.presentationml.presentation | PPTX
XLSX | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | XLSX

*Document Translation supports both native and scanned PDF documents, with some differences. For optimal format handling, use native PDF files when possible. Translating scanned PDF files results in some formatting loss. Complex PDF layouts, such as data tables, multi-column layouts, and graphs with labels or legends, can also result in some formatting loss.

If you have PDF content in the DOCX or PPTX format, we recommend that you translate the content in those formats before converting it to PDF. In general, Document Translation preserves the layout and style of DOCX and PPTX files better than PDF files. After a document translation, you can then convert the results to PDF files.

Native and scanned PDF document translations

Document Translation supports both native and scanned PDF files, including translations to or from right-to-left languages. PDF to DOCX conversion is available for batch document translations on native PDF files only. Also, Document Translation preserves hyperlinks, font size, font color, and font style for native PDF files only (for both synchronous and batch translations).
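The format table above can be captured as a small lookup, which is handy when a request needs an explicit MIME type. The following is a minimal sketch, not part of the Cloud Translation client library: the names `SUPPORTED_FORMATS`, `mime_type_for`, and `allowed_outputs` are hypothetical, and the `native_pdf` flag is something the caller must already know, because the sketch does not inspect the file.

```python
from pathlib import PurePosixPath

# Input extension -> (MIME type, possible output formats), copied from the table above.
SUPPORTED_FORMATS = {
    ".docx": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ("DOCX",),
    ),
    ".pdf": ("application/pdf", ("PDF", "DOCX")),
    ".pptx": (
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
        ("PPTX",),
    ),
    ".xlsx": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ("XLSX",),
    ),
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported input file, or raise ValueError."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported document type: {path!r}")
    return SUPPORTED_FORMATS[suffix][0]


def allowed_outputs(path: str, *, batch: bool, native_pdf: bool = True) -> tuple:
    """Return the output formats available for this input file.

    PDF to DOCX conversion is offered only for batch requests on native PDFs,
    so in every other case a PDF can only come back as a PDF.
    """
    suffix = PurePosixPath(path).suffix.lower()
    mime_type_for(path)  # validates the extension
    outputs = SUPPORTED_FORMATS[suffix][1]
    if suffix == ".pdf" and not (batch and native_pdf):
        return ("PDF",)
    return outputs
```

For example, `allowed_outputs("gs://my-bucket/report.pdf", batch=False)` returns only `("PDF",)`, because the conversion to DOCX is limited to batch requests.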
Before you begin

Before you can start using the Cloud Translation API, you must have a project that has the Cloud Translation API enabled, and you must have a private key with the appropriate credentials. You can also install client libraries for common programming languages to help you make calls to the API. For more information, see the Setup page.

Required permissions

For requests that require Cloud Storage access, such as batch Document Translation, you might need Cloud Storage permissions to read input files or write output files to a bucket. For example, to read input files from a bucket, you must have at least read object permissions (provided by the role roles/storage.objectViewer) on the bucket. For more information about Cloud Storage roles, see the Cloud Storage documentation.

Translate documents (online)

Online translation provides real-time (synchronous) processing of a single file. A PDF file can be up to 20 MB in size and up to 20 pages long. Other document types can be up to 20 MB in size, with no page limit.

Translate a document from Cloud Storage

The following example translates a file from a Cloud Storage bucket and outputs the result to a Cloud Storage bucket. The response also returns a byte stream. You can specify the MIME type; if you don't, Document Translation determines it from the input file's extension.

If you don't specify a source language code, Document Translation detects the language for you. The detected language is included in the output in the detectedLanguageCode field.

REST & CMD LINE

Before using any of the request data, make the following replacements:
HTTP method and URL:

POST https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument

To send your request, use one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument"

PowerShell (Windows)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument" | Select-Object -Expand Content

A successful request returns a JSON response.

Node.js

Before trying this sample, follow the Node.js setup instructions in the Translation quickstart using client libraries. For more information, see the Translation Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'global';
// const inputUri = 'path_to_your_file';

// Imports the Google Cloud Translation library
const {TranslationServiceClient} = require('@google-cloud/translate').v3beta1;

// Instantiates a client
const translationClient = new TranslationServiceClient();

// Points at the input document in Cloud Storage
const documentInputConfig = {
  gcsSource: {
    inputUri: inputUri,
  },
};

async function translateDocument() {
  // Construct request
  const request = {
    parent: translationClient.locationPath(projectId, location),
    documentInputConfig: documentInputConfig,
    sourceLanguageCode: 'en-US',
    targetLanguageCode: 'sr-Latn',
  };

  // Run request
  const [response] = await translationClient.translateDocument(request);

  console.log(
    `Response: Mime Type - ${response.documentTranslation.mimeType}`
  );
}

translateDocument();

Translate a document inline

The following example sends a document inline as part of the request. You must include the MIME type for inline document translations.

If you don't specify a source language code, Document Translation detects the language for you. The detected language is included in the output in the detectedLanguageCode field.

REST & CMD LINE

Before using any of the request data, make the following replacements:
HTTP method and URL:

POST https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument

To send your request, use one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument"

PowerShell (Windows)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument" | Select-Object -Expand Content

A successful request returns a JSON response.

Python

Before trying this sample, follow the Python setup instructions in the Translation quickstart using client libraries. For more information, see the Translation Python API reference documentation.

from google.cloud import translate_v3beta1 as translate
def translate_document(project_id: str, file_path: str):
    client = translate.TranslationServiceClient()
    location = "us-central1"
    parent = f"projects/{project_id}/locations/{location}"

    # Supported file types: https://cloud.google.com/translate/docs/supported-formats
    with open(file_path, "rb") as document:
        document_content = document.read()

    # Inline requests must state the MIME type explicitly.
    document_input_config = {
        "content": document_content,
        "mime_type": "application/pdf",
    }

    response = client.translate_document(
        request={
            "parent": parent,
            "target_language_code": "fr-FR",
            "document_input_config": document_input_config,
        }
    )

    # To output the translated document, uncomment the code below.
    # byte_stream_outputs is a repeated field; the first element holds the translated file.
    # f = open('/tmp/output', 'wb')
    # f.write(response.document_translation.byte_stream_outputs[0])
    # f.close()

    # If no output config is provided in the request, the translated file is returned only as a byte stream,
    # and its output MIME type is the same as the input file's MIME type.
    print("Response: Detected Language Code - {}".format(response.document_translation.detected_language_code))

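The two online variants described above (a Cloud Storage source and an inline document) differ only in how the input is described. The following is a minimal sketch of a request builder that enforces the rules stated in this section before any call is made. The function name `build_online_request` and the constant `MAX_ONLINE_BYTES` are hypothetical, the 20 MB limit is assumed here to mean 20 * 1024 * 1024 bytes, and the 20-page PDF limit is not checked. The field names follow the request dictionary used in the Python sample above, plus `gcs_source`, `document_output_config`, and `gcs_destination` as they are named in the request message.

```python
from typing import Optional

# Online size limit from the text above; the binary interpretation is an assumption.
MAX_ONLINE_BYTES = 20 * 1024 * 1024


def build_online_request(
    project_id: str,
    location: str,
    target_language_code: str,
    *,
    content: Optional[bytes] = None,
    gcs_input_uri: Optional[str] = None,
    mime_type: Optional[str] = None,
    source_language_code: Optional[str] = None,
    gcs_output_prefix: Optional[str] = None,
) -> dict:
    """Build a translate_document request dict for inline or Cloud Storage input."""
    if (content is None) == (gcs_input_uri is None):
        raise ValueError("Provide exactly one of content or gcs_input_uri")

    if content is not None:
        # Inline documents must carry an explicit MIME type.
        if mime_type is None:
            raise ValueError("Inline documents require an explicit MIME type")
        if len(content) > MAX_ONLINE_BYTES:
            raise ValueError("Document exceeds the online size limit")
        input_config = {"content": content, "mime_type": mime_type}
    else:
        # For Cloud Storage input the MIME type is optional; when omitted,
        # it is determined from the file extension.
        input_config = {"gcs_source": {"input_uri": gcs_input_uri}}
        if mime_type is not None:
            input_config["mime_type"] = mime_type

    request = {
        "parent": f"projects/{project_id}/locations/{location}",
        "target_language_code": target_language_code,
        "document_input_config": input_config,
    }
    # Leaving out the source language lets the service detect it.
    if source_language_code is not None:
        request["source_language_code"] = source_language_code
    if gcs_output_prefix is not None:
        request["document_output_config"] = {
            "gcs_destination": {"output_uri_prefix": gcs_output_prefix}
        }
    return request
```

The resulting dictionary can be passed as `client.translate_document(request=...)`, in the same way as the sample above.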
Use an AutoML model or a glossary

Instead of the Google-managed model, you can use your own AutoML Translation models to translate documents. In addition to specifying a model, you can also include a glossary to handle domain-specific terminology. If you specify a model or a glossary, you must specify the source language. The following example uses an AutoML model and a glossary. If the model or glossary is in a different project, you must have the corresponding IAM permission to access those resources.

REST & CMD LINE

Before using any of the request data, make the following replacements:
HTTP method and URL:

POST https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument

To send your request, use one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument"

PowerShell (Windows)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument" | Select-Object -Expand Content

A successful request returns a JSON response.

Translate documents (batch)

Batch translation allows you to translate multiple files into multiple languages in a single request. For each request, you can send up to 100 files with a total content size of up to 1 GB or 100 million Unicode codepoints, whichever limit is reached first. You can specify a particular translation model for each language.

Translate multiple documents

The following example includes multiple input configurations. Each input configuration is a pointer to a file in a Cloud Storage bucket.

REST & CMD LINE

Before using any of the request data, make the following replacements:
HTTP method and URL:

POST https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument

To send your request, use one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument"

PowerShell (Windows)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument" | Select-Object -Expand Content

Node.js

Before trying this sample, follow the Node.js setup instructions in the Translation quickstart using client libraries. For more information, see the Translation Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const inputUri = 'path_to_your_files';
// const outputUri = 'path_to_your_output_bucket';

// Imports the Google Cloud Translation library
const {TranslationServiceClient} = require('@google-cloud/translate').v3beta1;

// Instantiates a client
const translationClient = new TranslationServiceClient();

const documentInputConfig = {
  gcsSource: {
    inputUri: inputUri,
  },
};

async function batchTranslateDocument() {
  // Construct request
  const request = {
    parent: translationClient.locationPath(projectId, location),
    documentInputConfig: documentInputConfig,
    sourceLanguageCode: 'en-US',
    targetLanguageCodes: ['sr-Latn'],
    inputConfigs: [
      {
        gcsSource: {
          inputUri: inputUri,
        },
      },
    ],
    outputConfig: {
      gcsDestination: {
        outputUriPrefix: outputUri,
      },
    },
  };

  // Batch translate documents using a long-running operation.
  // You can wait for now, or get results later.
  const [operation] = await translationClient.batchTranslateDocument(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Total Pages: ${response.totalPages}`);
}

batchTranslateDocument();

Python

Before trying this sample, follow the Python setup instructions in the Translation quickstart using client libraries. For more information, see the Translation Python API reference documentation.

from google.cloud import translate_v3beta1 as translate
def batch_translate_document(
    input_uri: str,
    output_uri: str,
    project_id: str,
    timeout: int = 180,
):
    client = translate.TranslationServiceClient()

    # The ``global`` location is not supported for batch translation
    location = "us-central1"

    # Google Cloud Storage location for the source input. This can be a single file
    # (for example, ``gs://translation-test/input.docx``) or a wildcard
    # (for example, ``gs://translation-test/*``).
    # Supported file types: https://cloud.google.com/translate/docs/supported-formats
    gcs_source = {"input_uri": input_uri}

    batch_document_input_configs = {
        "gcs_source": gcs_source,
    }
    gcs_destination = {"output_uri_prefix": output_uri}
    batch_document_output_config = {"gcs_destination": gcs_destination}
    parent = f"projects/{project_id}/locations/{location}"

    # Supported language codes: https://cloud.google.com/translate/docs/language
    operation = client.batch_translate_document(
        request={
            "parent": parent,
            "source_language_code": "en-US",
            "target_language_codes": ["fr-FR"],
            "input_configs": [batch_document_input_configs],
            "output_config": batch_document_output_config,
        }
    )

    print("Waiting for operation to complete...")
    # Block until the long-running operation finishes or the timeout (in seconds) expires.
    response = operation.result(timeout)

    print("Total Pages: {}".format(response.total_pages))

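The sample above passes a single input configuration, while a batch request can carry several. The following is a minimal sketch of building such a request from a list of Cloud Storage URIs. The function name `build_batch_request` and the constant `MAX_FILES_PER_REQUEST` are hypothetical. The count check is only a lower bound on the 100-file limit, because a wildcard URI can match many files, and the 1 GB and codepoint limits are not checked at all.

```python
from typing import Sequence

# Per-request file limit from the text above.
MAX_FILES_PER_REQUEST = 100


def build_batch_request(
    project_id: str,
    location: str,
    source_language_code: str,
    target_language_codes: Sequence[str],
    input_uris: Sequence[str],
    output_uri_prefix: str,
) -> dict:
    """Build a batch_translate_document request dict with one input config per URI."""
    if location == "global":
        # The sample above notes that batch translation does not support ``global``.
        raise ValueError("Batch translation needs a regional location such as us-central1")
    if not input_uris:
        raise ValueError("At least one input URI is required")
    if len(input_uris) > MAX_FILES_PER_REQUEST:
        # Each config points at one or more files, so more than 100 configs
        # is assumed to exceed the file limit.
        raise ValueError(f"A request can include at most {MAX_FILES_PER_REQUEST} files")

    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "source_language_code": source_language_code,
        "target_language_codes": list(target_language_codes),
        # Each input configuration is a pointer to a file (or wildcard) in a bucket.
        "input_configs": [{"gcs_source": {"input_uri": uri}} for uri in input_uris],
        "output_config": {"gcs_destination": {"output_uri_prefix": output_uri_prefix}},
    }
```

The returned dictionary has the same shape as the `request` argument in the sample above, so it can be passed to `client.batch_translate_document(request=...)`.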
Translate and convert a native PDF file

The following example translates and converts a native PDF file to a DOCX file. You can specify multiple inputs of various file types; they don't all have to be native PDF files. However, you can't include scanned PDF files in a request that includes a conversion; such a request is rejected and no translations are done. Only native PDF files are translated and converted to DOCX files. Other file types keep their format; for example, if you include PPTX files, they are translated and returned as PPTX files.

If you regularly translate a mix of scanned and native PDF files, we recommend that you organize them into separate Cloud Storage buckets. That way, when you request a batch translation and conversion, you can easily exclude the bucket that contains scanned PDF files instead of having to exclude individual files.

REST & CMD LINE

Before using any of the request data, make the following replacements:
HTTP method and URL:

POST https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument

To send your request, use one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument"

PowerShell (Windows)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument" | Select-Object -Expand Content

Use an AutoML model or a glossary

Instead of the Google-managed model, you can use your own AutoML Translation models to translate documents. In addition to specifying a model, you can also include a glossary to handle domain-specific terminology. If you specify a model or a glossary, you must specify the source language. The following example uses an AutoML model and a glossary. You can specify up to 10 target languages with their own model and glossary.

If you specify a model for some target languages and not others, Document Translation uses the Google-managed model for the unspecified languages. Similarly, if you specify a glossary for some target languages, Document Translation doesn't use any glossary for the unspecified languages.
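The per-language rules above can be sketched as a helper that adds models and glossaries to an existing batch request dictionary. This is an illustration under assumptions rather than the documented sample: the function name `with_models_and_glossaries` is hypothetical, the field names `models`, `glossaries`, and `format_conversions` reflect my reading of the batch request message, and the resource paths in the usage example are placeholders. The optional `pdf_to_docx` flag corresponds to the native PDF conversion described earlier.

```python
from typing import Dict, Optional

# Limit on target languages stated in the text above.
MAX_TARGET_LANGUAGES = 10
PDF_MIME = "application/pdf"
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def with_models_and_glossaries(
    request: dict,
    *,
    models: Optional[Dict[str, str]] = None,
    glossaries: Optional[Dict[str, str]] = None,
    pdf_to_docx: bool = False,
) -> dict:
    """Return a copy of a batch request with per-language models and glossaries.

    ``models`` and ``glossaries`` map a target language code to a resource path.
    Languages left out fall back to the Google-managed model and to no glossary.
    """
    models = models or {}
    glossaries = glossaries or {}
    targets = request.get("target_language_codes", [])

    if len(targets) > MAX_TARGET_LANGUAGES:
        raise ValueError(f"At most {MAX_TARGET_LANGUAGES} target languages are allowed")
    if (models or glossaries) and not request.get("source_language_code"):
        raise ValueError("A model or glossary requires an explicit source language")
    unknown = (set(models) | set(glossaries)) - set(targets)
    if unknown:
        raise ValueError(f"Not in target_language_codes: {sorted(unknown)}")

    result = dict(request)  # shallow copy; the caller's dict is left untouched
    if models:
        result["models"] = dict(models)
    if glossaries:
        # Each glossary entry is a small config object that names the glossary resource.
        result["glossaries"] = {
            lang: {"glossary": path} for lang, path in glossaries.items()
        }
    if pdf_to_docx:
        # Ask for native PDF inputs to be returned as DOCX.
        result["format_conversions"] = {PDF_MIME: DOCX_MIME}
    return result
```

For example, a request that targets `fr` and `de` can name a custom model for `fr` only, in which case `de` is translated with the Google-managed model:

```python
base = {
    "parent": "projects/PROJECT_NUMBER_OR_ID/locations/us-central1",
    "source_language_code": "en",
    "target_language_codes": ["fr", "de"],
}
request = with_models_and_glossaries(
    base,
    models={"fr": "projects/PROJECT_NUMBER_OR_ID/locations/us-central1/models/MODEL_ID"},
)
```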
REST & CMD LINE

Before using any of the request data, make the following replacements:
HTTP method and URL:

POST https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument

To send your request, use one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument"

PowerShell (Windows)

Note: Ensure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path for your service account private key file.

Save the request JSON body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://translation.googleapis.com/v3/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument" | Select-Object -Expand Content