How do I get Google Translate API?

Carla Ten Ventura

Sep 20, 2021

Updating glossaries in Google Cloud Translation

How to progressively improve your translations without training any model

Infographic vector created by vectorjuice www.freepik.com

The Google Cloud Translation API is a service offered by Google that lets you integrate Google Translate into any app.

In November 2019, Google launched a set of new features in the Translation API Advanced (v3) to help users improve translations.

This article shows how to use glossaries, one of these new features, and how to update them with new terms.

Why glossaries?

Glossaries are meant to specify pairs of words that must be translated in a given way, especially when they are ambiguous, borrowed words or product names.

They are also a good tool to improve translations, especially when we work with minority languages. For example, the Catalan word home, which means man, is understood by Google as the English word home and is therefore translated into French as domicile:

Wrong translation of home (man) from Catalan to French by Google Translate
Correct translation of home (man) from Catalan to French by wiktionary.org

In some cases, the use of glossaries becomes unavoidable (at least if you don't have enough data, time or willingness to train your own model).

A glossary file contains at least two columns: one for the source language and another for the target language (for equivalent term sets, you can also add a description column and a part-of-speech column). A glossary specifying that the Catalan word home must be translated as homme (man) in French would look like this:

Example of a glossary file
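As a rough illustration, a two-column CSV glossary like the one described above could be generated from Python. This is only a sketch: the file name glossary_ca_fr.csv and the helper write_glossary_csv are hypothetical, and it assumes a unidirectional glossary with one source,target pair per row and no header row.

```python
import csv


def write_glossary_csv(path, term_pairs):
    """Write (source, target) term pairs as a two-column CSV glossary file.

    Assumption: unidirectional glossary, one pair per row, no header row.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for source_term, target_term in term_pairs:
            writer.writerow([source_term, target_term])


# Hypothetical file name; the pair comes from the Catalan-to-French example.
write_glossary_csv("glossary_ca_fr.csv", [("home", "homme")])
```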

Creating the glossary

The documentation on how to create and use glossaries is extensive, and there are plenty of tutorials.

This article focuses on unidirectional glossaries, but the API also accepts equivalent term sets.

To use a glossary with the Google Translation API, you need to follow these steps:

  1. Create a project using the Google Cloud Console
  2. Grant the following roles to your service account:
    - Cloud Translation API Editor (roles/cloudtranslate.editor)
    - Storage Object Viewer (roles/storage.objectViewer), so it can access the glossary files that are stored in a Cloud Storage bucket
  3. Create a glossary file. The file can be stored as CSV, TSV or TMX.
  4. Upload the glossary file to Cloud Storage. You can perform this step manually or by using any supported language.
  5. Create the glossary resource so it can be accessed by the translation API.
  6. Call the glossary when translating a sentence or a document.

Detailed functions for steps 4 to 6 are given in the next section.
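For step 6, a translation request that applies a glossary might look like the sketch below. It assumes the google-cloud-translate package (v3 client) and a glossary that already exists in the us-central1 location. The function name translate_with_glossary and the optional client parameter are ours, added so the function can be exercised with a stand-in client.

```python
def translate_with_glossary(text, project_id, glossary_id, source_lang,
                            target_lang, location="us-central1", client=None):
    """Translate `text` and return the glossary-aware translations.

    Assumption: the glossary resource `glossary_id` already exists in
    `location` for the given project.
    """
    if client is None:
        # Imported lazily; requires the google-cloud-translate package.
        from google.cloud import translate_v3 as translate
        client = translate.TranslationServiceClient()

    parent = f"projects/{project_id}/locations/{location}"
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",
            "source_language_code": source_lang,
            "target_language_code": target_lang,
            # Full resource name of the glossary to apply.
            "glossary_config": {"glossary": f"{parent}/glossaries/{glossary_id}"},
        }
    )
    # glossary_translations holds the results with the glossary applied.
    return [t.translated_text for t in response.glossary_translations]
```

The plain translations, without the glossary, are still available in the translations field of the same response, which makes it easy to compare both outputs.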

Updating the glossary

While using an app that integrates the Translation API, you may find it useful to add more terms to the glossary. However, Google's documentation about glossaries offers little detail on how to update them.

It is hard to find a tutorial that summarizes all the steps needed to update a glossary, so we are putting them together here. The steps are reproduced using Python, but all the functions used are available in Google's documentation for every supported language.

The functions that we will implement need the following information in order to locate the glossaries:

  • Project ID: the name of the project that you created in Google Cloud Console.
  • Project Number: the ID number associated with the project.

Both the project ID and the project number can be found under Project info in the Google Cloud Console:

Find your project's information in the Console
  • Glossary URI that identifies the glossary file in Cloud Storage.
  • Glossary name
  • Bucket name

The URI, the name of the file and the bucket name can be found by clicking on the file in the Buckets section of the Console:

Blob information in the Console
  • File name of the glossary (locally)
  • Blob name of the glossary (in the bucket).

The file name and the blob name can be the same, except that the file name may include the .csv (or .tmx or .tsv) extension and the blob name must not include it.

Once you have a new word to add to your glossary, update the glossary CSV (or TSV or TMX) file:

Glossary ca-fr with a new word
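If the glossary is kept as a local CSV file, adding a term can be scripted. The helper below is a minimal sketch under that assumption: add_glossary_term is a hypothetical name, and it rewrites the whole file so that a missing trailing newline in the existing file cannot corrupt the new row.

```python
import csv


def add_glossary_term(path, source_term, target_term):
    """Add one (source, target) pair to a two-column CSV glossary.

    Returns False without touching the file if the source term is
    already present, True if the pair was added.
    """
    try:
        with open(path, newline="", encoding="utf-8") as f:
            rows = [row for row in csv.reader(f) if row]
    except FileNotFoundError:
        rows = []  # Start a new glossary if the file does not exist yet.

    if any(row[0] == source_term for row in rows):
        return False

    rows.append([source_term, target_term])
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return True
```

For instance, add_glossary_term("glossary_ca_fr.csv", "dona", "femme") would add a hypothetical Catalan-to-French pair to a file with that (also hypothetical) name.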

Upload the file to Google Cloud Storage

Then, we need to upload it to Cloud Storage. As mentioned before, this step can be done manually or using any supported language. Since we want to automate the whole updating process, we will do it from Python:
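A minimal sketch of the upload step, assuming the google-cloud-storage package and default application credentials. The function name upload_glossary_file and the optional client parameter are ours, not part of the library.

```python
def upload_glossary_file(bucket_name, file_name, blob_name, client=None):
    """Upload the local glossary file to Cloud Storage and return its URI.

    bucket_name: name of the Cloud Storage bucket
    file_name:   path of the glossary file on the local machine
    blob_name:   name the file will have inside the bucket
    """
    if client is None:
        # Imported lazily; requires the google-cloud-storage package.
        from google.cloud import storage
        client = storage.Client()

    blob = client.bucket(bucket_name).blob(blob_name)
    # Uploading to an existing blob name overwrites the previous content.
    blob.upload_from_filename(file_name)
    return f"gs://{bucket_name}/{blob_name}"
```

The returned gs:// URI is the glossary URI that the Translation API needs in order to read the file.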

Delete the previous glossary resource

Once the file is uploaded to the bucket, we will create a new glossary resource that can be accessed from the Translation API. Before doing so, we need to delete the existing resource with the same name.
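Deleting the existing resource could look like the following sketch. It assumes the google-cloud-translate package (v3 client) and a glossary stored in us-central1. The optional client parameter is our addition for testing.

```python
def delete_glossary(project_id, glossary_id, location="us-central1",
                    timeout=180, client=None):
    """Delete a glossary resource and wait until the deletion finishes."""
    if client is None:
        # Imported lazily; requires the google-cloud-translate package.
        from google.cloud import translate_v3 as translate
        client = translate.TranslationServiceClient()

    # Full resource name of the glossary.
    name = f"projects/{project_id}/locations/{location}/glossaries/{glossary_id}"
    # delete_glossary starts a long-running operation; result() blocks
    # until it completes or the timeout (in seconds) expires.
    operation = client.delete_glossary(name=name)
    operation.result(timeout=timeout)
    return name
```

If no glossary with that name exists, the call is expected to raise an error, so you may want to catch it when running the update for the first time.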

Create the glossary resource
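With the file in Cloud Storage, the glossary resource can be created from its URI. The sketch below assumes the google-cloud-translate package (v3 client), a unidirectional glossary (a single source and target language pair) and the us-central1 location. The optional client parameter is our addition for testing.

```python
def create_glossary(project_id, glossary_id, input_uri, source_lang,
                    target_lang, location="us-central1", timeout=180,
                    client=None):
    """Create a unidirectional glossary resource from a Cloud Storage file.

    input_uri: gs:// URI of the uploaded glossary file
    """
    if client is None:
        # Imported lazily; requires the google-cloud-translate package.
        from google.cloud import translate_v3 as translate
        client = translate.TranslationServiceClient()

    parent = f"projects/{project_id}/locations/{location}"
    glossary = {
        "name": f"{parent}/glossaries/{glossary_id}",
        # language_pair marks the glossary as unidirectional.
        "language_pair": {
            "source_language_code": source_lang,
            "target_language_code": target_lang,
        },
        "input_config": {"gcs_source": {"input_uri": input_uri}},
    }
    # Creation is a long-running operation; result() waits for it.
    operation = client.create_glossary(
        request={"parent": parent, "glossary": glossary}
    )
    return operation.result(timeout=timeout)
```

For an equivalent term set, the language_pair entry would be replaced by a language_codes_set entry listing all the language codes in the file.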

Putting it all together

You can create a single function to make updating the glossary easier:
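One way such a function might look is sketched here. It is self-contained and assumes the google-cloud-storage and google-cloud-translate packages, a unidirectional glossary and the us-central1 location. The name update_glossary and the optional client parameters are ours.

```python
def update_glossary(project_id, glossary_id, bucket_name, file_name,
                    blob_name, source_lang, target_lang,
                    location="us-central1", timeout=180,
                    storage_client=None, translate_client=None):
    """Upload the updated glossary file and recreate the glossary resource."""
    if storage_client is None:
        from google.cloud import storage  # needs google-cloud-storage
        storage_client = storage.Client()
    if translate_client is None:
        from google.cloud import translate_v3 as translate  # needs google-cloud-translate
        translate_client = translate.TranslationServiceClient()

    # 1. Upload the updated file, overwriting the previous blob.
    blob = storage_client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(file_name)
    input_uri = f"gs://{bucket_name}/{blob_name}"

    parent = f"projects/{project_id}/locations/{location}"
    name = f"{parent}/glossaries/{glossary_id}"

    # 2. Delete the previous glossary resource and wait for completion.
    translate_client.delete_glossary(name=name).result(timeout=timeout)

    # 3. Create the glossary resource again from the uploaded file.
    glossary = {
        "name": name,
        "language_pair": {
            "source_language_code": source_lang,
            "target_language_code": target_lang,
        },
        "input_config": {"gcs_source": {"input_uri": input_uri}},
    }
    operation = translate_client.create_glossary(
        request={"parent": parent, "glossary": glossary}
    )
    return operation.result(timeout=timeout)
```

Waiting for the delete operation before creating the new resource matters, because both resources share the same name. Note that this sketch does not handle the case where the glossary does not exist yet.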

This is a good approach if you don't have enough data to train your own model, whether you are working with minority languages, for which Google still has a lot of room for improvement, or with a domain-specific topic.
