Horeaux & Stephen
International Inc.

Translations and computational linguistics since 1989


Professional system for translating websites and online content.

- The need to speed up the translation of French websites
- The obstacles to website translation
- Analysis of the documents to be translated
- Creation and maintenance of the linguistic baseline
- Use of the baseline
- Terminological checks
- Format management
- Specific case for files in HTML format
- Process
- Diagram
- Detail of the operations
- Text extraction
- Extraction format
- Translator relationship
- Human translation
- Website database construction
- Page creation
- Page reintegration
- Updates
The strong points of the system
- A complete system
- A transparent system
- A fully automated system
- An economical system
- An optimum-quality translation
Freedom to choose translators


The need to speed up the translation of French websites
According to a study published by OC&C Strategy Consultants, in collaboration with Google, France showed a “commercial deficit” of EUR 700 million over the Internet in 2013, in a market of transnational flows assessed at four billion.
According to the authors of the study, these results are explained by the insufficient resources deployed by French internet retailers. “Massive and rapid investment is required internationally in order to adapt to local markets. This means translating content into the language of the country concerned.”

The obstacles to website translation
This study, however, does not highlight the two major obstacles encountered by French websites:
- the lack of affinity on the part of the French for foreign languages
- the practical difficulty of translating websites

A website promotes the image of a company or an organisation. Its translation must therefore remain the responsibility of professional translators rather than being left to automatic translation.
It also remains difficult to access the text contained in web pages and to use translation memory software to lower costs and guarantee quality.
The answer is to combine the know-how of two professions, IT and translation, in a single process based on computer-assisted human translation, not automatic translation.


The principles governing the EASYWEB software are as follows:

- Neither the administrator nor the contributors work in the page code to manage a multilingual site.
They are only involved in updating text in the source language.

- Translators do not work in the web page code either, since doing so risks modifying tags, missing text and losing access to CAT tools.
They use only Word (the Microsoft Office word processor) as their interface.

- The website must have its own translation memory.
Each page, for each target language, must represent a database index.

- The software flags the modifications made to a page: deletions, amendments to the text and additions.
In the case of new text, the EASYWEB memory assists the translator.

- The software must not alter the web page structure; the format tags (bold, italics, fonts, etc.) are referenced.
They are regenerated when the pages are created, so that each target-language page is the exact equivalent of the source-language page.

In compliance with these principles, EASYWEB’s fully automated process implements a certain number of functions.


The main functions of the software are set out below. The software also includes several additional modules not described in this document.

- Analysis of the documents to be translated
This step neutralises the elements that have no significance for translation: proper nouns, numerical data, etc.
It then reuses text that has already been translated, with the same content and in the same context, from the site’s translation memory. The aim is to reduce the volumes to be translated efficiently, and thus also the budgets.
This function also measures the variance between several versions of a document, so that only the differences need to be worked on.
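
As an illustration of the neutralisation step, here is a minimal sketch covering numerical data only. The function names, the placeholder syntax ({1}, {2}, ...) and the number pattern are assumptions made for this example; the document does not describe how EASYWEB implements this step.

```python
import re

# Assumed pattern: runs of digits with optional "." or "," separators.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def neutralise(segment: str) -> tuple[str, list[str]]:
    """Replace each number with a numbered placeholder; return template and values."""
    values: list[str] = []

    def _swap(match: re.Match) -> str:
        values.append(match.group(0))
        return f"{{{len(values)}}}"  # produces {1}, {2}, ...

    return NUMBER.sub(_swap, segment), values


def restore(template: str, values: list[str]) -> str:
    """Put the original values back into a (translated) template."""
    for position, value in enumerate(values, start=1):
        template = template.replace(f"{{{position}}}", value)
    return template


template, values = neutralise("Livraison en 3 jours pour 1250,50 EUR")
# template is now "Livraison en {1} jours pour {2} EUR"
print(restore("Delivery in {1} days for EUR {2}", values))
```

Two segments that differ only in their figures reduce to the same template, so a translation stored for one can be reused for the other.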

- Creation and maintenance of the linguistic baseline
The linguistic baseline is formed of segments, the smallest form of which is the glossary. The glossary is composed of noun phrases whose minimum unit is one (1) unambiguous word.
The baseline is created by compiling and assembling existing documents that have already been translated and that the client provides for a specific project.
The baseline is enriched with every subsequent translation, after validation by the client if the client so wishes. This process must be automated (computerised) and checked by a translator, who guarantees its integrity.

- Use of the baseline
The baseline allows glossaries to be produced
It manages computerized translations
Its traceability allows for document analysis

- Terminological checks
Before each translation is delivered, the document is compared to the glossary.
Any discrepancy is flagged to the proofreader, who analyses it and makes any necessary corrections. If a discrepancy is left uncorrected, a note is sent to the client, provided the agreement entered into with the client so specifies.
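
A terminological check of this kind could be sketched as follows. The sketch assumes a glossary held as a simple source-term to target-term mapping and uses plain case-insensitive substring matching; the function name and both assumptions are illustrative and not taken from EASYWEB.

```python
def check_terminology(source: str, target: str,
                      glossary: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source term, expected translation) pairs missing from the target."""
    discrepancies = []
    source_lower, target_lower = source.lower(), target.lower()
    for term, expected in glossary.items():
        # A glossary term used in the source must appear, translated, in the target.
        if term.lower() in source_lower and expected.lower() not in target_lower:
            discrepancies.append((term, expected))
    return discrepancies


# Hypothetical glossary and segment pair.
glossary = {"mémoire de traduction": "translation memory", "site web": "website"}
flagged = check_terminology(
    "Le site web possède sa propre mémoire de traduction",
    "The website has its own translation database",
    glossary,
)
print(flagged)  # the pairs to be shown to the proofreader
```

Substring matching ignores inflected forms and word boundaries, so a real check would need language-aware matching; the proofreader remains the judge of each flagged item.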

- Format management
Specific processing allows translators to work in a comfortable environment. Text referencing and the interpretation of special characters make working directly in HTML files almost impossible for translators.

EASYWEB renders the text content accessible in a standard (Word) format in order to:
- allow for automatic pre-translation using the linguistic baseline
- allow the automatic glossary check
- make the proofreading operations easier
For files in HTML format, it avoids the need to re-enter and re-check information. The entire process is extremely light in data processing terms and therefore avoids additional operating costs.



Detail of the operations
Page retrieval
Retrieval is carried out in two ways:
- Either by direct access to the website thanks to authorisations granted by the client
- Or by sending an HTML file by e-mail

Text extraction
The text to be translated is extracted from the tags and placed in a Word table column. The text segmentation follows the tag structure. Where an extracted segment has already been translated, its translation is taken from the website database and placed in the corresponding cell of the Word table.
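
The extraction step can be pictured with the sketch below, which uses Python's standard html.parser module. It assumes that every run of text between two tags becomes one segment and that script and style content is ignored; the class and function names are hypothetical, and the way EASYWEB actually references format tags is not described in this document.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Collect visible text; each run of text between two tags is one segment."""

    SKIPPED = {"script", "style"}  # assumed: their content is never translated

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.segments: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text and not self._skip_depth:
            self.segments.append(text)


def extract_segments(html: str) -> list[str]:
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


page = "<body><h1>Bienvenue</h1><p>Nos <b>services</b> de traduction</p></body>"
print(extract_segments(page))
```

In this simplified version an inline tag such as the bold tag splits a sentence into several segments; keeping such a sentence whole would require referencing the inline tags inside the segment.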

Extraction format
The results of the extraction and pre-translation are thus placed into a three-column Word table. The first column is reserved for the software, the second for the extracted text, and the third for the pre-translation. The cells of this last column that remain empty correspond to the updates. These are the subject of human translation and proofreading.
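
A minimal sketch of this three-column layout is given below, with the Word table modelled as a plain list of tuples. The reference codes in the first column (S0001, S0002, ...) and the dictionary used as translation memory are assumptions made for the example; the document does not say what the software stores in its reserved column.

```python
def build_table(segments: list[str],
                memory: dict[str, str]) -> list[tuple[str, str, str]]:
    """Build rows of (software reference, extracted text, pre-translation)."""
    rows = []
    for number, segment in enumerate(segments, start=1):
        reference = f"S{number:04d}"  # hypothetical reference format
        rows.append((reference, segment, memory.get(segment, "")))
    return rows


def rows_to_translate(rows: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Rows whose third cell is empty are the ones left for the translator."""
    return [row for row in rows if not row[2]]


# Hypothetical memory and page segments.
memory = {"Nos services": "Our services"}
table = build_table(["Nos services", "Devis gratuit"], memory)
for row in rows_to_translate(table):
    print(row)
```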
For example: Update to the translation of a French website into Chinese


Translator relationship

The software manages the directory of translators assigned to the translation of the website, according to their languages and their specialities. A module automatically sends out the Word tables for completion by e-mail.

Human translation
The translator completes the Word table and sends it back to EASYWEB by e-mail.

Website database construction
The database is created with the translation of the first page and enriched from then on. Each translated Word table is stored, segment by segment. Each segment carries traceability elements.

Page creation
The software uses the database to recreate the website pages in HTML code. These are delivered under the original name, followed by an underscore and a suffix consisting of the code for the target language. (For example, the page index.htm is delivered translated into English under the name index_en.htm.) If the website has a directory structure, it is respected and recreated.
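
The naming rule can be expressed in a few lines. This is only a sketch using Python's pathlib; the function name is hypothetical, and the language codes other than "en" are assumed to follow the same pattern as the index_en.htm example.

```python
from pathlib import PurePosixPath


def translated_path(source: str, lang: str) -> str:
    """Insert '_<lang>' before the extension, keeping the directory part."""
    path = PurePosixPath(source)
    return str(path.with_name(f"{path.stem}_{lang}{path.suffix}"))


print(translated_path("index.htm", "en"))           # index_en.htm
print(translated_path("products/list.html", "zh"))  # products/list_zh.html
```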

Page reintegration
As with the retrieval of the source pages, reintegration can be carried out in two ways:
- Either by direct access to the website thanks to authorisations granted by the client,
- Or by e-mail delivery of the translated pages, in their directory structure if applicable.

Updates
When contributors have modified the site, the client simply indicates the page(s) concerned. After retrieval and extraction, these pages are compared with the website database, which highlights the modifications made.
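
The comparison between a stored page and its updated version can be sketched with the standard difflib module, which classifies differences as replacements, deletions and insertions. This maps onto the amendments, deletions and additions mentioned earlier, but it is an illustrative technique chosen for this example, not a description of how EASYWEB performs the comparison.

```python
from difflib import SequenceMatcher


def compare_versions(stored: list[str],
                     current: list[str]) -> list[tuple[str, list[str], list[str]]]:
    """Return (operation, old segments, new segments) for every difference.

    The operation is 'replace' (amendment), 'delete' (deletion)
    or 'insert' (addition).
    """
    changes = []
    matcher = SequenceMatcher(a=stored, b=current, autojunk=False)
    for operation, i1, i2, j1, j2 in matcher.get_opcodes():
        if operation != "equal":
            changes.append((operation, stored[i1:i2], current[j1:j2]))
    return changes


# Hypothetical segments of one page, before and after a contributor's update.
stored = ["Accueil", "Nos services", "Contact"]
current = ["Accueil", "Nos nouveaux services", "Contact", "Mentions légales"]
for change in compare_versions(stored, current):
    print(change)
```

Only the segments reported as replaced or inserted would need to go to the translator; unchanged segments keep their stored translations.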

The strong points of the system

A complete system
EASYWEB handles all the management processes of a multilingual website. The client only manages the updates; alignment into the target languages is automatic.

A transparent system
It allows the client to check the real number of words to be translated and, if the client so wishes, to obtain the content of the website translation memory (database of texts and their translations) in Word or Excel format.

A fully automated system
The entire process is automated, from extraction of the text, to dispatch to the translators, to creation of the translated pages.
Human intervention is limited to:
- retrieval of the pages indicated as being modified
- translation and proofreading of the new translations
- reloading the translated pages

An economical system
The cost of the translation is limited to new and non-redundant text

Optimum translation quality
Translations are carried out by professional translators who translate into their mother tongue and specialise in the website’s field. The translation memory created for website updates guarantees optimum terminological quality.

EASYWEB is capable of extremely rapid analysis of large quantities of documents or document directories
Handling a site is very simple:
- the website pages are retrieved, in their directory structure, if applicable
- the pages are extracted to Word with direct deduplication of redundant elements
- the quotation is sent
- the Word tables are translated by the client’s translators or by hs2i translators
- the pages are created
- the pages are delivered or reloaded
This process is repeated for each new target language
The webmaster must simply manage the links from the source website pages to the pages on the target website(s)

Freedom to choose translators

The client may choose to use hs2i translators. It may also opt for its own internal translators or those from another translation agency.
In these last two cases, the translators must comply with the instructions given by hs2i, which then acts as an IT service provider and manages the website translation memory.


Translation rate
- The rate is established with the client, per translated word. This varies depending on the language pair
- Only the new text is invoiced. New text should be understood as being a text segment not already present in the website’s translation memory

IT services rate
- The creation and management of the website translation memory are automated and are not invoiced
- The EASYWEB process (from retrieval of the source page to delivery of the translated page) is invoiced at a rate per page
The rate depends on the complexity of the website and varies between EUR 1 and EUR 9. It decreases as the number of pages processed per request increases
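
To make the two rates concrete, here is a small worked sketch. The per-word rate of EUR 0.12 is purely hypothetical, the per-page rate of EUR 4 is simply a value within the EUR 1 to EUR 9 range quoted above, and counting words by splitting on spaces is a simplification chosen for this example.

```python
from decimal import Decimal


def quote(segments: list[str], memory: dict[str, str],
          word_rate: Decimal, pages: int, page_rate: Decimal) -> dict:
    """Estimate a request: only new, non-redundant text is counted."""
    # A set removes repeated segments; the memory filter removes known ones.
    new_segments = {s for s in segments if s not in memory}
    new_words = sum(len(s.split()) for s in new_segments)
    translation = new_words * word_rate
    processing = pages * page_rate
    return {
        "new_words": new_words,
        "translation": translation,
        "processing": processing,
        "total": translation + processing,
    }


# Hypothetical request: two pages, one segment already in the memory.
memory = {"Nos services": "Our services"}
segments = ["Nos services", "Contact", "Contact", "Devis gratuit"]
print(quote(segments, memory, Decimal("0.12"), 2, Decimal("4.00")))
```

In this example three new words are invoiced (EUR 0.36) plus two pages of processing (EUR 8.00), for a total of EUR 8.36.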

- CAT: computer-aided translation
- CMS: Content Management System, website design and maintenance software
- Contributor: person intervening in the website content
- HTML: Web page encoding language
- Segment: string of characters representing text, of variable length
- Site translation memory: All the multilingual segments comprising a database
- Source language: language in which the site was created
- Tags: strings of characters used in HTML for formatting text (character fonts, structuring the page, etc.)
- Target languages: languages into which the site is translated
- Translation memory software: software for creating, managing and administering translation memories. (Main functions: division of text into uniform multilingual segments, encoding of the segments so that exact or close matches can be identified when translating, proposing glossaries and terminological checks)
- Web manager: responsible for the website content.
- Webmaster: technician working in the website development software and responsible for the web page codes.