Challenges in the Translation of Languages with Non-Roman Script

Dr Yavar Dehghani

Languages written in non-Roman scripts, including Arabic, Chinese, Greek, Hebrew, Japanese, Korean, Persian, Dari, Tamil and Thai, as well as languages written in Cyrillic, pose more formatting challenges in translation than others.

Translating documents from English, which uses Roman script, into these languages can create problems in the formatting of phrases, sentences, tables and so on.

Some of these languages, like Persian, are written from right to left: each line starts on the right and ends with a full stop at the far left, the opposite of English.

Even with advances in word processing and formatting software, many programs still cannot handle right-to-left text direction. Scrambled text is the common result, especially if any English words are inserted in the text of the language other than English (LOTE).

Microsoft Word handles these scripts quite well as long as the text contains no English words, brackets, full stops, numbers and so on, and the paragraph is set to right-to-left text direction and right justified.

Clients who ask for documents to be translated from English into these languages are often unaware of these issues. When they format the text, they right justify it without changing the text direction. This can wrongly place the full stop on the right and may also scramble the word order and cause spacing errors.

The following Persian example shows text that was right justified during formatting. The Persian text, English words, numbers and punctuation have all shifted, resulting in a scrambled paragraph:

The above paragraph is the scrambled version of the paragraph below:

The correct paragraph is set to right-to-left text direction. The scrambled paragraph has been right justified without changing to right-to-left direction. The giveaway is the full stop being on the right, not the left. In addition, the English words have caused the word order to become hopelessly jumbled, so the text as displayed makes no sense.

This often happens when non-Roman text is copied into an English Word file where the default paragraph settings are set for English, that is, left to right. A full stop on the right is a clear sign that something is wrong.
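A rough check of this kind can be automated before text is pasted into a file with English defaults. The sketch below is only an illustration and not a feature of Word or any other program: it uses Python's standard `unicodedata` module to count right-to-left and left-to-right letters, and the function name `dominant_direction` is made up for this example. A paragraph that comes back as "rtl" probably needs its paragraph direction set to right to left, not just right justification.

```python
import unicodedata

# Unicode bidirectional classes for strongly right-to-left letters:
# "R" (for example Hebrew) and "AL" (Arabic-script letters, including Persian).
RTL_CLASSES = {"R", "AL"}


def dominant_direction(text: str) -> str:
    """Return "rtl", "ltr" or "neutral" by counting strongly directional letters."""
    rtl = 0
    ltr = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            rtl += 1
        elif bidi_class == "L":
            ltr += 1
    if rtl == 0 and ltr == 0:
        # Only digits, punctuation or spaces: no direction can be inferred.
        return "neutral"
    return "rtl" if rtl > ltr else "ltr"


# A Persian sentence containing one English word is still mostly right to left.
sample = "Medicare این یک جمله فارسی است."
print(dominant_direction(sample))
```

Counting letters is a deliberately simple heuristic. It says nothing about how a given application will actually display the text, only which paragraph direction the text most likely needs.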

Creating a right-to-left layout with right-justified text can be quite complex in some applications, such as Excel, and many problems can arise when text is copied and pasted into those programs and files, as with this phrase:

The English or Persian words, as well as the brackets, move around during formatting, which scrambles the text.

Another formatting problem concerns spacing: words can run together when the text is converted to other file types, especially PDF, which turns the work into gibberish. For example, this sentence with proper spacing:

Becomes the illegible sentence below:

Another problem concerns columns, whether in a table or on a page divided into columns. In Persian, the first column needs to be on the right, not on the left as in English. Likewise, where the English has text on the left with an image to its right, the LOTE version should have the text on the right and the image on the left.

Translators who work on an English file and translate it into a LOTE make these changes as a matter of course, considering them part of the translation process. Thus, besides overtyping the English text, they adjust the display as needed so that everything is correctly laid out for the LOTE reader in the final files.

Clients who want to reformat a finalised translation need to make sure that any new layout or formatting they produce is still appropriate, especially when the text contains English words, columns or pictures. Sometimes, when they copy translated text into another file or application, the whole file becomes scrambled and they may have to redo the layout from scratch.

Another problem involves English words. These are often included in the text, especially when the client insists that certain words or phrases be left in English and not translated. In the original translation, these words or phrases read from left to right within the text, as in English, and translators make sure they are correct in the English to LOTE translation. However, problems can arise if the text is then copied to another file. Firstly, the English alphabet encoding can be lost in the transfer. If this happens, the words will display in LOTE letters but will not make sense, appearing as a scrambled collection of letters.

Secondly, the words may display correctly in English letters while the word order of the sentence is wrong. The English words might display in the wrong (reverse) order, and sometimes the English words are correct but the surrounding Arabic-script words are jumbled:

This sentence is scrambled and does not make any sense.

This is more common when the sentence starts with an English word. When there are phone numbers within the sentence, the numbers also tend to jump back and forth.

Some clients, especially government departments and agencies, ask the translator to leave proper names and the names of organisations in English, for example “Medicare”, “Centrelink”, “Covid-19”, “Coronavirus” and so on. When these English words are inserted within non-Roman script, they can scramble the sentence or phrase, especially when the English word is at the start of the sentence, for example:
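One plausible reason an opening English word causes trouble is that some software guesses the direction of a whole paragraph from its first strongly directional character, which is how the Unicode Bidirectional Algorithm behaves when no direction is set explicitly. The sketch below illustrates that guess in Python. It is not a technique described in this article: the function names are made up for this example, and prefixing the invisible RIGHT-TO-LEFT MARK (U+200F) only helps in software that honours it, which should be tested for each target application.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK: invisible, strongly right-to-left


def first_strong_direction(text):
    """Return "rtl" or "ltr" for the first strongly directional character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    # Digits, spaces and punctuation are not strongly directional.
    return None


def anchor_as_rtl(sentence):
    """Prefix an RLM when the sentence would otherwise be guessed as left to right."""
    if first_strong_direction(sentence) == "ltr":
        return RLM + sentence
    return sentence


sentence = "Medicare این یک جمله فارسی است."
print(first_strong_direction(sentence))                 # guessed from "M"
print(first_strong_direction(anchor_as_rtl(sentence)))  # guessed from the RLM
```

Digits are skipped by the check because they carry no strong direction of their own, which is consistent with phone numbers being placed according to the text around them.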


One solution for such languages is to use a transliteration of the words instead of the English words themselves. Transliteration uses a unified system of sounds and characters in which, unlike English spelling, each character corresponds to one and only one sound. This makes it easier for any reader of a non-Roman script language to read the word or phrase in their own script.

For example, the above sentence will be:

The word “coronavirus” has been transliterated rather than translated, which avoids the formatting problems.

The solution for most of these formatting problems is to translate or transliterate all English words into the LOTE and, if necessary, to leave the English word in brackets. The other solution is to check the text direction and justification every time files are moved from Word to PDF, Excel or other formats.

When the client is familiar with the differences between scripts and has experience in formatting these languages, most of the problems are resolved. Otherwise, the translated text goes back and forth between the client and the translator to correct these formatting errors, which can become a nightmare for translators.
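Where a client supplies an agreed list of terms, the replacement step can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `GLOSSARY` entries are sample Persian transliterations that a translator would need to confirm, and the function name `transliterate_terms` is hypothetical. The option `keep_english=True` follows the suggestion above of leaving the English word in brackets.

```python
import re

# Hypothetical glossary: English term -> transliteration in Persian script.
# These entries are examples only and should be confirmed by a translator.
GLOSSARY = {
    "Coronavirus": "کروناویروس",
    "Medicare": "مدیکر",
}


def transliterate_terms(text, glossary, keep_english=False):
    """Replace whole-word glossary terms; optionally keep the English in brackets."""
    if not glossary:
        return text
    # Try longer terms first so a short term never matches inside a longer one.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")

    def replace(match):
        english = match.group(0)
        local = glossary[english]
        return f"{local} ({english})" if keep_english else local

    return pattern.sub(replace, text)


print(transliterate_terms("Medicare و Coronavirus", GLOSSARY))
print(transliterate_terms("Medicare", GLOSSARY, keep_english=True))
```

Matching is case sensitive and limited to whole words, so only the exact spellings in the glossary are replaced. Note that keeping the English in brackets reintroduces left-to-right text and brackets, so the result still needs a visual check in the final file.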

This article was written by Dr Yavar Dehghani, a language manager, self-published author, linguist and lecturer in Iranian languages, including Persian (Farsi and Dari) and Pashto, and Turkic languages, including Azeri and Turkish. His PhD is in General Linguistics from La Trobe University in Melbourne.

Currently, he is the Head of European & Middle Eastern, Chinese, Japanese & Korean languages in the Defence School of Languages in Melbourne.

