Jul 26, 2017

For Project Managers and Translators

Japanese Orthographic Variants in CAT Tool Based Translation
and What You Can Do About It -- Part 1 of 2

by JLD Member Noriko Nevins,
ATA-Certified (E>J) Translator

This article is for project managers unfamiliar with the intricacies of Japanese orthographic rules and for Japanese translators who use CAT tools, especially SDL Trados Studio. In it I identify the problems Japanese orthographic rules can pose in CAT-tool-based translation, and what project managers and translators can do to ensure spelling consistency in the TM. I also describe some of the tools available and highlight those I found easiest to implement in terms of cost and steps involved.

Nearly three decades ago, the first CAT tools were introduced to the translation industry. Many translation agencies now use their choice of CAT tools in-house, and require the freelance translators they work with to use the same tools. When these agencies send a new project package to a translator, they often lock segments for which translations already exist in their translation memory (TM). The idea is that locked segments will not be translated again unnecessarily, which saves on translation cost.

In an ideal world, the translator needs to do nothing to the locked segments. However, in English-into-Japanese translation projects, I frequently find locked segments that contain the same words spelled differently, i.e., I find orthographic variants within the same pre-translated file. If a translator notices spelling variants, she must address them in the same way she would address terminology inconsistencies: alert the project manager, determine which spelling variant is desirable for the project and the end client, and correct all unwanted versions in the project.

Because no two Japanese writers spell every word the same way unless they are given the same spelling guideline, and because orthographic rules have not been standardized in Japanese, this happens quite frequently in into-Japanese translation projects. As I worked with many translators and clients’ reviewers over the years, I began to wonder whether spelling differences among Japanese individuals are much greater than those among users of other languages. I did a little research and found numerous academic papers touching on the subject, noting that it poses a significant challenge to machine translation, search engine optimization, database building and searching, teaching Japanese as a second language, etc. One study shows that Japanese orthographic variants found in books, websites and magazines make up nearly 10% of all morphemes in the text. [1] Another paper mentions that while most English orthographic variants tend to be rare, archaic or loan words, Japanese orthographic variants are frequently found in common words that are used today. It also found that the number of variant pairs/sets in Japanese is over 50,000, which is twice the number of English orthographic variants. [2]

In the modern Japanese writing system, a word can be written with only hiragana, only katakana, or in different combinations of kanji and hiragana, and all of these may be commonly accepted spellings. Unlike some other languages, written Japanese has no reliable, unified standard spelling guideline. Furthermore, the spelling a person uses for a specific word depends on the literacy and background of the individual, which I think is a combination of the following factors:

· The literary environment of the family and community in which the individual was raised.
· The period during which the individual was in school in Japan, since the Ministry of Education has changed the use of kana and the kanji to be taught over time.
· The books, newspapers and magazines the person read growing up. Japanese newspapers and publishers enforce internal style guides within each company or publication. Fiction and non-fiction writers often have their own unique spelling preferences; one is influenced by the style of text one frequently reads.
· The individual’s job history. A Japanese person adopts different spelling conventions depending on the industry, employer, and occupation in which she has worked.
· The language changes (e.g. neologisms, spelling convention changes, etc.) the person has been exposed to as times have changed.

So, when multiple Japanese translators have been involved in an into-Japanese translation project, spelling variants inevitably make their way into the TM unless a clear style guide is given to the translators and editors at the outset of the project. Here’s one simple example of spelling variants found in a translation project. (I changed the sentences slightly for clarity.)

Example 1

Source Text (English)

Target Text (Japanese)
Thank you for your cooperation!
100% match, locked
Thank you for your time!
100% match, locked

The document was a survey of physicians. Here, the phrase “Thank you” (highlighted in yellow) is spelled differently in two segments found within the same sheet. Neither of them is wrong. Both are accepted in Japan as correct spellings, although the hiragana-only version appears to be more commonly used nowadays. So, Translator A may have been younger than Translator B. Both segments came from past translation projects that did go through review, approval and finalization processes. In this project, only two variants for “Thank you” were found in a single file, so it was easy to unify them.

Here’s another example. This is about a medical term.

Example 2

Source Text (English)

Target Text (Japanese)
Carotid Ultrasound
100% match, locked
About Carotid Ultrasound
100% match, locked

An example like this can sometimes be found in a hospital’s patient guide. Different kanji are used for the first character of the word “carotid.” In this case, Translator A’s 頸 is the formal version used by medical professionals. Translator B uses 頚 instead, which is a simplified version of the same kanji and is used mostly by laypeople. The translator who used the latter might have been accustomed to the simplified version, or the Japanese input system she was using happened to list 頚動脈 ahead of 頸動脈 in the order of kana-to-kanji conversion candidates. In this case, too, it was easy to fix because there were only two variants.

The next example is not so simple. The end client was in the international catering business and the project was to translate their menus. A tricky thing about the Japanese language is that most vegetables, fruits and seafood can be spelled in two or three different ways. Who would have thought food could make translators’ jobs so complicated! Below are examples of the food names that showed up as part of various dish names in pre-translated segments in a series of menus.

Example 3
Carrot (ninjin): にんじん, ニンジン, 人参
Cucumber (kyuuri): きゅうり, キュウリ, 胡瓜
Eggplant (nasu): なす, ナス, 茄子
Tuna (maguro): まぐろ, マグロ, 鮪
Shrimp (ebi): えび, エビ, 海老
Squid (ika): いか, イカ, 烏賊
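
Each row in Example 3 lists a hiragana spelling, a katakana spelling, and a kanji spelling of the same word. As a rough illustration only, the Python sketch below labels which script each spelling uses by looking at Unicode character names. The function names `script_of` and `scripts_used` are hypothetical, and the heuristic ignores details such as half-width katakana and rarer ideograph blocks.

```python
import unicodedata

def script_of(ch):
    """Classify one character by the prefix of its Unicode name (rough heuristic)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "other"

def scripts_used(word):
    """Return the set of scripts that appear in a word."""
    return {script_of(ch) for ch in word}

# Two rows taken from Example 3.
variants = {
    "carrot": ["にんじん", "ニンジン", "人参"],
    "shrimp": ["えび", "エビ", "海老"],
}

for term, spellings in variants.items():
    for spelling in spellings:
        print(term, spelling, sorted(scripts_used(spelling)))
```

A word that mixes kanji and hiragana would return both labels, which is one way to see why a single word can have several accepted spellings.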

How did all these variants end up in the TM? We can think of different possibilities. One is that the editors and client reviewers involved in the past projects from which the TM was created or updated may not always have been the same set of people, and may not have been provided with any style guide or guidelines. So, every time a new member, especially a translator, joined the team, that member might have introduced new variants. This is likely to have been unintentional, and the spelling choice might have been based on personal preference or simply on habit. Another possibility is that a single translator might have used more than one variant after forgetting which variant she used previously. Or maybe the individual just decided to switch variants over the course of an ongoing project.

In cases like the examples above, if spelling inconsistencies go uncorrected, they give end users of the translations the impression that the job was done poorly, without much care or thought. It’s especially risky when a long document is divided among multiple translators and the spellings are not unified before the document is delivered to the end client.
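
To make the problem concrete, here is a minimal Python sketch of how mixed spellings could be flagged in a batch of target segments before delivery. It is an illustration under stated assumptions, not a feature of any CAT tool: the variant table `VARIANT_SETS`, the function `find_inconsistencies`, and the sample segments are all hypothetical, and plain substring matching can produce false hits for short kana words that also occur inside longer words.

```python
from collections import defaultdict

# Hypothetical variant table; in practice it might come from a project style guide.
VARIANT_SETS = {
    "carrot": ["にんじん", "ニンジン", "人参"],
    "shrimp": ["えび", "エビ", "海老"],
}

def find_inconsistencies(segments, variant_sets):
    """Return {term: {spelling: [segment indexes]}} for terms found in 2+ spellings."""
    report = {}
    for term, spellings in variant_sets.items():
        hits = defaultdict(list)
        for index, text in enumerate(segments):
            for spelling in spellings:
                # Naive substring match; no word segmentation is attempted.
                if spelling in text:
                    hits[spelling].append(index)
        if len(hits) > 1:
            report[term] = dict(hits)
    return report

# Invented sample target segments (dish names), for illustration only.
segments = [
    "にんじんのサラダ",
    "人参のスープ",
    "エビのグリル",
    "エビとにんじんの炒め物",
]

print(find_inconsistencies(segments, VARIANT_SETS))
```

With these sample segments, "carrot" is reported because it appears in two spellings, while "shrimp" is not because only one spelling is used.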

So, what could project managers and translators do to minimize this problem and improve TM integrity? I’ll discuss possible solutions in Part 2: Japanese Orthographic Variants in CAT Tool based Translation and What You Can Do About It.

[1] Ogura, H. (小椋秀樹), Corpus-Based Survey of the Orthographic Variation in Contemporary Japanese: Analysis of the BCCWJ-Core (コーパスに基づく現代語表記のゆれの調査 BCCWJ コアデータを資料として), Dept. Corpus Studies, NINJAL (国立国語研究所言語資源研究系). Available at: https://www.ninjal.ac.jp/event/specialists/project-meeting/files/JCLWorkshop_no1_papers/JCLWorkshop2012_42.pdf [Accessed January 16th, 2017]
[2] Otmakhova, J., Orthographical variants in modern Japanese, Tomsk Polytechnic University. Available at: http://online.sfsu.edu/icplj/conference/ICPLJ6%20Papers/Otmakhova.pdf [Accessed January 16th, 2017]

