Japanese Orthographic Variants in CAT Tool Based Translation and What You Can Do About It -- Part 1 of 2
by JLD Member Noriko Nevins, ATA-Certified (E>J) Translator
This article is for project managers unfamiliar with the intricacies
of Japanese orthographic rules and for Japanese translators who use CAT tools,
especially SDL Trados Studio. In it, I identify the problems Japanese orthographic rules can pose in
CAT-tool-based translation, and what project managers and translators can do to ensure
spelling consistency in the TM. I also describe some of the tools available and highlight those I found easiest
to implement in terms of cost and steps involved.
Part Two of "For Project Managers and Translators: Japanese Orthographic Variants in CAT Tools Based Translation and What You Can Do About It" will be posted later this year. However, look out for the full version of this article in the PDF version of the JLD Times coming soon!
Nearly three decades ago, the first CAT tools were
introduced to the translation industry. Many
translation agencies now utilize their choice of CAT tools in-house, and require
freelance translators they work with to use the same tools. When these translation
agencies send a new project package to a translator, they often lock segments
for which translations already exist in their translation memory (TM). The idea
is that by locking these segments, the segments will not get translated again
unnecessarily, thus saving on translation cost.
In an ideal world, the translator needs to do nothing to the
locked segments. However, in English-into-Japanese translation projects, I frequently
find locked segments that contain the same words spelled differently, i.e., I
find orthographic variants within the same pre-translated file. If a translator
notices variants in the spellings, she must address them in the same way she
would address terminology inconsistencies: alert the project manager,
determine which spelling variant is desirable for the project and the end
client, and correct all unwanted versions in the project.
Because no two Japanese writers spell every word the same way unless they are
given the same spelling guideline, and because orthographic rules have not been
standardized in Japanese, this happens quite frequently in
into-Japanese translation projects. As I worked with many translators and
clients’ reviewers over the years, I began to wonder whether spelling differences
among Japanese individuals are much greater than those among users of other
languages. I did some research and found numerous academic papers touching
on the subject, noting that it poses a significant challenge to machine
translation, search engine optimization, database building and searching, teaching
Japanese as a second language, etc. One study shows that Japanese orthographic
variants found in books, websites and magazines make up nearly 10% of all
morphemes in the text. [1] Another
paper mentions that while most English orthographic variants tend to be rare,
archaic or loan words, Japanese orthographic variants are frequently found in
common words in use today. It also found that the number of variant pairs/sets
in Japanese is over 50,000, which is twice the number of English orthographic
variants. [2]
In the modern Japanese writing system, a word can be written
in hiragana only, in katakana only, or in different combinations of kanji and
hiragana, and all of these may be commonly accepted spellings. Unlike some other
languages, there is no reliable and unified standard spelling guideline for
written Japanese. Furthermore, the spelling a person uses to
write a specific word depends on the literacy and background of the individual,
which I think is a combination of the following factors:
· The literary environment of the family and community in which the individual was raised.
· The period during which the individual was in school in Japan, since the Ministry of Education has changed the use of kana and the kanji to be taught over time.
· The books, newspapers and magazines the person read growing up. Japanese newspapers and publishers enforce internal style guides within each company or publication. Fiction and non-fiction writers often have their own unique spelling preferences; one is influenced by the style of text one frequently reads.
· The individual’s job history. A Japanese person adopts different spelling conventions depending on the industry, employer, and occupation in which she has worked.
· Language changes (e.g. neologisms, spelling convention changes, etc.) that the person has been exposed to, reflecting changes of the times.
So, when multiple Japanese translators have been involved in
an into-Japanese translation project, spelling variants inevitably make their
way into the TM unless a clear style guide is given to the translators and
editors at the outset of the project. Here’s one simple example of spelling
variants found in a translation project. (I changed the sentences slightly for clarity.)
Example 1
| Segment | Source Text (English) | Status | Target Text (Japanese) |
| A | Thank you for your cooperation! | 100% match, locked | ご協力ありがとうございます。 |
| B | Thank you for your time! | 100% match, locked | お時間を割いていただき有難うございます。 |
The document was a survey of physicians. Here, the word
“Thank you” (ありがとう in segment A, 有難う in segment B) is spelled differently in two segments
found within the same sheet. Neither of them is wrong. Both are accepted in Japan
as correct spellings, although the hiragana-only version appears to be more
commonly used nowadays. So, Translator A may have been younger than Translator
B. Both segments came from past translation projects that did go through
review, approval and finalization processes. In this project, only two variants
for “Thank you” were found in a single file, so it was easy to unify them.
Here’s another example. This is about a medical term.
Example 2
| Segment | Source Text (English) | Status | Target Text (Japanese) |
| A | Carotid Ultrasound | 100% match, locked | 頸動脈超音波検査 |
| B | About Carotid Ultrasound | 100% match, locked | 頚動脈超音波検査について |
An example like this
can sometimes be found in a hospital’s patient guide. Different kanji are used
for the first character of the word “carotid.” In this case, Translator A’s 頸 is the
formal version used by medical professionals. Translator B uses 頚 instead, which is a simplified
version of the same kanji used mostly by lay people. The translator who
used the latter might have been used to the simplified version, or the
Japanese input system she was using happened to have 頚動脈 ahead of 頸動脈 in the order of kana-to-kanji conversion
candidates. In this case too, it was easy to fix because there were only two
variants.
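A check like the one in Example 2 could be automated in a simple way. The following is only a minimal sketch in Python, not a feature of any CAT tool: the table PREFERRED_KANJI and the function find_nonpreferred are hypothetical names, and the choice of 頸 as the preferred form is an assumed style decision that the project manager and client would need to confirm.

```python
# Hypothetical style decision: map each non-preferred kanji to the preferred one.
# Here 頚 (simplified) is mapped to 頸 (formal), following Example 2.
PREFERRED_KANJI = {"頚": "頸"}


def find_nonpreferred(segments, preferred=PREFERRED_KANJI):
    """Return (segment_index, found_char, preferred_char) for every hit."""
    hits = []
    for index, text in enumerate(segments):
        for variant, standard in preferred.items():
            if variant in text:
                hits.append((index, variant, standard))
    return hits


# The two target segments from Example 2.
segments = ["頸動脈超音波検査", "頚動脈超音波検査について"]
hits = find_nonpreferred(segments)
for index, variant, standard in hits:
    print(f"segment {index}: found {variant}, preferred {standard}")
```

The sketch only reports where the non-preferred character appears. Whether to change a locked segment remains a decision for the project manager.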
The next example is not
so simple. The end client was in the international catering business and the
project was to translate their menus. A tricky thing with the Japanese language
is that most vegetables, fruits and seafood can be spelled in two or three
different ways. Who would have thought food could make a translator’s job so complicated!
Below are examples of the food names that showed up as part of various dish
names in pre-translated segments in a series of menus.
Example 3
Carrot (ninjin): にんじん, ニンジン, 人参
Cucumber (kyuuri): きゅうり, キュウリ, 胡瓜
Eggplant (nasu): なす, ナス, 茄子
Tuna (maguro): まぐろ, マグロ, 鮪
Shrimp (ebi): えび, エビ, 海老
Squid (ika): いか, イカ, 烏賊
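To illustrate how such variants might be spotted in a batch of target segments, here is a rough Python sketch. It relies on one fact about Unicode (katakana U+30A1 to U+30F6 sit exactly 0x60 above the corresponding hiragana), so hiragana and katakana spellings can be folded together automatically. Kanji spellings cannot be derived that way, so the table KANJI_TO_READING is a hand-made, hypothetical list taken from the food names above. The function names and the sample dish names are also made up for illustration.

```python
from collections import defaultdict

# Hypothetical variant table built from the food names in Example 3:
# each kanji spelling is mapped to its hiragana reading.
KANJI_TO_READING = {
    "人参": "にんじん", "胡瓜": "きゅうり", "茄子": "なす",
    "鮪": "まぐろ", "海老": "えび", "烏賊": "いか",
}


def katakana_to_hiragana(text):
    """Fold katakana (U+30A1..U+30F6) to hiragana by subtracting 0x60."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def find_mixed_spellings(segments):
    """Return {reading: spellings seen} for readings spelled in 2+ ways."""
    seen = defaultdict(set)
    readings = set(KANJI_TO_READING.values())
    for text in segments:
        # Kanji spellings are looked up directly in the table.
        for kanji, reading in KANJI_TO_READING.items():
            if kanji in text:
                seen[reading].add(kanji)
        # Kana spellings are matched on the folded text; folding keeps the
        # length, so the same indices recover the original spelling.
        # Only the first occurrence in each segment is recorded.
        folded = katakana_to_hiragana(text)
        for reading in readings:
            start = folded.find(reading)
            if start != -1:
                seen[reading].add(text[start:start + len(reading)])
    return {r: s for r, s in seen.items() if len(s) > 1}


# Made-up dish names, for illustration only.
menu_segments = ["にんじんのサラダ", "ニンジンスープ", "人参ケーキ", "マグロの刺身"]
mixed = find_mixed_spellings(menu_segments)
for reading, spellings in mixed.items():
    print(reading, sorted(spellings))
```

Plain substring matching on kana is crude: short readings such as なす or いか also occur inside unrelated words, so every hit needs a human check. A morphological analyzer would be more reliable, but that is beyond this sketch.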
How did all these variants end up in the TM? We can think
of different possibilities. One is that the editors and client reviewers involved
in the past projects from which the TM was created or updated may not always have
been the same set of people, and may not have been provided with any style guide or
guidelines. So, every time a new member, especially a translator, joined the
team, the member might have introduced new variants. It’s likely to have been
unintentional, and the spelling choice might have been based on personal
preference or simply on the spelling she was used to.
Another possibility is that a single translator might have used more than one
variant as a result of forgetting which variant she used previously. Or maybe
the individual just decided to switch variants over the course of an ongoing
project.
In cases like the examples above, if spelling
inconsistencies go uncorrected, they give end users of the translations the
impression that the job was done poorly, without much care or thought. It’s
especially risky when a long document is divided among multiple translators and
no unification of spellings is done before the document is delivered to the end
client.
So, what could project managers and translators do to
minimize this problem and improve TM integrity? I’ll discuss possible solutions
in Part 2: Japanese Orthographic Variants
in CAT Tool based Translation and What You Can Do About It.
[1] Ogura, H. (小椋秀樹), コーパスに基づく現代語表記のゆれの調査 ― BCCWJ コアデータを資料として ― (Corpus-Based Survey of the Orthographic Variation in Contemporary Japanese: Analysis of the BCCWJ-Core), Dept. Corpus Studies, NINJAL (国立国語研究所言語資源研究系). Available at: https://www.ninjal.ac.jp/event/specialists/project-meeting/files/JCLWorkshop_no1_papers/JCLWorkshop2012_42.pdf [Accessed January 16th, 2017]
[2] Otmakhova, J., Orthographical variants in modern Japanese, Tomsk Polytechnic University. Available at: http://online.sfsu.edu/icplj/conference/ICPLJ6%20Papers/Otmakhova.pdf [Accessed January 16th, 2017]