When ញ៉ាំ meets ញ៉ំា

The Khmer script was added to the Unicode standard in September 1999. Today, nearly 18 years later, operating system renderers still get it wrong.

This is a quick post to document the difference in how several Khmer words are wrongly rendered on different current operating systems. I ran these tests on Windows 10 (10.0.14393), Android 7.1.1 Nougat, iOS 10.2.1, Mac OS X Sierra and Ubuntu 16.04 LTS with Firefox 47. The good news is that Windows 10 and Ubuntu passed all the tests (bar a font style issue with Leelawadee UI). Android passed nearly everything, except the bad encoding test.

Now, admittedly, the rules around triisap (U+17CA) and muusikatoan (U+17C9) are very complex. The Unicode standard description covers most of the difficulties, but not all of them.

Muusikatoan is also sometimes called ធ្មេញកណ្ដុរ /tmɨɲ kɑndao/ (rat’s teeth), which is a fun name.

On to the words. In every case, the DauhPenh rendering is correct.

ញ៉ាំ /ɲam/ To eat

U+1789 U+17C9 U+17B6 U+17C6

ស៊ី /sii/ To eat (for young)

U+179F U+17CA U+17B8

As of Mac OS X Sierra, /sii/ now displays correctly. But contrast with /ʔum/, /ʔom/ below.

អ៊ំ /ʔum/, /ʔom/ Uncle, aunt

U+17A2 U+17CA U+17C6

Note how Leelawadee UI renders this wrongly; but that is a font rather than a renderer bug.

ប៊ី /bii/ A type of egg roll

U+1794 U+17CA U+17B8

ប៉ី /pəy/ A type of wind instrument

U+1794 U+17C9 U+17B8

As of Mac OS X Sierra, /pəy/ now displays correctly. But contrast with /bii/ above!
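If you want to reproduce these test strings yourself, the code point sequences listed above can be turned back into text with a few lines of Python. This is only a minimal sketch: the dictionary keys and the helper names (build, describe) are my own labels for illustration, not anything defined by Unicode or by the renderers under test.

```python
import unicodedata

# Code point sequences exactly as listed in this post.
# The ASCII keys are informal labels chosen for this sketch.
WORDS = {
    "nyam": [0x1789, 0x17C9, 0x17B6, 0x17C6],
    "sii": [0x179F, 0x17CA, 0x17B8],
    "um": [0x17A2, 0x17CA, 0x17C6],
    "bii": [0x1794, 0x17CA, 0x17B8],
    "pey": [0x1794, 0x17C9, 0x17B8],
}


def build(code_points):
    """Join a list of integer code points into a string."""
    return "".join(chr(cp) for cp in code_points)


def describe(text):
    """Return one 'U+XXXX NAME' line per character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


if __name__ == "__main__":
    for label, cps in WORDS.items():
        word = build(cps)
        print(label, word)
        for line in describe(word):
            print("   ", line)
```

Pasting the printed words into a browser or text editor on each platform is one way to repeat the comparison; what you see will depend on the fonts and shaping engine installed.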

Yum yum yum

ញ៉ាំ /ɲam/ To eat

I’d like to pull out the word ញ៉ាំ for further analysis. Every operating system has some trouble with this word, because it could be encoded in several different ways. The correct way works on everything except iOS and Mac OS X. The incorrect encodings should really display wrongly, but none of the renderers complain about both invalid forms!

Correct order (ញ៉ាំ)

U+1789 U+17C9 U+17B6 U+17C6

Incorrect order (ញ៉ំា)

U+1789 U+17C9 U+17C6 U+17B6

Incorrect vowel (ញុំា)

U+1789 U+17BB U+17C6 U+17B6

In this instance, the DauhPenh rendering is appropriate for the first and second lines; the Apple rendering is ironically most appropriate for the third line!
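Since a renderer may draw all three encodings identically, it can help to check the underlying code points directly. The sketch below flags the two mistakes shown above. The function name check_aam and the wording of the messages are hypothetical, and the check is deliberately crude: it assumes it is handed a single cluster like the ones in this post, not running text.

```python
NIKAHIT = "\u17C6"    # KHMER SIGN NIKAHIT
VOWEL_AA = "\u17B6"   # KHMER VOWEL SIGN AA
VOWEL_U = "\u17BB"    # KHMER VOWEL SIGN U

# The three encodings discussed above.
CORRECT = "\u1789\u17C9\u17B6\u17C6"
WRONG_ORDER = "\u1789\u17C9\u17C6\u17B6"
WRONG_VOWEL = "\u1789\u17BB\u17C6\u17B6"


def code_points(text):
    """Format text as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def check_aam(cluster):
    """Return a list of problems found in a single cluster.

    Assumption: 'cluster' is one syllable; on longer text the second
    check would produce false positives.
    """
    problems = []
    if NIKAHIT + VOWEL_AA in cluster:
        problems.append("nikahit before vowel sign aa")
    if VOWEL_U in cluster and VOWEL_AA in cluster:
        problems.append("vowel signs u and aa in the same cluster")
    return problems


if __name__ == "__main__":
    for text in (CORRECT, WRONG_ORDER, WRONG_VOWEL):
        print(code_points(text), check_aam(text) or "ok")
```

Note that the first two strings contain exactly the same characters in a different order, which is why only a comparison at the code point level (or a renderer that shows a contrast) can tell them apart.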

Many thanks to Makara for his suggestion on the second incorrect rendering; I updated this post shortly after initial posting to include the extra example. There are other possible letter orders which may or may not display “correctly”; I will leave finding these as an exercise for the reader.

ZWNJ FTW

Here’s one I’ll examine in detail another time. Some words can be written in two different ways, neither really incorrect. The Unicode standard caters for these by allowing a Zero Width Non-Joiner (U+200C) to be inserted before triisap (៊) or muusikatoan (៉) to force the superscripted form. Windows 10’s Leelawadee UI font gets this one wrong (but its DauhPenh font doesn’t).

អ‌៊ី or អ៊ី /ʔii/ An exclamation of surprise

Without ZWNJ

U+17A2 U+17CA U+17B8

With ZWNJ

U+17A2 U+200C U+17CA U+17B8

Note: table ZWNJ character order corrected as per comment by Olivier Berten.
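Because ZWNJ is invisible, the two spellings are easy to confuse in source text. Here is a small sketch of how they differ at the code point level. The helper names are hypothetical, and ignoring ZWNJ when comparing strings is just one possible policy for something like search, not a rule taken from the Unicode standard.

```python
ZWNJ = "\u200C"          # ZERO WIDTH NON-JOINER
TRIISAP = "\u17CA"       # KHMER SIGN TRIISAP
MUUSIKATOAN = "\u17C9"   # KHMER SIGN MUUSIKATOAN

# The two spellings of /ʔii/ listed above.
PLAIN = "\u17A2\u17CA\u17B8"
WITH_ZWNJ = "\u17A2" + ZWNJ + "\u17CA\u17B8"


def strip_zwnj(text):
    """Remove every ZWNJ, e.g. before a loose string comparison."""
    return text.replace(ZWNJ, "")


def has_forced_superscript(text):
    """True if a ZWNJ sits immediately before triisap or muusikatoan."""
    for i, ch in enumerate(text[:-1]):
        if ch == ZWNJ and text[i + 1] in (TRIISAP, MUUSIKATOAN):
            return True
    return False


if __name__ == "__main__":
    print(PLAIN == WITH_ZWNJ)                 # the raw strings differ
    print(strip_zwnj(WITH_ZWNJ) == PLAIN)     # equal once ZWNJ is ignored
    print(has_forced_superscript(WITH_ZWNJ))
```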

7 thoughts on “When ញ៉ាំ meets ញ៉ំា”

  1. Looking at the 3 ways of encoding ɲam, I would suggest that only DauhPenh and Ubuntu handle them correctly. The first should render correctly. The second should render differently in order to indicate a difference in encoding. I don’t know whether this other order is actually an illegal ordering or not. If it is, then ideally there should be a dotted circle. But the important thing is that there is a contrast. But equally, for the third string, where it is a simple spelling error, there should be a correct rendering for that other spelling. I am assuming that there is no script rule by which a shaping engine should be expected to contrast this rendering. Are my assumptions correct, do you think?

    1. Yes, I agree.

      According to the Unicode standard, chapter 16, “Two vowel signs, om and aam, have not been assigned independent code points. They are represented by the sequence of a vowel (U+17BB khmer vowel sign u and U+17B6 khmer vowel sign aa, respectively) and U+17C6 khmer sign nikahit.” So presumably swapping nikahit and vowel sign aa produces an illegal sequence, which should be differentiated.

      Therefore Leelawadee UI is also good enough, given the second string is not a valid sequence. But Android, iOS and Mac OS X are wrong.

      For the third string, in terms of Khmer language at least, U+17BB khmer vowel sign u and U+17B6 khmer vowel sign aa are never found together in a syllable. A good input method should correct this misspelling, but the renderer should distinguish anyway. I personally like Apple’s solution for this cluster: it clearly shows you are trying to put together two different vowels in the wrong way.
