The Khmer script was added to the Unicode standard in September 1999. Today, nearly 18 years later, operating system renderers still get it wrong.
This is a quick post to document the difference in how several Khmer words are wrongly rendered on different current operating systems. I ran these tests on Windows 10 (10.0.14393), Android 7.1.1 Nougat, iOS 10.2.1, Mac OS X Sierra <> and Ubuntu 16.04 LTS with Firefox 47. The good news is that Windows 10 and Ubuntu passed all the tests (bar a font style issue with Leelawadee UI). Android passed nearly everything, except the bad encoding test.
Now, admittedly, the rules around triisap (U+17CA) and muusikatoan (U+17C9) are very complex. The Unicode standard description covers most of the difficulties, but not all of them.
Muusiaktoan is also sometimes called ធ្មេញកណ្ដុរ /tmɨɲ kɑndao/ – rat’s teeth, which is a fun name.
On to the words. In every case, the DauhPenh rendering is correct.
As of Mac OS X Sierra, /sii/ now displays correctly. But contrast with /ʔum/, /ʔom/ below.
Note how Leelawadee UI renders this wrongly; but that is a font rather than a renderer bug.
As of Mac OS X Sierra, /pəy/ now displays correctly. But contrast with /bii/ above!
Yum yum yum
I’d like to pull out the word ញ៉ាំ for further analysis. Every operating system has some trouble with this word, because it could be encoded in several different ways. The correct way works on everything except iOS and Mac OS X. The incorrect encodings should really display wrongly, but none of the renderers complain about both invalid forms!
Correct order (ញ៉ាំ)
Incorrect order (ញ៉ំា)
Incorrect vowel (ញុំា)
In this instance, The DauhPenh rendering is appropriate for the first and second lines; the Apple rendering is ironically most appropriate for the third line!
Many thanks to Makara for his suggestion on the second incorrect rendering; I updated this post shortly after initial posting to include the extra example. There are other possible letter orders which may or may not display “correctly”; I will leave finding these as an exercise for the reader.
Here’s one I’ll examine in detail another time. Some words can be written in two different ways, neither really incorrect. The Unicode standard caters for these by allowing for insertion of a Zero Width Non Joiner (U+200C) to force the superscripted form of triisap (៊) or muusikatoan (៉). Windows 10’s Leelawadee UI font gets this one wrong (but its DauhPenh font doesn’t).