How to render combining marks consistently across platforms: a long story

I read today Richard Ishida’s notes on his experiences displaying combining characters in a consistent way across browsers, and recalled with fondness the pain we have experienced with this, both in browsers and in apps, over many years.

When we design a keyboard layout for Keyman, we will often want to show a diacritic mark or combining character from a complex script by itself on a key cap, or show the mark with a substituted placeholder symbol that is distinct from the script itself, to show how the combining character fits with the base letter.

A commonly used placeholder symbol is ◌ U+25CC DOTTED CIRCLE, but there are others that are preferred for some languages.

U+25CC

I show below three attempts at presenting the combining marks on our Open Source KeymanWeb platform keyboards, running in Chrome on Windows 8.1. The first shows a Lao keyboard, with each of the combining characters presented around a base  U+0E81 LAO LETTER KO on the keyboard. As you can see, this is not wonderful because the Ko Kai letter adds noise and makes it hard to pick out the combining characters.

Lao keyboard on keymanweb.com
Lao keyboard on keymanweb.com

The second keyboard is Thai. This relies on the system renderer and does not include a base character for the combining characters. In this case, that means that no base letter is shown and the marks show reasonably clearly. Unfortunately, this is not reliable across platforms, browsers or languages — it happens to work okay for Thai in Chrome on Windows 8.1.  On some other systems, the combining marks are shifted to the far left of the key cap, or even off the key entirely.

Thai keyboard on keymanweb.com
Thai keyboard on keymanweb.com

The third keyboard example is for Gujarati, and again relies on the system renderer and does not include a base character for the combining characters. Now this time you see that the Uniscribe renderer on Windows 8.1 automatically inserts a dotted circle glyph — this may or may not be the same as the U+25CC character. Again, this behaviour differs across platforms.

Gujarati keyboard on keymanweb.com
Gujarati keyboard on keymanweb.com

The Story Today

The story today is pretty unfortunate. Display of isolated combining marks is not implemented at all consistently across the plethora of rendering engines, operating systems, browsers and fonts. Some systems will automatically insert a placeholder glyph if no conforming base character is found prior to the combining character. Others don’t. What do I mean by a conforming base character? Well, that’s up to the implementation of the renderer. Some renderers will treat any base character that is not in the same script as the combining character as part of a separate run, and won’t attempt to combine. Others will combine with some outside-script characters but not others.

Things get even worse when working with multiple combining characters without a base character. Unless this has been specifically handled in the renderer, you will typically see poor layout or rendering of the characters that may not be representative of how they form above normal base characters.

The image below shows examples of the marks U+0EB5 LAO VOWEL SIGN II, U+0EB5 LAO VOWEL SIGN II + U+0EC8 LAO TONE MAI EK and U+0EB3 LAO VOWEL SIGN AM, rendered with and without U+25CC DOTTED CIRCLE, on various systems. A more comprehensive set of examples is included later in this blog.

Sample Lao combining character renders
Sample Lao combining characters rendered

You can see, in the image above, examples of:

  • Repeated dotted circles
  • Diacritics that don’t align over their base character correctly
  • Rendering errors leading to tofu
  • Diacritics that don’t stack
  • Marks that overlap incorrectly

None of the basic text-only solutions work consistently across all systems.

It’s like stuffing a live octopus into a box

Attempting to address the problem is not fun. Fixing the problem on one browser/renderer/OS/font system invariably creates a new problem on another system. It’s like stuffing a live octopus into a box: each time you manage to get one of those pesky tentacles into the box, two others have snuck out the other side.

A Solution

We worked hard to find a consistent cross-browser solution. We started with Lao, but we believe that the same principles can be applied to other scripts. The solution relies on web fonts, but falls back to an acceptable solution in the shrinking subset of systems where web fonts are not supported.

We started by creating a font that is designed for use specifically with the On Screen Keyboard — it would never be used for presentation or editing. While at first glance this meant we could easily solve the problem by placing all the required marks into the Private Use Area, doing so would mean that on systems where the font was not available, the keyboard would be unusable.

Our final solutionhack uses the OpenType GSUB feature to replace the  U+0E81 LAO LETTER KO with a placeholder mark when, and only when, it is used as a base character for a combining mark. The keyboard presentation code stores the Ko Kai letter as a base character for each combining character (or characters), and at render time the font substitutes its placeholder mark. When a key with this pair of characters is clicked, only the combining mark is displayed.

Lao keyboard with substituted base
Lao keyboard with substituted base

Doing this means we can use the Ko Kai letter by itself on the keyboard (on the D key) without a problem (for example on its own key), and in the case where the special font is not available, the user will see an acceptable fallback with the Ko Kai letter as the base character for combining characters.

It’s worth noting that many renderers require that the placeholder character be the same font and style as the combining characters. Today, this sadly precludes using a different colour for the placeholder character.

Examples

I have collected screenshots from a variety of browsers that illustrate some of the inconsistencies and frustrations, as well as the happily correct rendering. You can also visit the sample page to see how well the rendering works in your system.

The screenshots include three combinations:

  • A simple single U+0EB5 LAO VOWEL SIGN II
  • A pair of diacritic marks U+0EB5 LAO VOWEL SIGN II + U+0EC8 LAO TONE MAI EK
  • A third combining character U+0EB3 LAO VOWEL SIGN AM

The sample shows these three combinations with the default font for the system, a Lao OpenType font “Saysettha Web”, and the special font “SaysetthaX Web” which has our hack in it, along with three different base characters: a hard-coded U+25CC DOTTED CIRCLE, a U+00A0 NO-BREAK SPACE, and a U+0E81 LAO LETTER KO.

Initially, the dotted circle with the Saysettha Web font looks hopeful.  But in the end you will see that only one solution works on all browsers tested: the bottom right cell with the custom font and the Lao Letter Ko. The “hollow x” glyph that we use in this font is similar to the placeholder used in Lao language primers. However, any shape can of course be used here if you are creating your own font.

Windows 8.1 – Internet Explorer

Windows 8.1 - Internet Explorer 11

 

Internet Explorer 11 on Windows 8.1 does surprisingly badly here — worse than earlier versions of Windows.  Notice how the default Lao font (DokChampa) does not combine with dotted circle, nor does it combine the pair of diacritics correctly with no-break space.

Windows 8.1 – Chrome

Windows 8.1 - Chrome

 

While Chrome stacks the pair of diacritics correctly in all cases, the positioning for Dok Champa relative to the base character is incorrect for both dotted circle and no-break space. You can also see how the Saysettha Web and SaysetthaX Web fonts position their diacritics to the left of the cell in the case of the no-break space.

Windows 8.1 – Firefox

Windows 8.1 - Firefox

 

Firefox renderers similarly to Chrome. Note that it appears to be using Lao UI instead of DokChampa for the default font.

Windows 7 – Internet Explorer

Windows 7 - Internet Explorer 9

Windows 7 – Chrome

Windows 7 - Chrome

 

(Font smoothing was not enabled on the remote connection when I captured the screenshot)

Windows 7 – Firefox

Windows 7 - Firefox

(Font smoothing was not enabled on the remote connection when I captured the screenshot)

Windows XP – Firefox

Windows XP - Firefox

 

(Font smoothing was not enabled on the remote connection when I captured the screenshot)

Mac OS X – Safari

Mac OS X - Safari

 

Here we see the Saysettha Web solution with dotted circle, which worked consistently in Windows, fails in Mac OS X. Cross-platform support comes back to haunt us. Get back into that box, O tentacle!

Mac OS X – Chrome

Mac OS X - Chrome

Mac OS X – Firefox

Mac OS X - Firefox

It appears that Firefox is having trouble loading the Saysettha Web font, although the SaysetthaX Web font is loading, so this problem is probably resolvable.

iOS 8.1 – Safari

iOS 8.1 - Safari

 

This is at least consistent with the situation in Safari on Mac OS X 10.8!

 

Windows Phone 8.1 – Internet Explorer

Windows Phone 8.1 - Internet Explorer 11

 

This is similar, but not identical to Internet Explorer 9 on Windows 7. That would be too easy!

 

Android 4.3 – Android Browser

Android - Android Browser

 

Clearly, no Lao font was available on the system, so the web fonts were the only ones that worked.

Android 4.3 – Chrome

Android - Chrome

Clearly, no Lao font was available on the system, so the web fonts were the only ones that worked.

Android 4.3 – Firefox

Android - Firefox

Clearly, no Lao font was available on the system. It also appears that Firefox is having trouble loading the Saysettha Web font, although the SaysetthaX Web font is loading, so this problem is probably resolvable.

Conclusion

In summary, the only solution today for on screen keyboards (or character pickers) that works consistently across all browsers is to create a custom font with what can only be classed as a hack, and supply that font for use within those pickers / on screen keyboards.

I certainly agree with Richard that a more formalised solution based on U+00A0 no-break space or U+25CC dotted circle would be fantastic.

2 thoughts on “How to render combining marks consistently across platforms: a long story

  1. A number of fonts take an alternative approach: the default combining character glyph contains the dotted circle, dash or other placeholder, and OT rules substitute an altrrnative glyph and positioned in specific contexts. For some scripts this allows you to display isolated combining characters and visually indicate an invalid character sequence.

    1. Thanks for the comment 🙂 You are right, and that works well in a lot of cases. It has two drawbacks:

      1. The fallbackrendering where the font is not available is not predictable
      2. Multi-combining-character combinations (e.g. diacritic vowel + tone in Thai, Lao, etc) don’t render together (see Thai or Lao keyboard for examples of this).

      The visual indication of an invalid character sequence is valuable. Better than the black boxes that used to appear for Thai rendering in Windows!

Leave a Reply

Your email address will not be published. Required fields are marked *