All posts by Marc Durdin

The Case of the Sadly Real Currently Failing Test

December 13, 2025 · Bugs, Computing, Unicode, Windows · Marc Durdin

In the Keyman project, we try to add automated tests for all the code we write — to make sure that we don’t break assumptions when we make future changes, and to verify that what we wrote actually works the way we expect it to.

Recently, I made a change to the Keyman for Windows Engine, removing some unused code, and cleaning up a function that tests whether the currently active Windows keyboard has specific support for the Right Alt (AltGr) key. This is important in Keyman, because, and I quote from the source code:

Background: when a Windows system keyboard that maps RightAlt (AltGr) is active, Windows will internally generate LeftCtrl+RightAlt events when the RightAlt key is pressed, so we need to recognize this and mask out the Left Control modifier state when it happens.

All well and good so far. When the system keyboard does support Right Alt, the KLLF_ALTGR flag is set in its metadata. And Keyman respects this flag so its keyboards function as expected for users.
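To make the masking step concrete, here is a minimal JavaScript sketch. Keyman's engine is C++, and the modifier bit values and the `normalizeModifiers` name are hypothetical, chosen only for illustration. It shows the idea from the source comment: when the active system keyboard has the AltGr flag and LeftCtrl+RightAlt arrive together, drop the LeftCtrl bit.

```javascript
// Hypothetical modifier bit flags, for illustration only
// (not Keyman's or Windows' actual values).
const LCTRL = 0x01;
const RCTRL = 0x02;
const LALT = 0x04;
const RALT = 0x08;

// If the system keyboard maps AltGr, Windows reports RightAlt as
// LeftCtrl+RightAlt, so the simulated LeftCtrl bit is masked out.
function normalizeModifiers(modifiers, keyboardHasAltGr) {
  const simulated = LCTRL | RALT;
  if (keyboardHasAltGr && (modifiers & simulated) === simulated) {
    return modifiers & ~LCTRL;
  }
  return modifiers;
}

console.log(normalizeModifiers(LCTRL | RALT, true) === RALT);            // true
console.log(normalizeModifiers(LCTRL | RALT, false) === (LCTRL | RALT)); // true
```

Note that a sketch this simple cannot tell a simulated LeftCtrl from a real one held down together with RightAlt; a real implementation has to deal with that case too.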

I added a unit test for this change, to verify that reading this flag from the Windows system keyboard worked correctly. I chose to test against KBDUS (US English) and KBDA1 (Arabic 101), for the case where the KLLF_ALTGR flag is not set, and against KBDFR (French AZERTY) and KBDCZ (Czech), for the case where the KLLF_ALTGR flag is set.

The tests worked just fine, so I merged the pull request.

Fast forward to the 10th of December

The Keyman for Windows test builds can run on one of four different build agents, ba-win11-key-01 through -04. Builds had been running happily, but all of a sudden, the tests started failing for Keyman for Windows, for unrelated branches in the Keyman source, some of which didn’t even change Keyman for Windows code at all!

Even worse, the builds started failing on some agents but not on others, but over the course of the next day or so, all the agents started to fail!

The failure was always on the same test, in the unit test I had just added. For reference, the ReadAltGrFlagFromKbdDll function call in the failing test was being passed the Keyboard Layout ID (KLID) 00000401, which is the KLID for KBDA1.dll.

21:36:22  [----------] 1 test from RightAltEmulationCheck
21:36:22  [ RUN ] RightAltEmulationCheck.ReadAltGrFlagFromKbdDll
21:36:22  C:\BuildAgent\work\7ac43416c45637e9\keyman\windows\src\engine\keyman32\tests\RightAltEmulationCheck.tests.cpp(14): error: Expected equality of these values:
21:36:22  ReadAltGrFlagFromKbdDll("00000401")
21:36:22  Which is: 1
21:36:22  0
21:36:22  [ FAILED ] RightAltEmulationCheck.ReadAltGrFlagFromKbdDll (40 ms)
21:36:22  [----------] 1 test from RightAltEmulationCheck (40 ms total)

I started to worry that we were running into some sort of race condition, and wondered if this was causing issues for end users also. At this point, I had no idea what the root cause was. Can you guess?

I took on the task of investigating

As always, I opened an issue to track the problem.

First, I wanted to check that the build was running successfully on my machine. And surprisingly enough, it failed! Why? It was working a couple of days earlier.

Of course, this made it easier to investigate. Here was the failing test. I was checking that the AltGr flag was not set for kbda1.dll:

EXPECT_EQ(ReadAltGrFlagFromKbdDll("00000401"), FALSE); // kbda1.dll - Arabic 101

Clearly something had changed, somewhere.

The test was going wrong consistently, which meant that it probably wasn’t a race condition. I started by examining my assumptions about KBDA1.DLL. I took a look at the file on my machine:

marc@QUAMBY MINGW64 /d/Projects/keyman/app/windows/src/engine/keyman32 (master)
$ ls /c/windows/system32/kbda1.dll  -l
-rwxr-xr-x 2 marc 197609 24576 Dec 11 15:43 /c/windows/system32/kbda1.dll*

marc@QUAMBY MINGW64 /d/Projects/keyman/app/windows/src/engine/keyman32 (master)
$ ls /c/windows/system32/kbdus.dll  -l
-rwxr-xr-x 2 marc 197609 36864 Jun 12 20:53 /c/windows/system32/kbdus.dll*

Well, that was interesting! kbda1.dll had been updated, just one day earlier! Only one thing would have changed that — a Windows Update. And sure enough, there was an update.

I took a quick look at the Learn more link to Microsoft article KB5072033, but did not see anything immediately relevant. It was listed as a Security Update.

Digging into the binary

So I imported KBDA1.DLL into Keyman Developer, and sure enough, there was a new rule in the resulting .kmn file:

+ [RALT K_S] > U+20c1

What is U+20C1? It is a brand new currency symbol, the Saudi Riyal symbol:

The Saudi Riyal currency symbol was added to Unicode 17.0 in September 2025.

And that was the cause of the problem. Adding this one key to a new AltGr layer on the Windows KBDA1.DLL keyboard caused the KLLF_ALTGR flag to be set for the keyboard, which caused my unit test to fail!

I already knew

Somewhat hilariously, I was involved in a discussion in the Unicode CLDR Keyboard Working Group way back in September 2025, where we talked about the Saudi Riyal symbol and where it should be placed on the Arabic keyboard — and I had learned at that time that AltGr+S was the chosen location! But I had forgotten all about that when I chose KBDA1.DLL as my victim for my unit test.

More evidence from Microsoft

Later, I returned to the Windows Update post. Buried in that article was a link to another article KB5070311, which had a Change Log section:

December 9, 2025: Update: This feature is included in the December 2025 non-security update (KB5070311).

[Keyboard] New! The AltGr layer is now enabled for the Arabic 101 keyboard layout. The left Alt key continues to function as before, while the right Alt key acts as a modifier to access additional symbols. The first new symbol mapped to AltGr on the Arabic 101 layout is the Saudi Riyal currency symbol (AltGr+S). The Saudi Riyal currency symbol is also available on the touch keyboard’s symbols page and the expressive input panels currency tab. Users who switch languages with Alt+Shift can continue to use the left Alt+Shift or the general shortcut Windows logo key+Spacebar. Arabic 102 and Arabic 102 AZERTY layouts are updated similarly.

Fixing the problem

I swapped out KBDA1.DLL for KBDTH0.DLL (Thai Kedmanee). The tests passed again. Huzzah!

I also took the opportunity to improve the function to separate failure from flag and test both separately:

  // kbdth0.dll - Thai Kedmanee
  EXPECT_EQ(ReadAltGrFlagFromKbdDll("0000041e", result), TRUE);
  EXPECT_EQ(result, FALSE);

Postmortem

The eagle-eyed among you will have spotted that I had violated a unit test principle here: I had a test that was dependent on the environment, which made it fragile. Now, I thought that my choice of environmental dependency was pretty safe, but it looks like I was wrong!

Sadly, my patch does not attempt to address this fragility, because it just swaps to another keyboard that could change at any time (but surely it won’t, right?) — because making the test more stable would require adding static fixtures to the codebase, and the cost just doesn’t seem worth it at this point.
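For illustration only, a fixture-based version of the check might look something like the following JavaScript sketch. This is not the Keyman code: `readAltGrFlag`, the `fixtureFlags` table and its contents are all hypothetical, and the value 0x0001 for KLLF_ALTGR is an assumption. It combines two ideas from above: reporting failure separately from the flag, and reading the flags from static data instead of whatever DLL Windows Update last installed.

```javascript
// Assumed value of the AltGr bit in a layout's flags.
const KLLF_ALTGR = 0x0001;

// Hypothetical static fixture: KLID -> layout flags, captured once and
// checked in, so a Windows Update cannot change the expected values.
const fixtureFlags = new Map([
  ['00000409', 0x0000],     // kbdus.dll, no AltGr layer
  ['0000040c', KLLF_ALTGR], // kbdfr.dll, AltGr layer
]);

// Returns { ok, altGr } so that "could not read the layout" and
// "flag not set" are two distinct results.
function readAltGrFlag(klid, lookupFlags) {
  const flags = lookupFlags(klid);
  if (flags === undefined) {
    return { ok: false, altGr: false };
  }
  return { ok: true, altGr: (flags & KLLF_ALTGR) !== 0 };
}

const lookup = (klid) => fixtureFlags.get(klid);
console.log(readAltGrFlag('00000409', lookup)); // { ok: true, altGr: false }
console.log(readAltGrFlag('0000040c', lookup)); // { ok: true, altGr: true }
console.log(readAltGrFlag('ffffffff', lookup)); // { ok: false, altGr: false }
```

The cost mentioned above is still real: someone has to capture and maintain those fixtures, and the test then no longer exercises the real system DLLs.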

And there we have it, the Case of the Saudi Riyal Currency Failing Test.

Next time it goes wrong, I’ll fix it properly, I promise.

Ancient Mysteries and Private Detectives

June 3, 2025 · Computing, Stories · Marc Durdin

There I was, sitting at my desk in the late afternoon, chewing over the results of my last case. The case had gone well, but I needed more cases to keep the cash flowing. As I mulled over the possibilities, suddenly, I heard a scuffling noise, and saw a note shoved under my door. I ran over, threw the door open, and looked down the hall, but saw only the stairwell door closing. Too late. I picked up the note.

I am using the Hebrew keyboard with Keyman 9, but the furtive patach (as in ruach) seems to be missing. Is there a way to place the furtive patach under the right leg of the cheth, rather than just under the cheth (in between both legs)?

Who was the cheth?

This was intriguing, but immediately I had questions. Who was the cheth? And what was he doing with the furtive patach? Why did the cheth prefer to stand with its right leg on the patach? Is the cheth also looking furtive? I couldn’t wait to find out how this exciting story ended.

Thus motivated, I dived eagerly into the nearby library (Wikipedia) and discovered that the patach is looking furtive because it “stole an imaginary epenthetic consonant” (verbatim). I also discovered that the patach was stolen. Is it because the cheth has stolen the patach from the ruach? Or is the cheth actually innocent? Or harbouring a fugitive? And who owned the imaginary epenthetic consonant in the first place so that it could be stolen? Where was the imaginary epenthetic consonant now?

The European connection.

The following morning, as I was reading the Times (New Roman edition), I discovered an advertisement placed by a Mr Calibri which shed further light on the conundrum. I finally knew why the cheth kept the furtive patach between both legs. From the sound of his name, Mr Calibri was another European forcing his way into this ancient Middle Eastern mystery with little understanding or subtlety.

This called for a visit to the Old City to consult Mr Ezra (SIL), an expert in ancient Biblical languages, who assured me that he would usually keep the patach, stolen or not, under his right leg.

This still seemed strange to me. But who was I to argue with this font of knowledge? I handed in my final case report, which I have reproduced below:

You should be able to type patach with the ^ key (Shift+6 on a US English keyboard). If you have the SIL Ezra font, the patach will be shown under the right hand leg of the cheth. The fonts you are currently using are not optimised for Biblical Hebrew but rather modern Hebrew, so I would recommend using SIL Ezra (which can be downloaded from https://software.sil.org/ezra/ if you don’t already have it).

Creating and using an Internationalized Domain Name in July 2024

July 30, 2024 · Bugs, Computing, Networking, Unicode · Marc Durdin

We needed to set up a web host for the domain convert.ភាសាខ្មែរ.com for an upcoming project. It should have been simple, but this turned into quite the journey.

A very brief background on IDNs

Some background. Notice the use of Khmer characters: ភាសាខ្មែរ /pʰiesaa kmae/ or “Khmer Language”. ភាសាខ្មែរ.com is a domain I’ve owned for some time, having purchased it when I was learning Khmer. Because this domain name uses letters not found in the English alphabet, it is considered to be an Internationalized Domain Name (or IDN). Now, behind the scenes, ភាសាខ្មែរ is not stored using Khmer characters in a Domain Name System (DNS) record. For reasons of history, domain names are restricted to a handful of Latin script letters, digits, and hyphen, and so the domain must be re-encoded using a system known as ‘punycode’ (truly!).

The punycode representation of ភាសាខ្មែរ is the rather bamboozling xn--j2e7beiw1lb2hqg.

Now, most of the time, this gory detail will be hidden from you in the user interface of your browser — but if you copy the URL from the address bar and paste it into a text editor, you’ll be presented with the gory details! (The domain punycoder.com allows you to easily play with other scripts and non-English-alphabet names and see how they are punycoded.)
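If you would rather see the conversion in code, here is a small JavaScript sketch. It relies on the standard `URL` class, which in Node.js and browsers converts an internationalized hostname to its ASCII form while parsing. The exact output is not asserted here; per the above, the Khmer label should come out as xn--j2e7beiw1lb2hqg.

```javascript
// Parsing a URL converts each non-ASCII hostname label to punycode,
// prefixed with "xn--"; ASCII labels such as "convert" and "com" are kept.
const url = new URL('https://convert.ភាសាខ្មែរ.com/');

console.log(url.hostname);

// Split into labels to see which ones were re-encoded.
const labels = url.hostname.split('.');
const encodedLabels = labels.filter((label) => label.startsWith('xn--'));
console.log(encodedLabels);
```

This is the same thing that happens when you copy the URL out of the address bar: the browser hands you the ASCII form, not the Khmer one.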

We needed a backend host with Python for the project, which ruled out a lot of free options — particularly since we wanted to bind to this custom domain name. As I had access to an Azure subscription already, I went that way.

So first, I attempted to create the domain quickly in the Azure Portal web front end. I had no trouble creating https://khconvert.azurewebsites.net/; it took just a few seconds. Then I had to bind our domain convert.ភាសាខ្មែរ.com to this Azure hostname. But computer said no:

The name convert.xn--j2e7beiw1lb2hqg.com is not valid.

Fine, perhaps the UI needed the domain name as a Unicode string.

Hmm. Blocked again.

Now, often the Azure Portal does not allow you to do things which can be done via their APIs or with their az CLI tool. So I tried again with az.

PS> az webapp config hostname add --hostname convert.xn--j2e7beiw1lb2hqg.com --webapp-name khconvert --resource-group <REDACTED> --verbose
The name convert.xn--j2e7beiw1lb2hqg.com is not valid.
Command ran in 4.228 seconds (init: 0.493, invoke: 3.736)

Okay then … perhaps I have to use the non-punycode version of the hostname? After all, that error in the Azure Portal was purely in the UI — it didn’t even call into the API.

And it worked!

Side track: the old PowerShell console did not do well with rendering Khmer in its default settings…

So it looks like those were not question marks? Interestingly, I can paste Khmer characters into the console, but I can’t copy them back out from the command: those letters copy out as question marks. Although as you can see from a screenshot and corresponding text later, I can copy the response and show it correctly. (Admittedly, I was using the old PowerShell console, not Terminal, which probably would have none of these problems.)

But it worked, hurrah! We are done!

Well, not quite. We need a TLS certificate for the site. Fortunately, Azure supports creating certificates automatically for web apps.

But not for IDNs.

And not even the CLI saved me this time.

PS> az webapp config ssl create --hostname 'convert.?????????.com' --name khconvert -g <REDACTED> --verbose
This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Operation returned an invalid status 'Bad Request'
Content: {"Code":"BadRequest","Message":"Properties.CanonicalName is invalid.  Canonical name convert.ភាសាខ្មែរ.com includes at least one special character. Only letters and numbers are allowed.","Target":null,"Details":[{"Message":"Properties.CanonicalName is invalid.  Canonical name convert.ភាសាខ្មែរ.com includes at least one special character. Only letters and numbers are allowed."},{"Code":"BadRequest"},{"ErrorEntity":{"ExtendedCode":"51021","MessageTemplate":"{0} is invalid.  {1}","Parameters":["Properties.CanonicalName","Canonical name convert.ភាសាខ្មែរ.com includes at least one special character. Only letters and numbers are allowed."],"Code":"BadRequest","Message":"Properties.CanonicalName is invalid.  Canonical name convert.ភាសាខ្មែរ.com includes at least one special character. Only letters and numbers are allowed."}}],"Innererror":null}

At this point, I threw my hands in the air, put the domain behind a Cloudflare proxy and used their SSL certificate, and it just worked.

But it is no wonder there are so few IDNs. The experience is still frightful. The errors are weird. We have a long, long way to go.

(Oh, and do take a look at ភាសាខ្មែរ.com and convert.ភាសាខ្មែរ.com: the problems they are playing with have quite a lot to do with IDNs too!)

dynamic import on node.js with circular dependencies leads to interesting failure modes

November 5, 2023 · Computing, Development, JavaScript · Marc Durdin

So, given these three ES modules below, what output do you expect to get when you run node one.js?

one.js:

import { two } from './two.js';

await two();

export function one() {
  console.log('one');
}

two.js:


export async function two() {
  console.log('two');
  const { three } = await import('./three.js');
  three();
}

three.js:

import { one } from './one.js';
export function three() {
  console.log('three');
  one();
}

Following through the calls, you’d expect to get:

two
three
one

But instead you get just the following output:

two

And node exits with exit code 13, which means “Unfinished Top-Level Await: await was used outside of a function in the top-level code, but the passed Promise never resolved.” There are no messages reported to the console, which is very disconcerting!

It turns out that the await import() causes this, because of the circular dependency introduced in module three.js, back to one.js. Now, if there is no async involved, then this circular dependency just works; that is, instead of using import('./three.js') you use import { three } from './three.js':

two
three
one

Not sure yet how to move forward with this. Tested in Node 20.9.0 and 18.14.1. Should I be raising it as a bug? It’s certainly not a documented behaviour that I’ve been able to find (node documentation, mdn documentation).

What is marriage?

November 28, 2021 · Bible, Family, Family|Bible|More, Pontificating, Stories · Marc Durdin

Written the day after my daughter’s wedding.

What is marriage?

Is it a record of property duly
Signed and sealed by stroke of dreary pen
So lawyers and judges can argue
Over who owns what

Or a walled garden within which
A family can grow and thrive
In the sun protected from
The fierce winds of society

Is it companionship walking
The Path of Life, two as one
Helping each other over
The stiles and through
The thickets and alongside
The peaceful streams

Could it be a sanctuary for love
Of mutual fascination and exploration
And closeness and raw nakedness
And risk
Of bodies freely given and
Trust held in fragile vessels
Of humanity

What if it was a reflection?
A reflection of Christ’s love
For his people and
His people’s love of Him
Sacrificial
Joyful
All consuming
Life giving

All this and more
Is found in the
Binding of two souls
What God has brought together
Let not man break asunder

Debugging a Windows Service

September 17, 2020 · Computing, Development, Windows · Marc Durdin

This is a set of notes on how to debug a Windows service starting up, mostly for my reference. Building on https://www.sysadmins.lv/retired-msft-blogs/alejacma/how-to-debug-windows-services-with-windbg.aspx with command line steps where possible.

In this example, we’ll be debugging mycool.exe, which has the service name mycoolservice.

Enabling debugging

1. Find the path to cdb.exe, windbg.exe, gflags.exe (e.g. C:\Program Files (x86)\Windows Kits\10\Debuggers\x86).
2. Start an elevated command prompt. Set the service to manual start (and stop it if it is currently running, … duh):
```
sc config mycoolservice start=demand
sc stop mycoolservice
```

3. Find the short path for cdb.exe (pasting the path from point 1 as appropriate):

for %A in ("C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\cdb.exe") do @echo %~sA

4. Enable the debug hook for the service, using gflags, replacing the path as necessary:

C:\PROGRA~2\WI3CF2~1\10\DEBUGG~1\x86\gflags /p /enable mycool.exe /debug "C:\PROGRA~2\WI3CF2~1\10\DEBUGG~1\x86\cdb.exe -server tcp:port=9999"

5. Change the service startup timeout to 1 hour (3600000 milliseconds) to avoid Windows killing the service on startup:

reg add HKLM\System\CurrentControlSet\Control /v ServicesPipeTimeout /t REG_DWORD /d 3600000

6. Reboot, start an elevated command prompt again.
7. Start the service, which will appear to hang:
```
sc start mycoolservice
```
8. Open WinDbg, press Ctrl+R, and enter tcp:server=localhost,port=9999
9. Go forth and debug.

Disable debugging

1. Start an elevated command prompt, and enter the following commands:

C:\PROGRA~2\WI3CF2~1\10\DEBUGG~1\x86\gflags /p /disable mycool.exe
reg delete HKLM\System\CurrentControlSet\Control /v ServicesPipeTimeout

2. Reset the service startup parameters to your preferred startup type.
3. Reboot to reset the service control timeout.

location.hash can be re-entrant on Safari Mobile

May 11, 2018 · Bugs, Computing, Development, JavaScript · Marc Durdin

So Josh and I were observing some really strange behaviour in the Keyman for iOS beta. When typing rapidly on the touch keyboard, we would sometimes get the wrong character emitted. We could not see anything immediately wrong in the code. So Josh added some logging. Then things got really weird: we would get the start of a touch event, then before the touch event handler finished, we’d get logging that indicated another touch event was received.

Whoa. In JavaScript, that shouldn’t be possible, right?

Each message is processed completely before any other message is processed. This offers some nice properties when reasoning about your program, including the fact that whenever a function runs, it cannot be pre-empted and will run entirely before any other code runs (and can modify data the function manipulates). This differs from C, for instance, where if a function runs in a thread, it can be stopped at any point to run some other code in another thread.

— https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop

After we dug in further, we found that it was indeed possible for a new touch event (and perhaps others) to be received when location.hash was set from the JavaScript code, in Safari for iOS.

Here’s a fairly minimal repro. Load this up on your iPhone, and start rapidly touching the Whack div. You’ll probably need to use two fingers repeatedly to trigger the event (I can usually get it to happen with about 50-100 rapid touches). When it happens, you’ll get a log message with a call stack showing how the whackIt() function has apparently called itself!

<!doctype html>
<html>
  <head>
    <meta charset='utf8'>
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no" /> 
    <title>location.hash re-entrancy</title>
    <style>
      #whack { font-size: 64pt }
      #log, #stack { font-family: courier }
    </style>
  </head>
  <body>
    <div id=whack>Whack</div>
    
    <div id=log></div>
    
    <div id=stack></div>
    
    <script>
    
      var tick = 0, inWhack = false;
      
      function whackIt(e) {
        e.preventDefault();

        tick++;
        
        if(inWhack) {
          stack.innerHTML = 'Re-entrant event: '+(new Error()).stack;
          return false;
        }
        
        var localTick = tick;
        log.innerHTML = tick;

        inWhack = true;

        // Setting location.hash seemingly can cause the 
        // event queue to be polled, resulting in a 
        // re-entrant touch event
        location.hash = '#'+localTick;

        inWhack = false;
        
        if(localTick != tick) {
          // alert('We had a re-entrant event');
        }
        
        return false;
      }
      
      whack.addEventListener('touchstart', whackIt, false);
      whack.addEventListener('touchend', whackIt, false);
      
    </script>
  </body>
</html>

We’ll be reporting this to Apple… Just a heads up.
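In the meantime, one way to defend a handler against this kind of re-entrancy is to queue any event that arrives while the handler is already running, and process it once the current call finishes. The JavaScript below is a minimal sketch of that idea only; `nonReentrant` is a hypothetical helper, not what Keyman actually shipped, and it assumes that handling the extra event slightly later is acceptable.

```javascript
// Wraps a handler so that a call arriving while the handler is already
// running is queued and processed afterwards, in arrival order.
function nonReentrant(handler) {
  let running = false;
  const pending = [];
  return function guarded(...args) {
    pending.push(args);
    if (running) {
      return; // re-entrant call: the outer invocation will drain the queue
    }
    running = true;
    try {
      while (pending.length > 0) {
        handler(...pending.shift());
      }
    } finally {
      running = false;
    }
  };
}

// Simulate the bug: handling event 1 synchronously delivers event 2.
const order = [];
const onTouch = nonReentrant((n) => {
  order.push('start ' + n);
  if (n === 1) {
    onTouch(2); // stands in for the event delivered by location.hash
  }
  order.push('end ' + n);
});

onTouch(1);
console.log(order.join(', ')); // start 1, end 1, start 2, end 2
```

Without the wrapper, the same simulation would log start 1, start 2, end 2, end 1, which is exactly the interleaving we saw in our logging.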

The Case of the Overly Busy Process

March 13, 2018 · Bugs, Computing, Tools, Windows · Marc Durdin

The other day, I was running a routine Process Monitor (Procmon) trace to debug an issue in Keyman, when I noticed something strange: over 50% of the events displayed with the default filter (which excludes a lot of system-level noise and procmon-related feedback) were coming from a single process: services.exe.

You can see in the image below I’ve added services.exe to the filter (Process Name is services.exe), and then the status bar shows 52% of events belonging to it.

Puzzled, I set aside some time to dig a little further (which means I went to bed late one evening). Watching Process Explorer, I could see that services.exe and wmiprvse.exe were between them consuming about 10% of my CPU. This did not seem normal. Nor did it seem to be a good thing for my battery life.

Deciding to examine the trace a little, I filtered out common registry keys and events, such as RegCloseKey, which made it easier to spot a pattern. It became obvious that every 5 seconds, services.exe, with the help of wmiprvse.exe, would enumerate the list of services from the registry, sending about 120,000 events to the Procmon trace in the process. Nearly 80% of the events captured each minute by Procmon were generated by either services.exe or wmiprvse.exe!


Given that wmiprvse.exe, the Windows Management Instrumentation (WMI) provider host, was involved, it seemed likely that there was a process issuing WMI queries against the Services provider, such as you can do with PowerShell:

Get-WmiObject Win32_Service | Format-Table Name, DisplayName, State, StartMode, StartName

It was just a matter of figuring out which one.

I started off by trying to dig into WMI logging. I don’t know if you’ve ever dug into that, but it’s huge, complex and somewhat impenetrable. It is likely that with the right knowledge I could have issued a command that gave me a list of queries being issued and who was issuing them. But I have not yet acquired that knowledge, sadly, and late at night my brain did not feel up to the attempt.

It seemed easier instead to use a process of elimination of processes (yeah, I did that on purpose). I started the CPU monitor in Process Explorer for the services.exe process, which showed lovely 5 second spikes.

Then I started to stop various services, watching to see if the spiking stopped. It didn’t. Once I was down to a handful of critical services (do I really need to run the Firewall service?) I started looking at background user-level processes, such as the icons sitting in the System Notification Area.

And here I hit gold. After shutting down a few, including my own programs, with no noticeable change, I shut down MySQL Notifier 1.1.7.

All of a sudden, CPU activity dropped to zero on the services.exe process, and the next Procmon trace showed a mere 85 events in a minute for the services.exe and wmiprvse.exe pair.

Success!

I checked the MySQL Notifier forums and saw no discussion of this issue, but I found a closed bug report in the bug database. I’ll have to add my comment to the bug report.

Once again, Procmon comes to the rescue 🙂 I’m looking forward to the increased battery life already!

I know it’s not the most elegant way to debug a problem, but sometimes it is quicker and easier than the alternatives. It’s especially easy to use process of elimination like this late at night, without having to think hard about it. 😉

The case for Keyman

March 10, 2018 · Computing, Development, Unicode · Marc Durdin

Recent podcast

I was recently privileged to be a guest on Scott Hanselman’s Hanselminutes podcast. It was great to talk about the history of Keyman and how it solved real problems for us 20+ years ago. But afterwards I realised I hadn’t really described what sorts of problems Keyman solves today and why it is still relevant.

So this post is that story.

Note: I’ve put this on my personal blog rather than the Keyman blog because it’s full of my own personal opinions. There’s lots of useful content on the Keyman blog as well!

The elevator pitch – and why I always struggled with it

A common story around venture capitalists and tech startups is that you have to have an amazing elevator pitch, the idea that within about 20-30 seconds, you can sell a problem and its solution to a person just begging to give you great wads of cash. And if you can’t describe the problem in 20 seconds then you probably don’t understand it yourself.

With all due respect, that might be fine for selling package drones or IoT nerf guns, but there are plenty of hard problems out there that cannot be understood in 20 seconds.

I think keyboard input for the world’s writing systems is one of those problems. For many people growing up with limited contact with Asian writing systems, they will look pretty and exotic. But that’s about the extent of it.

So, let’s imagine you and I are in an elevator, and just as you ask me to explain the benefit of Keyman to you, the power goes out and the elevator lurches to a halt. Now you are stuck with me and I can give you the extended elevator pitch!

An illustration from Khmer

Khmer is the national language of Cambodia, a small nation in South East Asia, home of one of the world’s ancient wonders, Angkor Wat.

Here’s a couple of samples of Khmer text. The first is a photo of an inscription from Angkor Wat.

This second image shows two common styles for writing Khmer, as computer fonts:

In a little bit, I’ll dig into how the writing system is used on a computer. The script is beautiful, and it is complex, and it has a long and interesting history.

But first, I would like to tell you a story. I recently learned about an effort by a group of Cambodian linguists to revise and prepare a new edition of a Khmer language dictionary. The current “gold standard” Khmer dictionary, compiled by Samdech Porthinhean Chuon Nath, has not been updated since 1967. So a new edition is very much needed.

A number of different people have been involved in typing up entries for the dictionary. In the process, they discovered a problem: some entries which looked identical on screen were actually encoded differently in the computer. This meant that they were unable to consistently search for words, or even sort their dictionary correctly.

How had this happened? This happened because it is possible to type Khmer words in a number of different ways and get identical output on the screen. And I don’t mean by using different input methods, either. I mean with the standard Khmer keyboard layouts supplied with all major operating systems, desktop and mobile.

The Khmer Script

In this post, I won’t get too technical with lots of references to Unicode encodings. Instead, I’m trying to illustrate the issues from the point of view of a user, who shouldn’t need to worry about those details.

How would we go about typing a Khmer word? I’ve picked a famous Khmer word for us to start with:

This is the word ខ្មែរ /kmae/, Khmer, which is the name of the language and the people.

So, with the standard Khmer keyboard, how do you type this word?

Let’s imagine you had this word on a piece of paper and a standard Khmer keyboard in front of you. We can tell that it’s made up of a group of characters and some sort of diacritic marks.

Start with the first shape. It looks like a symbol and a diacritic mark. But the two marks belong together and make up a vowel sound, which we can find on the [SHIFT]+[E] key:

Let’s go with that.

Next we have the character and the diacritic mark. The first bit is easy enough to find, on the [X] key.

But that second bit? Okay, you won’t find that printed on a key cap. Fortunately, those using the Khmer keyboard tend to know their writing system. The character is a form of the /m/ consonant used when it is the second letter in a consonant cluster. It is called a ជើង /cəəŋ/ letter in Khmer. This means leg or foot, base or support. (Diversion: The Unicode charts call them COENG consonants. I bet you read that as CO-ENG, didn’t you. I know I used to!)

Okay, that’s all very interesting but how does one type these cəəŋ consonants? Well, the Unicode encoding standard added a special control character that’s not part of the writing system, called KHMER SIGN COENG (U+17D2), just to support these letters. This is typed with the [J] key on the standard keyboard. Now we’re getting somewhere. So we’ll need to type [J] [M] to get that appearing under the ខ.

Now before we go any further, this magic sign is a problem. It’s a problem because now the user has to know something about the way their text is encoded in order to type it. It’s not a particularly hard thing to learn, but it isn’t obvious. And this is one of the easier things that a native Khmer speaker needs to learn in order to type their own language.

The last letter is easy enough. Here it is on the [R] key.

The full sequence, first try

So the full sequence is: [SHIFT]+[E] [X] [J] [M] [R].

Let’s go ahead and type it.

Ugh. Where did that dotted circle come from?

The dotted circle is a visual hint. It tells us that the vowel is written before the consonant it is attached to, but that the encoding places it after.

Second try

Okay, so let’s change it to [X] [SHIFT]+[E] [J] [M] [R].

Okay, that looks correct now.

Well, okay, it looks fine … but it’s not. In order to meet the Unicode ordering rules, the vowel has to be placed after the entire consonant cluster, just as if it was spoken.

Third time lucky

Let’s go back and fix that again. Here’s our final sequence. [X] [J] [M] [SHIFT]+[E] [R].

This is the correct sequence.
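Here are the three attempts again, written out as code points in a short Python sketch. The key-to-character table is an assumption made for this illustration (only the [J] key and U+17D2 are stated above), and `type_keys` is a hypothetical helper, not part of any keyboard software.

```python
import unicodedata

# Assumed key-to-character mapping for the five keys used in this example.
KEYS = {
    "X": "\u1781",        # KHMER LETTER KHA
    "J": "\u17d2",        # KHMER SIGN COENG
    "M": "\u1798",        # KHMER LETTER MO
    "SHIFT+E": "\u17c2",  # KHMER VOWEL SIGN AE
    "R": "\u179a",        # KHMER LETTER RO
}

def type_keys(*keys):
    """Return the text produced by pressing the given keys in order."""
    return "".join(KEYS[k] for k in keys)

first = type_keys("SHIFT+E", "X", "J", "M", "R")
second = type_keys("X", "SHIFT+E", "J", "M", "R")
third = type_keys("X", "J", "M", "SHIFT+E", "R")

for label, text in [("first", first), ("second", second), ("third", third)]:
    codepoints = " ".join(f"U+{ord(c):04X}" for c in text)
    print(f"{label:6} {codepoints}")

# The second and third attempts are different strings, and standard
# Unicode normalization does not turn one into the other.
print(second == third)
print(unicodedata.normalize("NFC", second) == unicodedata.normalize("NFC", third))
```

Both comparisons print False. Software that compares text code point by code point, such as a search index, sees the second and third attempts as two different words, even when they look alike on screen.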

Now from a linguistic point of view most of this seems fairly logical, and it generally makes sense to users, once they learn it. But I hope this illustrates how some of these things are not obvious. If you haven’t been taught these rules, or if you do accidentally mis-type something, you won’t ever know about it.

It’s not hard to find examples that have a heap of different ways you could type them. For example:

At first glance that doesn’t look a lot more complex, right? However, we’ve found 14 different ways to encode this in Unicode, all of which look correct on Android devices! And if you repeat diacritic marks, then you can expand that to over 35 different combinations, all displayed identically! My colleagues and I wrote a paper on the problem. Can you imagine searching an online dictionary with these kinds of problems?

Keyman: A Comprehensive Input Method Solution

So what does Keyman bring to this party? None of the system-supplied Khmer keyboards make any attempt to quality-control the input – generally they just output a single character for a single key.

Given we have well-defined, structured linguistic and encoding rules, we can leverage those in the input method and ensure that sequences that are invalid according to the specification just don’t make it into the text store.

The power of Keyman

Keyman can automatically reorder and replace input as you type. For our example, if you type a vowel before typing a KHMER SIGN COENG and a consonant, Keyman will transparently and instantly fix the order in your document, and you won’t even know.
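The idea of that reordering can be sketched in a few lines of Python. This is not how Keyman implements it (Keyman keyboards are written in Keyman's own keyboard language), and the `reorder` function and its character ranges are simplifying assumptions for illustration: it only handles a vowel sign typed directly before one or more coeng-plus-consonant pairs.

```python
import re

COENG = "\u17d2"                 # KHMER SIGN COENG
CONSONANT = "[\u1780-\u17a2]"    # Khmer consonant letters
VOWEL = "[\u17b6-\u17c5]"        # Khmer dependent vowel signs

# A vowel sign followed by one or more coeng+consonant pairs is out of order.
MISORDERED = re.compile(f"({VOWEL})((?:{COENG}{CONSONANT})+)")

def reorder(text):
    """Move a vowel sign so it follows the whole consonant cluster."""
    return MISORDERED.sub(r"\2\1", text)

typed = "\u1781\u17c2\u17d2\u1798\u179a"     # the "second try" order
expected = "\u1781\u17d2\u1798\u17c2\u179a"  # the "third time lucky" order

print(reorder(typed) == expected)
print(reorder(expected) == expected)
```

Text that is already in the right order passes through unchanged, so a rule like this can run on every keystroke. A real keyboard needs many more rules than this, for example for a vowel typed before the base consonant, as in the first try.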

You can try a Khmer keyboard which automatically solves these encoding issues online at https://keymanweb.com/?tier=alpha#km,Keyboard_khmer_angkor

That, in a nutshell, is the power of Keyman.

Keyman makes possible amazingly intuitive input for complex writing systems.

Keyman is open source and completely free.

Keyman keyboard layouts are defined with a clear and easy to understand keyboard grammar, so that anyone can write a keyboard layout for their language. The development tools include visual editors, interactive debuggers and automated testing to help you develop sophisticated keyboard layouts.

Keyman runs everywhere. A keyboard layout can be deployed to Windows, macOS, iOS, Android, Linux and even as a JavaScript web keyboard. The Keyman project has a repository of over 700 existing keyboard layouts, and more are added every week.

Version 10 of Keyman is about to hit Beta. Would you like to be involved?

Oh look, the power is back on. Elevator pitch over. Oh by the way, aren’t writing systems cool?

Working around Delphi’s default grid scrolling behaviour

July 5, 2017 | Bugs, Computing, Delphi, Development | Marc Durdin

Delphi’s T*Grid components have an annoying little feature whereby they will scroll the cell into view if you click on a partially visible cell at the right or the bottom of the window. Then, this couples with a timer that causes the scroll to continue as long as the mouse button is held down and the cell it is over is partially visible. This typically means that if a user clicks on a partially visible cell, they end up selecting a cell several rows or columns away from where they intended to click.

In my view, this is a bug that should be fixed in Delphi. I’m not the only person who thinks this. I’ve reported it to Embarcadero at RSP-18542.

In the meantime, here’s a little unit that works around the issue.

{
  Stop scroll on mousedown on bottom row of grid when bottom row
  is a partial cell: have to block both initial scroll and timer-
  based scroll.

  This code is pretty dependent on the implementation in Vcl.Grids.pas,
  so it should be checked if we upgrade to a new version of Delphi.
}

{$IFNDEF VER320}
{$MESSAGE ERROR 'Check that this fix is still applicable for a new version of Delphi. Checked against Delphi 10.2' }
{$ENDIF}

unit ScrollFixedStringGrid;

interface

uses
  System.Classes,
  Vcl.Controls,
  Vcl.Grids,
  Winapi.Windows;

type
  TScrollFixedStringGrid = class(TStringGrid)
  private
    TimerStarted: Boolean;
    HackedMouseDown: Boolean;
  protected
    procedure MouseDown(Button: TMouseButton; Shift: TShiftState; X: Integer;
      Y: Integer); override;
    procedure MouseMove(Shift: TShiftState; X: Integer; Y: Integer); override;
    function SelectCell(ACol, ARow: Longint): Boolean; override;
  end;

implementation

{ TScrollFixedStringGrid }

procedure TScrollFixedStringGrid.MouseDown(Button: TMouseButton;
  Shift: TShiftState; X, Y: Integer);
begin
  // When we first mouse-down, we know the grid has
  // no active scroll timer
  TimerStarted := False;

  // Call the inherited event, blocking the default MoveCurrent
  // behaviour that scrolls the cell into view
  HackedMouseDown := True;
  try
    inherited;
  finally
    HackedMouseDown := False;
  end;

  // Cancel scrolling timer started by the mousedown event for selecting
  if FGridState = gsSelecting then
    KillTimer(Handle, 1);
end;

procedure TScrollFixedStringGrid.MouseMove(Shift: TShiftState; X, Y: Integer);
begin
  // Start the scroll timer if we are selecting and mouse
  // button is down, on our first movement with mouse down
  if not TimerStarted and (FGridState = gsSelecting) then
  begin
    SetTimer(Handle, 1, 60, nil);
    TimerStarted := True;
  end;
  inherited;
end;


function TScrollFixedStringGrid.SelectCell(ACol, ARow: Longint): Boolean;
begin
  Result := inherited;
  if Result and HackedMouseDown then
  begin
    // MoveColRow calls MoveCurrent, which
    // calls SelectCell. If SelectCell returns False, then
    // movement is blocked. But we fake it by re-calling with Show=False
    // to get the behaviour we want
    HackedMouseDown := False;
    try
      MoveColRow(ACol, ARow, True, False);
    finally
      HackedMouseDown := True;
    end;
    Result := False;
  end;
end;

end.
