Updated 30/1/2013: Fixed a typo and formatting issue or two. Added a sentence for clarity. See also the short follow-up post.
I’ve really struggled with how to frame this post. It could be about the dangers of WaitForSingleObject and WaitForMultipleObjects. Or about how Delphi’s TThread.Synchronize seems so handy, and yet because it must use WaitForSingleObject, is so fraught with complications. Or yet, about how pressing Alt+Left Shift to switch languages could hang an application. In the end, it’s about all three of these things.
Some people, when confronted with a problem, think, “I know, I’ll use threads,” and then two they hav erpoblesms.
Let’s start with TThread.Synchronize. It seems a handy little function: it will, in a thread-safe manner, call a procedure in the context of the main VCL thread (the thread that owns the Delphi VCL GUI windows), and wait, using the Windows API WaitForSingleObject(INFINITE), for the procedure to return. Simpler than fiddling with synchronisation primitives, right? Everyone knows threading is hard, so use the well-tested thread utilities where you can?
Except that WaitForSingleObject and its big brother WaitForMultipleObjects are dangerous. The basic problem is that these calls can cause deadlocks, if you ever call them from a thread that has its own message loop and windows. That’s okay, you say, I don’t have any UI except in my main thread. But any thread that uses COM can have hidden COM helper windows when doing RPC (and see below for more on this). And other libraries can create their own windows as well, such as the ADO libraries.
So what causes the deadlock? Well, I’ll illustrate with the scenario we ran into. Some old code (that I wrote, okay, okay) created a thread (we’ll call it BackgroundThread). BackgroundThread used TThread.Synchronize to periodically update the UI status about a background database process it was running. It doesn’t really matter what it was doing, but the use of Microsoft’s ADO database library meant that this thread was creating a hidden window, with the class ADODB.AsyncEventMessenger. Behind the scenes, a second window was automatically created by Windows once we had the first window in the thread, and this one had the class name IME.
Every now and then, our BackgroundThread would call Synchronize(RefreshStatus). This would signal an event which the main thread would check periodically from its message loop. Eventually it would call the RefreshStatus procedure. BackgroundThread would in the meantime have called WaitForSingleObject(INFINITE) to wait for an event to be signaled by the main thread indicating that the RefreshStatus procedure had finished.
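The pattern described above might look roughly like the following. This is a hypothetical reconstruction, not the original code: the unit, class, and event names (StatusWorker, TBackgroundThread, OnStatus) are invented for illustration, and the database work is replaced by a Sleep.

```delphi
unit StatusWorker;

// Hypothetical sketch of a worker thread that reports progress through
// TThread.Synchronize. All names here are invented for illustration.

interface

uses
  System.Classes, System.SysUtils;

type
  TStatusEvent = procedure(const Status: string) of object;

  TBackgroundThread = class(TThread)
  private
    FStatus: string;
    FOnStatus: TStatusEvent;
    procedure RefreshStatus; // executed in the main thread via Synchronize
  protected
    procedure Execute; override;
  public
    // Assign before starting the thread (create it suspended).
    property OnStatus: TStatusEvent read FOnStatus write FOnStatus;
  end;

implementation

procedure TBackgroundThread.RefreshStatus;
begin
  // Main-thread context: the handler may touch VCL controls here.
  if Assigned(FOnStatus) then
    FOnStatus(FStatus);
end;

procedure TBackgroundThread.Execute;
var
  Step: Integer;
begin
  for Step := 1 to 10 do
  begin
    if Terminated then
      Exit;
    Sleep(100); // stand-in for one slice of the background database work
    FStatus := Format('Step %d of 10', [Step]);
    // Blocks this thread until the main thread has run RefreshStatus.
    // While it is blocked, any windows owned by this thread (for example
    // hidden library windows) receive no messages.
    Synchronize(RefreshStatus);
  end;
end;

end.
```

The point to notice is the Synchronize call inside Execute: nothing in the code hints that the thread owns windows, yet the blocking wait hidden behind that call is where the thread stops responding to messages.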
Where are we going? Well, if the main thread receives a message that it then decides to send on to other windows in the process, while BackgroundThread is getting ready to synchronize, we can end up in a deadlock. And it turns out that in Windows XP, this can happen when WM_INPUTLANGCHANGEREQUEST (0x50) is received, e.g. when the user presses Alt+Left Shift. (Note: for this scenario to play out, the Input Method Manager must be enabled; install Far Eastern language support in Windows XP.) Remember, this is just one possible scenario which can trigger a deadlock.
Let’s pull this apart. I’ve loaded the stalled process into WinDbg, and am now looking at the user-mode call stacks of the two deadlocked threads. First the main VCL GUI thread:
0 Id: bf8.cb8 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr
0012fadc 7e4194be ntdll!KiFastSystemCallRet
0012fb30 7e43652f USER32!NtUserMessageCall+0xc
0012fb50 7e418734 USER32!EditWndProcW+0x5d
0012fb7c 7e418816 USER32!InternalCallWinProc+0x28
0012fbe4 7e42a013 USER32!UserCallWinProcCheckWow+0x150
0012fc14 7e42a039 USER32!CallWindowProcAorW+0x98
0012fc34 004c0e7d USER32!CallWindowProcW+0x1b
0012fda4 004c0d80 audit4_home!Vcl.Controls.TWinControl.DefaultHandler+0xdd
0012fdf0 004c03d3 audit4_home!Vcl.Controls.TWinControl.WndProc+0x5b8
0012fe20 00467b3e audit4_home!Vcl.Controls.TWinControl.MainWndProc+0x2f
0012fe38 7e418734 audit4_home!System.ClassesStdWndProc+0x16
0012fe64 7e418816 USER32!InternalCallWinProc+0x28
0012fecc 7e4189cd USER32!UserCallWinProcCheckWow+0x150
0012ff2c 7e418a10 USER32!DispatchMessageWorker+0x306
0012ff3c 005a6980 USER32!DispatchMessageW+0xf
0012ff58 005a69c3 audit4_home!Vcl.Forms.TApplication.ProcessMessage+0xf8
0012ff7c 00d26d60 audit4_home!Vcl.Forms.TApplication.HandleMessage+0xf
0012ff9c 016e6903 audit4_home!S4s.Ui.Session.Appsession_main.TAppSession_Main.Run+0xcc
0012ffc0 7c817077 audit4_home!Audit4_home.initialization+0xc3
0012fff0 00000000 kernel32!BaseProcessStart+0x23
And then our BackgroundThread:
9 Id: bf8.f30 Suspend: 1 Teb: 7ffd5000 Unfrozen
ChildEBP RetAddr
0494fdb8 7c90df5a ntdll!KiFastSystemCallRet
0494fdbc 7c8025db ntdll!NtWaitForSingleObject+0xc
0494fe20 7c802542 kernel32!WaitForSingleObjectEx+0xa8
0494fe34 0042d4b3 kernel32!WaitForSingleObject+0x12
0494fe80 00408a19 audit4_home!System.Sysutils.WaitForSyncWaitObj+0x7
0494fed0 004654f2 audit4_home!System.TMonitor.Wait+0x25
0494fedc 00c9f3a2 audit4_home!System.Classes.TThread.Synchronize+0x2e
0494fef8 00c9ef29 audit4_home!S4s.Br.Backgroundclasses.Backgroundclass.TBackgroundClass.SetStatus+0x8a
0494ff70 00464b11 audit4_home!S4s.Br.Backgroundclasses.Backgroundclass.TBackgroundClass.Execute+0x18d
0494ffa0 00409752 audit4_home!System.Classes.ThreadProc+0x45
0494ffb4 7c80b729 audit4_home!SystemThreadWrapper+0x2a
0494ffec 00000000 kernel32!BaseThreadStart+0x37
We can see that the main thread has sent a message somewhere. It turns out it has sent a message to a window in the same thread (The window handle 0006040c is just the Edit window):
0012fb30 7e43652f 0006040c 00000050 00000001 USER32!NtUserMessageCall+0xc
0012fb50 7e418734 0006040c 00000050 00000001 USER32!EditWndProcW+0x5d
So why is it stalling? It’s hard to determine here, because everything bad is happening in kernel mode, behind that KiFastSystemCallRet call. That must mean it’s time to step into kernel mode! In WinDbg, press Ctrl+K, select Local. I’m learning here, so this is as much for my own documentation as to explain to anyone else (in other words, if it doesn’t make sense to you, it won’t make sense to me, either, in six weeks’ time). First, we find the process details:
lkd> !process 0 0 audit4_home.exe
PROCESS 894c3710 SessionId: 0 Cid: 0bf8 Peb: 7ffdf000 ParentCid: 0e0c
DirBase: 0a5c0700 ObjectTable: 00000000 HandleCount: 0.
Image: audit4_home.exe
PROCESS 893277e0 SessionId: 0 Cid: 0d94 Peb: 7ffd8000 ParentCid: 0e0c
DirBase: 0a5c0720 ObjectTable: 00000000 HandleCount: 0.
Image: audit4_home.exe
PROCESS 8950d9a8 SessionId: 0 Cid: 0f10 Peb: 7ffdb000 ParentCid: 0e0c
DirBase: 0a5c0740 ObjectTable: 00000000 HandleCount: 0.
Image: audit4_home.exe
PROCESS 89566758 SessionId: 0 Cid: 0520 Peb: 7ffde000 ParentCid: 0e0c
DirBase: 0a5c0840 ObjectTable: e127a330 HandleCount: 467.
Image: audit4_home.exe
We’ll look at the last process listed, being the one with the problem (the other three are defunct):
lkd> !process 89566758 2
PROCESS 89566758 SessionId: 0 Cid: 0520 Peb: 7ffde000 ParentCid: 0e0c
DirBase: 0a5c0840 ObjectTable: e127a330 HandleCount: 467.
Image: audit4_home.exe
THREAD 8953a920 Cid 0520.009c Teb: 7ffdd000 Win32Thread: e1210eb0 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8953aabc Semaphore Limit 0x2
THREAD 8963bda0 Cid 0520.0a98 Teb: 7ffd9000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8963bf3c Semaphore Limit 0x2
THREAD 8953b9a0 Cid 0520.03c8 Teb: 7ffd8000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8953bb3c Semaphore Limit 0x2
THREAD 895a2c08 Cid 0520.021c Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
895a2da4 Semaphore Limit 0x2
THREAD 898f8cc8 Cid 0520.0bd8 Teb: 7ffd4000 Win32Thread: e2d75870 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
898f8e64 Semaphore Limit 0x2
THREAD 893287f8 Cid 0520.0eec Teb: 7ff4f000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89328994 Semaphore Limit 0x2
THREAD 89308c30 Cid 0520.00f0 Teb: 7ff4e000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89308dcc Semaphore Limit 0x2
THREAD 89ab4020 Cid 0520.01b0 Teb: 7ff4d000 Win32Thread: e2ed4260 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89ab41bc Semaphore Limit 0x2
THREAD 893269b0 Cid 0520.0830 Teb: 7ff4c000 Win32Thread: e23bf2c0 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89326b4c Semaphore Limit 0x2
THREAD 89895778 Cid 0520.0ce0 Teb: 7ff4b000 Win32Thread: e2fa06e8 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89895914 Semaphore Limit 0x2
THREAD 8933ba80 Cid 0520.0e08 Teb: 7ff4a000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8933bc1c Semaphore Limit 0x2
THREAD 89322728 Cid 0520.0f28 Teb: 7ff49000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
893228c4 Semaphore Limit 0x2
THREAD 89449960 Cid 0520.0ab0 Teb: 7ff48000 Win32Thread: e12d5590 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89449afc Semaphore Limit 0x2
THREAD 8952d020 Cid 0520.09dc Teb: 7ffdc000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
SuspendCount 1
b8c197d4 SynchronizationEvent
And we can see the first thread has the address 8953a920. So let’s look at that (flag 16 means show the full stack in the process context, with parameters).
lkd> !thread 8953a920 16
THREAD 8953a920 Cid 0520.009c Teb: 7ffdd000 Win32Thread: e1210eb0 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8953aabc Semaphore Limit 0x2
Not impersonating
DeviceMap e1050c08
Owning Process 0 Image:
Attached Process 89566758 Image: audit4_home.exe
Wait Start TickCount 213083 Ticks: 67800 (0:00:17:39.375)
Context Switch Count 20200 NoStackSwap LargeStack
UserTime 00:00:00.781
KernelTime 00:00:01.484
Win32 Start Address audit4_home!Audit4_home.initialization (0x016e6840)
Start Address kernel32!BaseProcessStartThunk (0x7c810705)
Stack Init b7c1f000 Current b7c1e840 Base b7c1f000 Limit b7c19000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr Args to Child
b7c1e858 80503864 8953a990 8953a920 804fb094 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
b7c1e864 804fb094 8953aa8c 8953a920 8953a954 nt!KiSwapThread+0x8a (FPO: [0,0,0])
b7c1e88c 80502fa0 00000000 00000005 00000000 nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
b7c1e8a4 804ff8e0 00000000 00000000 00000000 nt!KiSuspendThread+0x18 (FPO: [3,0,0])
b7c1e8ec 80503882 00000000 00000000 00000000 nt!KiDeliverApc+0x124 (FPO: [3,10,0])
b7c1e904 804fb094 00000240 e1210eb0 00000000 nt!KiSwapThread+0xa8 (FPO: [0,0,0])
b7c1e92c bf802f15 00000000 0000000d 00000001 nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
b7c1e968 bf835eb7 00000200 00000000 00000000 win32k!xxxSleepThread+0x192 (FPO: [3,5,4])
b7c1ea04 bf8141d2 bbe83720 00000287 00000019 win32k!xxxInterSendMsgEx+0x7f6 (FPO: [Non-Fpo])
b7c1ea50 bf80ecd9 bbe83720 00000287 00000019 win32k!xxxSendMessageTimeout+0x11f (FPO: [7,7,0])
b7c1ea74 bf92b42e bbe83720 00000287 00000019 win32k!xxxSendMessage+0x1b (FPO: [4,0,0])
b7c1eaa4 bf92c675 e2d75870 e13f10e0 e13f10e0 win32k!xxxImmActivateLayout+0x5b (FPO: [2,3,4])
b7c1ec08 bf8696d9 00000004 00000000 e13f10e0 win32k!xxxImmActivateThreadsLayout+0x10c (FPO: [3,82,4])
b7c1ec48 bf86862f e13f10e0 00000100 bbe88228 win32k!xxxInternalActivateKeyboardLayout+0xb7 (FPO: [3,8,4])
b7c1ec70 bf80b5a7 8989a2d0 04090c09 00000100 win32k!xxxActivateKeyboardLayout+0x4c (FPO: [4,3,0])
b7c1ecd4 bf80ec9f bbe88228 00000050 00000001 win32k!xxxRealDefWindowProc+0x56d (FPO: [4,16,0])
b7c1ecec bf81c176 bbe88228 00000050 00000001 win32k!xxxWrapRealDefWindowProc+0x16 (FPO: [5,0,0])
b7c1ed08 bf80eee6 bbe88228 00000050 00000001 win32k!NtUserfnNCDESTROY+0x27 (FPO: [7,0,0])
b7c1ed40 8054168c 0006040c 00000050 00000001 win32k!NtUserMessageCall+0xae (FPO: [7,3,0])
b7c1ed40 7c90e514 0006040c 00000050 00000001 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b7c1ed64)
0012fadc 7e4194be 7e428e0d 0006040c 00000050 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
0012fb30 7e43652f 0006040c 00000050 00000001 USER32!NtUserMessageCall+0xc
0012fb50 7e418734 0006040c 00000050 00000001 USER32!EditWndProcW+0x5d (FPO: [4,0,4])
0012fb7c 7e418816 7e4364cf 0006040c 00000050 USER32!InternalCallWinProc+0x28
0012fbe4 7e42a013 00000000 7e4364cf 0006040c USER32!UserCallWinProcCheckWow+0x150 (FPO: [Non-Fpo])
0012fc14 7e42a039 7e4364cf 0006040c 00000050 USER32!CallWindowProcAorW+0x98 (FPO: [6,0,0])
0012fc34 004c0e7d 7e4364cf 0006040c 00000050 USER32!CallWindowProcW+0x1b (FPO: [5,0,0])
0012fda4 004c0d80 02940d99 02bb9430 fffffffe audit4_home!Vcl.Controls.TWinControl.DefaultHandler+0xdd
0012fdf0 004c03d3 0012fe04 004c03eb 0012fe20 audit4_home!Vcl.Controls.TWinControl.WndProc+0x5b8
0012fe20 00467b3e 00000050 00000001 04090c09 audit4_home!Vcl.Controls.TWinControl.MainWndProc+0x2f
0012fe38 7e418734 0006040c 00000050 00000001 audit4_home!System.ClassesStdWndProc+0x16
0012fe64 7e418816 02940d99 0006040c 00000050 USER32!InternalCallWinProc+0x28
0012fecc 7e4189cd 00000000 02940d99 0006040c USER32!UserCallWinProcCheckWow+0x150 (FPO: [Non-Fpo])
0012ff2c 7e418a10 0012ff60 00000000 0006040c USER32!DispatchMessageWorker+0x306 (FPO: [Non-Fpo])
0012ff3c 005a6980 0012ff60 00120100 0012ff9c USER32!DispatchMessageW+0xf (FPO: [1,0,0])
0012ff4c 7c910222 0000000f 028458e0 005a69c3 audit4_home!Vcl.Forms.TApplication.ProcessMessage+0xf8
0012ff9c 016e6903 0012ffb0 016e6916 0012ffc0 ntdll!RtlpAllocateFromHeapLookaside+0x42 (FPO: [Non-Fpo])
0012ffc0 7c817077 7c910222 0000000f 7ffde000 audit4_home!Audit4_home.initialization+0xc3
0012fff0 00000000 016e6840 00000000 00000018 kernel32!BaseProcessStart+0x23 (FPO: [Non-Fpo])
In the stack above there is a call to win32k!xxxSendMessage. This is passed the address of a window at bbe83720. And when we look at that, we see the following:
lkd> dd bbe83720
bbe83720 000d01fe 00000006 e2d75870 8998ae50
And window 000d01fe is the IME window for the other, deadlocked thread! Why is the kernel sending a message to another thread here and now? And without a timeout? That’s our deadlock. We know how it happens, but I at least still don’t know why!
Still, that’s a step forward. Next was to figure out why we were getting that. After a fair bit of exploration, I looked a little harder at the call stack, and decided to investigate the parameters for xxxActivateKeyboardLayout:
8989a2d0 04090c09 00000100 win32k!xxxActivateKeyboardLayout+0x4c
That third parameter in xxxActivateKeyboardLayout corresponds to the Flags parameter for ActivateKeyboardLayout. Perusing the documentation for the Windows ActivateKeyboardLayout function, we can see that 0x100 is KLF_SETFORPROCESS. Bingo! That sounds pretty suspicious! I worked through the assembly code for xxxActivateKeyboardLayout and xxxRealDefWindowProc, and sure enough, that was it: when xxxRealDefWindowProc processes WM_INPUTLANGCHANGEREQUEST, it sets the Flags parameter to 0x0100:
win32k!xxxRealDefWindowProc+0x44d:
8211f1dd 56 push esi
8211f1de 6800010000 push 100h
8211f1e3 ff7514 push dword ptr [ebp+14h]
8211f1e6 6a00 push 0
8211f1e8 e8dfe40100 call win32k!_GetProcessWindowStation (8213d6cc)
8211f1ed 50 push eax
8211f1ee e838bdf9ff call win32k!xxxActivateKeyboardLayout (820baf2b)
In Windows Vista and later versions, this problem does not arise as frequently, because the Text Services Framework takes over the Alt+Left Shift command (before DefWindowProc gets a look-in) and calls ActivateKeyboardLayout without the KLF_SETFORPROCESS flag set. However, the issue can still arise if you ever use that flag in your own code when calling ActivateKeyboardLayout.
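To make the difference concrete, here is a minimal sketch of the two ways of calling ActivateKeyboardLayout. The unit and function names are hypothetical, and the flag is declared locally with the value 0x100 seen in the disassembly, so the sketch does not assume your RTL version exports it.

```delphi
unit LayoutSwitch;

// Sketch only: unit and function names are invented for illustration.

interface

uses
  Winapi.Windows;

const
  // Same value as KLF_SETFORPROCESS (0x100); declared locally for the sketch.
  KLF_SETFORPROCESS_FLAG = $00000100;

// Activates Layout for the calling thread only.
function ActivateLayoutForThread(Layout: HKL): Boolean;

// Activates Layout for the whole process. Per the analysis above, this is
// the variant that ends up sending a message to windows owned by other
// threads, so it can block if one of those threads is not pumping messages.
function ActivateLayoutForProcess(Layout: HKL): Boolean;

implementation

function ActivateLayoutForThread(Layout: HKL): Boolean;
begin
  // ActivateKeyboardLayout returns the previous layout, or 0 on failure.
  Result := ActivateKeyboardLayout(Layout, 0) <> 0;
end;

function ActivateLayoutForProcess(Layout: HKL): Boolean;
begin
  Result := ActivateKeyboardLayout(Layout, KLF_SETFORPROCESS_FLAG) <> 0;
end;

end.
```

If the per-thread variant is enough for your application, it sidesteps the cross-thread send entirely.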
To summarize, it’s helpful to remember that technically, we were using the functions as designed. The primary issue is a combination of design flaws. First, there is how that KLF_SETFORPROCESS flag is handled: either the kernel code should be using a timeout when it sends the message to the other thread, or it should be queuing an event for the other thread to handle when it gets around to it. Second, there is TThread.Synchronize, which unfortunately by design cannot be robust against deadlocks.
In any case, to make your own code more robust:
- Determine, when using synchronisation calls, if there is any chance that windows could be created by your thread, and if so, use MsgWaitForMultipleObjects and a message loop instead of WaitForSingleObject or WaitForMultipleObjects. Check the libraries you are using, and if you are using COM, it’s safest to assume that windows will be created.
- Don’t use the TThread.Synchronize procedure. Recent versions of Delphi include TThread.Queue, which is asynchronous, and so avoids this deadlock.
- Think carefully about whether a thread is the right solution to the problem.
- Don’t use the KLF_SETFORPROCESS flag!
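The first two recommendations can be sketched as follows. This is a minimal illustration under stated assumptions rather than a drop-in replacement: WaitPumpingMessages and PostStatus are hypothetical helper names, and the timeout handling is deliberately simple.

```delphi
unit SafeWait;

// Sketch only: WaitPumpingMessages and PostStatus are invented helpers.

interface

uses
  Winapi.Windows, System.Classes;

// Waits for Handle to be signalled while still dispatching window messages
// (including messages sent from other threads) for the calling thread.
// Returns WAIT_OBJECT_0, WAIT_TIMEOUT or WAIT_FAILED.
// Note: TimeoutMs applies to each individual wait, so the total time can
// exceed it if messages keep arriving.
function WaitPumpingMessages(Handle: THandle; TimeoutMs: DWORD): DWORD;

// Hands a status line to the main thread without blocking the caller.
// Target must still exist when the main thread processes the queue.
procedure PostStatus(const Status: string; Target: TStrings);

implementation

function WaitPumpingMessages(Handle: THandle; TimeoutMs: DWORD): DWORD;
var
  Msg: TMsg;
begin
  repeat
    Result := MsgWaitForMultipleObjects(1, Handle, False, TimeoutMs,
      QS_ALLINPUT);
    if Result = WAIT_OBJECT_0 + 1 then
    begin
      // Input is pending: dispatch it, then go back to waiting.
      while PeekMessage(Msg, 0, 0, 0, PM_REMOVE) do
      begin
        TranslateMessage(Msg);
        DispatchMessage(Msg);
      end;
    end;
  until Result <> WAIT_OBJECT_0 + 1;
end;

procedure PostStatus(const Status: string; Target: TStrings);
var
  Captured: string;
begin
  Captured := Status; // local copy for the anonymous method to capture
  // Queue returns immediately; the anonymous method runs later in the main
  // thread, so the calling thread never waits on the UI.
  TThread.Queue(nil,
    procedure
    begin
      Target.Add(Captured);
    end);
end;

end.
```

A background thread that owns windows would call WaitPumpingMessages(SomeEvent, INFINITE) where it previously called WaitForSingleObject, and PostStatus where it previously called Synchronize. The trade-off with TThread.Queue is that the worker no longer knows when the update has been shown, so anything the queued procedure touches has to remain valid until the main thread gets to it.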