The Southern Ocean… According to Apple

September 21, 2012Bugs, Computing, Family|Bible|MoreMarc Durdin

Just like everyone else, I found Apple’s new train wreck release of iOS 6 maps bizarre. It’s rather unlike Apple to publish such an unpolished feature. The straight-line vector graphics look amateur when zoomed in and generally a step back from the Google maps. But worse than the lack of polish is of course the outright data errors. I can understand a misplaced town in an insignificant country such as the United Kingdom. But to misplace a whole ocean? I mean, it’s not like there’s that many of them to keep track of.

Here’s the evidence. You’ll also note the comparatively minor issue of a missing river. Minor compared to an ocean, anyway (on Google Maps):

However, before you hanker for the good old Google Maps, let’s take a quick look at the same location:

Looks good, river staying right where it belongs like a good little river, and the Southern Ocean has decided to skip town. But there’s something different about that Derwent Park Rd: it seems to be a major through-road on Google Maps but is a dead end on Apple’s map:

Google should know it’s not a major through road. Their Street View car even found the evidence. It’s a private road. With a gate.

What’s the moral of the story? Use Bing Maps? Nope. They include our little dried-up ocean, fortunately nameless, but also perpetuate that little issue with the closed road (and yes, both Bing and Google will happily direct you to crash through the gate, if you ask them for directions).

Still, I think Apple’s maps take home the prize!

How we told our girls about the new baby

September 20, 2012Family, Family|Bible|MoreMarc Durdin

A couple of months ago, we announced to our 5 and 8 year old daughters that Joey was pregnant. I decided to make the announcement a little more interesting for them, and here’s what we did. After a little initial preparation, I called the girls into the lounge room, and handed them an envelope. I told them that they had to solve the riddle in the envelope and that would tell them where to look for the next clue.

Great excitement! What an awesome game they both thought!

Without any further ado, the first riddle.

Gosh, that was a hard one to start with. We suggested they dissect the riddle, and figure out what else a ripple could be called. Bethany suggested a wave? A tiny tiny wave? A microwave? The girls rushed to the microwave, and sure enough, inside it, was an envelope, with two sheets of paper. We instructed the girls to put the first sheet aside, and figure out the next riddle:

This one was easier, and Hannah (our younger), clicked to it: the toilet duck! And there, near the toilet, was an envelope!

By this stage, both the girls were getting the hang of it and both making lots of suggestions, and it didn’t take either of them long to figure out that perhaps they should look in the car for the next clue!

And with this one, the only stumbling block they ran into, was which sleeping loft? They decided it couldn’t be theirs, because they had been playing in their room and hadn’t seen me coming in! And the final envelope was in our loft.

They rushed back down with the final envelope to the lounge room where we were waiting, and put the four other pieces of paper together in order:

Their immediate response was, “But Mummy’s not having a baby!” “I don’t believe it!” “You are just joking!” It took us several minutes to convince them that it was actually true!

That afternoon, we told our parents the news, and the girls wrote their own riddle clues (which have unfortunately since gone missing) and did their own message for their grandparents!

Delphi, Win32 and leaky exceptions

September 11, 2012Bugs, Computing, Delphi, Development, WindowsMarc Durdin

Have you ever written any Delphi code like the following snippet? That is, have you ever raised a Delphi exception in a Win32 callback? Or even just failed to handle potential exceptions in a callback?

    function MyWndEnumFunc(hwnd: HWND; lParam: LPARAM): BOOL; stdcall;
    begin
      if hwnd = TForm4(lParam).Handle then
        raise Exception.Create('Raise an exception just to demonstrate the issue');
      Result := True;
    end;

    procedure TForm4.FormCreate(Sender: TObject);
    begin
      try
        EnumWindows(@MyWndEnumFunc, NativeInt(Self));
      except
        on E:Exception do
          ShowMessage('Well, we unwound that exception.  But does Win32 agree?');
      end;
    end;

Raymond Chen has posted a great blog today about why that approach will eventually end in tears. You may get away with it for a while, or you may end up with horrific stack or heap corruption. The moral of the story is, any time you have a Win32 callback function, you need to make sure no exceptions leak. Like this:

function MyWndEnumFunc(hwnd: HWND; lParam: LPARAM): BOOL; stdcall;
begin
  try
    if hwnd = TForm4(lParam).Handle then
      raise Exception.Create('Raise an exception just to demonstrate the issue');
    Result := True;
  except
    // Handle the exception, perhaps pass it to Application.HandleException,
    // or log it, or abort your app.  Just don’t let it bubble through any
    // Win32 code.  You need to write the HandleAllExceptions function!
    HandleAllExceptions;
    Result := False;
  end;
end;

If you use Delphi’s AllocateHwnd procedure, remember that it also does not handle exceptions for you (I’ve just reported this in QualityCentral as QC108653 as this caveat should at least be documented). So you need to do it:

procedure TForm4.MyAllocatedWindowProc(var Message: TMessage);
begin
  try
    if Message.Msg = WM_USER then
      raise Exception.Create('Go wild!');
  except
    Application.HandleException(Self);  // Or whatever, just don’t let it leak
  end;
  with Message do Result := DefWindowProc(FMyHandle, Msg, wParam, lParam);
end;

procedure TForm4.FormCreate(Sender: TObject);
begin
  FMyHandle := AllocateHwnd(MyAllocatedWindowProc);
  SendMessage(FMyHandle, WM_USER, 0, 0);
end;

Why does switching languages hang my app?

August 31, 2012Bugs, Computing, Delphi, Development, Windows, Windows 7Marc Durdin

This is a very short follow-up to yesterday’s blog about WM_INPUTLANGCHANGEREQUEST, WaitForSingleObject and applications hanging.

While I thought, yesterday, that this bug was no longer such a problem on Windows Vista and Windows 7, it turns out that if you assign a hotkey to a language (e.g. Alt+Shift+2), the KLF_SETFORPROCESS flag will be set when that hotkey is pressed. This means that this issue is still a problem today.

This is not just a problem for Delphi. I’ve found references to Outlook hanging, .NET 4.5 WPF applications hanging, and VC++ applications hanging.

Moral of the story (again): thread-safe code is hard. Don’t believe that your code suddenly becomes thread-safe because you synchronize it to the main thread (no matter what the Delphi documentation tells you!)

WaitForSingleObject. Why you should never use it.

August 30, 2012Bugs, Computing, Delphi, Development, WindowsMarc Durdin

Updated 30/1/2013: Fixed a typo and formatting issue or two. Added a sentence for clarity. See also the short follow-up post.

I’ve really struggled with how to frame this post. It could be about the dangers of WaitForSingleObject and WaitForMultipleObjects. Or about how Delphi’s TThread.Synchronize seems so handy, and yet because it must use WaitForSingleObject, is so fraught with complications. Or yet, about how pressing Alt+Left Shift to switch languages could hang an application. In the end, it’s about all three of these things.

Some people, when confronted with a problem, think, “I know, I’ll use threads,” and then two they hav erpoblesms.

Let’s start with TThread.Synchronize. It seems a handy little function: it will, in a thread-safe manner, call a procedure in the context of the main VCL thread (the thread that owns the Delphi VCL GUI windows), and wait, using the Windows API WaitForSingleObject(INFINITE), for the procedure to return. Simpler than fiddling with synchronisation primitives, right? Everyone knows threading is hard, so use the well-tested thread utilities where you can?

Except that WaitForSingleObject and its big brother WaitForMultipleObjects are dangerous. The basic problem is that these calls can cause deadlocks, if you ever call them from a thread that has its own message loop and windows. That’s okay, you say, I don’t have any UI except in my main thread. But any thread that uses COM can have hidden COM helper windows when doing RPC (and see below for more on this). And other libraries can create their own windows as well, such as the ADO libraries.

So what causes the deadlock? Well, I’ll illustrate with the scenario we ran into. Some old code (that I wrote, okay, okay) created a thread (we’ll call it BackgroundThread). BackgroundThread used TThread.Synchronize to periodically update the UI status about a background database process it was running. It doesn’t really matter what it was doing, but the use of Microsoft’s ADO database library meant that this thread was creating a hidden window, with the class ADODB.AsyncEventMessenger. Behind the scenes, a second window was automatically created by Windows once we had the first window in the thread, and this one had the class name IME.

Every now and then, our BackgroundThread would call Synchronize(RefreshStatus). This would signal an event which the main thread would check periodically from its message loop. Eventually it would call the RefreshStatus procedure. BackgroundThread would in the meantime have called WaitForSingleObject(INFINITE) to wait for an event to be signaled by the main thread indicating that the RefreshStatus procedure had finished.

Where are we going? Well, if the main thread receives a message that it then decides to send on to other windows in the process, while BackgroundThread is getting ready to synchronize, we can end up in a deadlock. And, it turns out that in Windows XP, this can happen when WM_INPUTLANGCHANGEREQUEST (0x50) is received, e.g. when the user presses Alt+Left Shift. (Note, for this scenario to play, the Input Method Manager must be enabled — install Far Eastern language support in Windows XP). Remember, this is just one possible scenario which can trigger a deadlock.

Let’s pull this apart. I’ve loaded the stalled process into WinDbg, and am now looking at two call stacks which have deadlocks, in user mode. First the main VCL GUI thread:

   0  Id: bf8.cb8 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr
0012fadc 7e4194be ntdll!KiFastSystemCallRet
0012fb30 7e43652f USER32!NtUserMessageCall+0xc
0012fb50 7e418734 USER32!EditWndProcW+0x5d
0012fb7c 7e418816 USER32!InternalCallWinProc+0x28
0012fbe4 7e42a013 USER32!UserCallWinProcCheckWow+0x150
0012fc14 7e42a039 USER32!CallWindowProcAorW+0x98
0012fc34 004c0e7d USER32!CallWindowProcW+0x1b
0012fda4 004c0d80 audit4_home!Vcl.Controls.TWinControl.DefaultHandler+0xdd
0012fdf0 004c03d3 audit4_home!Vcl.Controls.TWinControl.WndProc+0x5b8
0012fe20 00467b3e audit4_home!Vcl.Controls.TWinControl.MainWndProc+0x2f
0012fe38 7e418734 audit4_home!System.ClassesStdWndProc+0x16
0012fe64 7e418816 USER32!InternalCallWinProc+0x28
0012fecc 7e4189cd USER32!UserCallWinProcCheckWow+0x150
0012ff2c 7e418a10 USER32!DispatchMessageWorker+0x306
0012ff3c 005a6980 USER32!DispatchMessageW+0xf
0012ff58 005a69c3 audit4_home!Vcl.Forms.TApplication.ProcessMessage+0xf8
0012ff7c 00d26d60 audit4_home!Vcl.Forms.TApplication.HandleMessage+0xf
0012ff9c 016e6903 audit4_home!S4s.Ui.Session.Appsession_main.TAppSession_Main.Run+0xcc
0012ffc0 7c817077 audit4_home!Audit4_home.initialization+0xc3
0012fff0 00000000 kernel32!BaseProcessStart+0x23

And then our BackgroundThread:

   9  Id: bf8.f30 Suspend: 1 Teb: 7ffd5000 Unfrozen
ChildEBP RetAddr
0494fdb8 7c90df5a ntdll!KiFastSystemCallRet
0494fdbc 7c8025db ntdll!NtWaitForSingleObject+0xc
0494fe20 7c802542 kernel32!WaitForSingleObjectEx+0xa8
0494fe34 0042d4b3 kernel32!WaitForSingleObject+0x12
0494fe80 00408a19 audit4_home!System.Sysutils.WaitForSyncWaitObj+0x7
0494fed0 004654f2 audit4_home!System.TMonitor.Wait+0x25
0494fedc 00c9f3a2 audit4_home!System.Classes.TThread.Synchronize+0x2e
0494fef8 00c9ef29 audit4_home!S4s.Br.Backgroundclasses.Backgroundclass.TBackgroundClass.SetStatus+0x8a
0494ff70 00464b11 audit4_home!S4s.Br.Backgroundclasses.Backgroundclass.TBackgroundClass.Execute+0x18d
0494ffa0 00409752 audit4_home!System.Classes.ThreadProc+0x45
0494ffb4 7c80b729 audit4_home!SystemThreadWrapper+0x2a
0494ffec 00000000 kernel32!BaseThreadStart+0x37

We can see that the main thread has sent a message somewhere. It turns out it has sent a message to a window in the same thread (The window handle 0006040c is just the Edit window):

0012fb30 7e43652f 0006040c 00000050 00000001 USER32!NtUserMessageCall+0xc
0012fb50 7e418734 0006040c 00000050 00000001 USER32!EditWndProcW+0x5d

So why is it stalling? It’s hard to determine here, because everything bad is happening in kernel mode, behind that KiFastSystemCallRet call. That must mean it’s time to step into kernel mode! In WinDbg, press Ctrl+K, select Local. I’m learning here, so this is as much for my own documentation as to explain to anyone else (in other words, if it doesn’t make sense to you, it won’t make sense to me, either, in 6 weeks time). First, we find the process details:

lkd> !process 0 0 audit4_home.exe
PROCESS 894c3710  SessionId: 0  Cid: 0bf8    Peb: 7ffdf000  ParentCid: 0e0c
DirBase: 0a5c0700  ObjectTable: 00000000  HandleCount:   0.
Image: audit4_home.exe

PROCESS 893277e0  SessionId: 0  Cid: 0d94    Peb: 7ffd8000  ParentCid: 0e0c
DirBase: 0a5c0720  ObjectTable: 00000000  HandleCount:   0.
Image: audit4_home.exe

PROCESS 8950d9a8  SessionId: 0  Cid: 0f10    Peb: 7ffdb000  ParentCid: 0e0c
DirBase: 0a5c0740  ObjectTable: 00000000  HandleCount:   0.
Image: audit4_home.exe

PROCESS 89566758  SessionId: 0  Cid: 0520    Peb: 7ffde000  ParentCid: 0e0c
DirBase: 0a5c0840  ObjectTable: e127a330  HandleCount: 467.
Image: audit4_home.exe

We’ll look at the last process listed, being the one with the problem (the other three are defunct):

lkd> !process 89566758 2
PROCESS 89566758  SessionId: 0  Cid: 0520    Peb: 7ffde000  ParentCid: 0e0c
DirBase: 0a5c0840  ObjectTable: e127a330  HandleCount: 467.
Image: audit4_home.exe

THREAD 8953a920  Cid 0520.009c  Teb: 7ffdd000 Win32Thread: e1210eb0 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8953aabc  Semaphore Limit 0x2

THREAD 8963bda0  Cid 0520.0a98  Teb: 7ffd9000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8963bf3c  Semaphore Limit 0x2

THREAD 8953b9a0  Cid 0520.03c8  Teb: 7ffd8000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8953bb3c  Semaphore Limit 0x2

THREAD 895a2c08  Cid 0520.021c  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
895a2da4  Semaphore Limit 0x2

THREAD 898f8cc8  Cid 0520.0bd8  Teb: 7ffd4000 Win32Thread: e2d75870 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
898f8e64  Semaphore Limit 0x2

THREAD 893287f8  Cid 0520.0eec  Teb: 7ff4f000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89328994  Semaphore Limit 0x2

THREAD 89308c30  Cid 0520.00f0  Teb: 7ff4e000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89308dcc  Semaphore Limit 0x2

THREAD 89ab4020  Cid 0520.01b0  Teb: 7ff4d000 Win32Thread: e2ed4260 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89ab41bc  Semaphore Limit 0x2

THREAD 893269b0  Cid 0520.0830  Teb: 7ff4c000 Win32Thread: e23bf2c0 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89326b4c  Semaphore Limit 0x2

THREAD 89895778  Cid 0520.0ce0  Teb: 7ff4b000 Win32Thread: e2fa06e8 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89895914  Semaphore Limit 0x2

THREAD 8933ba80  Cid 0520.0e08  Teb: 7ff4a000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8933bc1c  Semaphore Limit 0x2

THREAD 89322728  Cid 0520.0f28  Teb: 7ff49000 Win32Thread: 00000000 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
893228c4  Semaphore Limit 0x2

THREAD 89449960  Cid 0520.0ab0  Teb: 7ff48000 Win32Thread: e12d5590 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
89449afc  Semaphore Limit 0x2

THREAD 8952d020  Cid 0520.09dc  Teb: 7ffdc000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
SuspendCount 1
b8c197d4  SynchronizationEvent

And we can see the first thread has the address 8953a920. So let’s look at that (flag 16 means show the full stack in the process context, with parameters).

lkd> !thread 8953a920 16
THREAD 8953a920  Cid 0520.009c  Teb: 7ffdd000 Win32Thread: e1210eb0 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 1
8953aabc  Semaphore Limit 0x2
Not impersonating
DeviceMap                 e1050c08
Owning Process            0       Image:
Attached Process          89566758       Image:         audit4_home.exe
Wait Start TickCount      213083         Ticks: 67800 (0:00:17:39.375)
Context Switch Count      20200  NoStackSwap    LargeStack
UserTime                  00:00:00.781
KernelTime                00:00:01.484
Win32 Start Address audit4_home!Audit4_home.initialization (0x016e6840)
Start Address kernel32!BaseProcessStartThunk (0x7c810705)
Stack Init b7c1f000 Current b7c1e840 Base b7c1f000 Limit b7c19000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr  Args to Child
b7c1e858 80503864 8953a990 8953a920 804fb094 nt!KiSwapContext+0x2f (FPO: [Uses EBP] [0,0,4])
b7c1e864 804fb094 8953aa8c 8953a920 8953a954 nt!KiSwapThread+0x8a (FPO: [0,0,0])
b7c1e88c 80502fa0 00000000 00000005 00000000 nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
b7c1e8a4 804ff8e0 00000000 00000000 00000000 nt!KiSuspendThread+0x18 (FPO: [3,0,0])
b7c1e8ec 80503882 00000000 00000000 00000000 nt!KiDeliverApc+0x124 (FPO: [3,10,0])
b7c1e904 804fb094 00000240 e1210eb0 00000000 nt!KiSwapThread+0xa8 (FPO: [0,0,0])
b7c1e92c bf802f15 00000000 0000000d 00000001 nt!KeWaitForSingleObject+0x1c2 (FPO: [5,5,4])
b7c1e968 bf835eb7 00000200 00000000 00000000 win32k!xxxSleepThread+0x192 (FPO: [3,5,4])
b7c1ea04 bf8141d2 bbe83720 00000287 00000019 win32k!xxxInterSendMsgEx+0x7f6 (FPO: [Non-Fpo])
b7c1ea50 bf80ecd9 bbe83720 00000287 00000019 win32k!xxxSendMessageTimeout+0x11f (FPO: [7,7,0])
b7c1ea74 bf92b42e bbe83720 00000287 00000019 win32k!xxxSendMessage+0x1b (FPO: [4,0,0])
b7c1eaa4 bf92c675 e2d75870 e13f10e0 e13f10e0 win32k!xxxImmActivateLayout+0x5b (FPO: [2,3,4])
b7c1ec08 bf8696d9 00000004 00000000 e13f10e0 win32k!xxxImmActivateThreadsLayout+0x10c (FPO: [3,82,4])
b7c1ec48 bf86862f e13f10e0 00000100 bbe88228 win32k!xxxInternalActivateKeyboardLayout+0xb7 (FPO: [3,8,4])
b7c1ec70 bf80b5a7 8989a2d0 04090c09 00000100 win32k!xxxActivateKeyboardLayout+0x4c (FPO: [4,3,0])
b7c1ecd4 bf80ec9f bbe88228 00000050 00000001 win32k!xxxRealDefWindowProc+0x56d (FPO: [4,16,0])
b7c1ecec bf81c176 bbe88228 00000050 00000001 win32k!xxxWrapRealDefWindowProc+0x16 (FPO: [5,0,0])
b7c1ed08 bf80eee6 bbe88228 00000050 00000001 win32k!NtUserfnNCDESTROY+0x27 (FPO: [7,0,0])
b7c1ed40 8054168c 0006040c 00000050 00000001 win32k!NtUserMessageCall+0xae (FPO: [7,3,0])
b7c1ed40 7c90e514 0006040c 00000050 00000001 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b7c1ed64)
0012fadc 7e4194be 7e428e0d 0006040c 00000050 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
0012fb30 7e43652f 0006040c 00000050 00000001 USER32!NtUserMessageCall+0xc
0012fb50 7e418734 0006040c 00000050 00000001 USER32!EditWndProcW+0x5d (FPO: [4,0,4])
0012fb7c 7e418816 7e4364cf 0006040c 00000050 USER32!InternalCallWinProc+0x28
0012fbe4 7e42a013 00000000 7e4364cf 0006040c USER32!UserCallWinProcCheckWow+0x150 (FPO: [Non-Fpo])
0012fc14 7e42a039 7e4364cf 0006040c 00000050 USER32!CallWindowProcAorW+0x98 (FPO: [6,0,0])
0012fc34 004c0e7d 7e4364cf 0006040c 00000050 USER32!CallWindowProcW+0x1b (FPO: [5,0,0])
0012fda4 004c0d80 02940d99 02bb9430 fffffffe audit4_home!Vcl.Controls.TWinControl.DefaultHandler+0xdd
0012fdf0 004c03d3 0012fe04 004c03eb 0012fe20 audit4_home!Vcl.Controls.TWinControl.WndProc+0x5b8
0012fe20 00467b3e 00000050 00000001 04090c09 audit4_home!Vcl.Controls.TWinControl.MainWndProc+0x2f
0012fe38 7e418734 0006040c 00000050 00000001 audit4_home!System.ClassesStdWndProc+0x16
0012fe64 7e418816 02940d99 0006040c 00000050 USER32!InternalCallWinProc+0x28
0012fecc 7e4189cd 00000000 02940d99 0006040c USER32!UserCallWinProcCheckWow+0x150 (FPO: [Non-Fpo])
0012ff2c 7e418a10 0012ff60 00000000 0006040c USER32!DispatchMessageWorker+0x306 (FPO: [Non-Fpo])
0012ff3c 005a6980 0012ff60 00120100 0012ff9c USER32!DispatchMessageW+0xf (FPO: [1,0,0])
0012ff4c 7c910222 0000000f 028458e0 005a69c3 audit4_home!Vcl.Forms.TApplication.ProcessMessage+0xf8
0012ff9c 016e6903 0012ffb0 016e6916 0012ffc0 ntdll!RtlpAllocateFromHeapLookaside+0x42 (FPO: [Non-Fpo])
0012ffc0 7c817077 7c910222 0000000f 7ffde000 audit4_home!Audit4_home.initialization+0xc3
0012fff0 00000000 016e6840 00000000 00000018 kernel32!BaseProcessStart+0x23 (FPO: [Non-Fpo])

Highlighted above there is a call to win32k!xxxSendMessage. This is passed the address of a window at bbe83720. And when we look at that, we see the following:

lkd> dd bbe83720
bbe83720  000d01fe 00000006 e2d75870 8998ae50

And window 000d01fe is the IME window for the other, deadlocked thread! Why is the kernel sending a message to another thread here and now? And without a timeout? That’s our deadlock. We know how it happens, but I at least still don’t know why!

Still, that’s a step forward. Next was to figure out why we were getting that. After a fair bit of exploration, I looked at little harder at the call stack, and decided to investigate the parameters for xxxActivateKeyboardLayout:

8989a2d0 04090c09 00000100 win32k!xxxActivateKeyboardLayout+0x4c

That third parameter in xxxActivateKeyboardLayout corresponds to the Flags parameter for ActivateKeyboardLayout. Perusing the documentation for the Windows ActivateKeyboardLayout function, we can see that 0x100 is KLF_SETFORPROCESS. Bingo! That sounds pretty suspicious! I worked through the assembly code for xxxActivateKeyboardLayout and xxxRealDefWindowProc, and sure enough, that was it: when xxxRealDefWindowProc processes WM_INPUTLANGCHANGEREQUEST, it sets the Flags parameter to 0x0100:

win32k!xxxRealDefWindowProc+0x44d:
8211f1dd 56              push    esi
8211f1de 6800010000      push    100h
8211f1e3 ff7514          push    dword ptr [ebp+14h]
8211f1e6 6a00            push    0
8211f1e8 e8dfe40100      call    win32k!_GetProcessWindowStation (8213d6cc)
8211f1ed 50              push    eax
8211f1ee e838bdf9ff      call    win32k!xxxActivateKeyboardLayout (820baf2b)

In Windows Vista and later versions, this problem does not arise as frequently, because the Text Services Framework takes over the Alt+Left Shift command (before DefWindowProc gets a look-in) and calls ActivateKeyboardLayout without the KLF_SETFORPROCESS flag set. However, the issue can still arise if you ever use that flag in your own code when calling ActivateKeyboardLayout.

To summarize, it’s helpful to remember that technically, we were using the functions as designed. The primary issue is combination of design flaws, first, with how that KLF_SETFORPROCESS flag is handled: either the kernel code should be using a timeout when it sends the message to the other thread, or it should be queuing an event for the other thread to handle when it gets around to it; and second, with TThread.Synchronize, which unfortunately by design cannot be robust against deadlocks.

In any case, to make your own code more robust:

Determine, when using synchronisation calls, if there is any chance that windows could be created by your thread, and if so, use MsgWaitForMultipleObjects and a message loop instead of WaitForSingleObject or WaitForMultipleObjects. Check the libraries you are using, and if you are using COM, it’s safest to assume that windows will be created.
Don’t use the TThread.Synchronize procedure. Recent versions of Delphi include TThread.Queue, which is asynchronous, and so avoids this deadlock.
Think carefully about whether a thread is the right solution to the problem.
Don’t use the KLF_SETFORPROCESS flag!

Locating Delphi exceptions in a live session or dump using WinDbg

August 29, 2012Computing, Delphi, Development, ToolsMarc Durdin

The offsets used in this blog are correct for Delphi XE2, and this information is only valid for x86. You will have to plug in other values for other versions of Delphi. You can find more details in my earlier Delphi WinDbg blog articles:

The following WinDbg command will return a list of all Delphi exception records located within the stacks of each thread in the process. Delphi uses the exception code 0EEDFADE:

~*e s -d poi(@$teb+8) poi(@$teb+4) 0EEDFADE

If you just wanted to do the current thread, you would run:

s -d poi(@$teb+8) poi(@$teb+4) 0EEDFADE

What is teb? It’s the Thread Environment Block. The data at teb+8 and teb+4 are the current bottom of the stack and the top of the stack, respectively.

For example, when looking at a crash dump we received, we were able to spot exceptions in two different threads:

0:000&> ~*e s -d poi(@$teb+8) poi(@$teb+4) 0EEDFADE
0012e9ec  0eedfade 00000000 00000001 00000000  ................
0012ef6c  0eedfade 00000003 00000000 7586d36f  ............o..u
0012f3d4  0eedfade 00000001 00000000 7586d36f  ............o..u
0012f42c  0eedfade 00000001 00000007 0012f43c  ............<...
0012f6ec  0eedfade 00000003 00000000 7586d36f  ............o..u
0012fb70  0eedfade 00000001 00000000 7586d36f  ............o..u
0012fbc8  0eedfade 00000001 00000007 0012fbd8  ................
04a6f06c  0eedfade 00000003 00000000 7586d36f  ............o..u
04a6f4e8  0eedfade 00000001 00000000 7586d36f  ............o..u
04a6f540  0eedfade 00000001 00000007 04a6f550  ............P...
04a6f72c  0eedfade 00000003 00000000 7586d36f  ............o..u
04a6fb94  0eedfade 00000001 00000000 7586d36f  ............o..u
04a6fbec  0eedfade 00000001 00000007 04a6fbfc  ................

This has returned exception records in two different thread stacks (0012* and 04a6*). We can see a number of potential exception records; some of these are not really records (because the 0EEDFADE value is not only used in the EXCEPTION_RECORD structure; it is also passed as a parameter to the RaiseException function among others). However, if the 3rd DWORD shown is 0, then this is probably a real exception record, and not part of a function call. Why this? Because EXCEPTION_RECORD's third member is a pointed to a nested exception record, which it seems is always set to NULL in Delphi.

To examine the exception record run the following command:

0:000>.exr 0012ef6c
ExceptionAddress: 7586d36f (KERNELBASE!RaiseException+0x00000058)
   ExceptionCode: 0eedfade
  ExceptionFlags: 00000003
NumberParameters: 7
   Parameter[0]: 005f2d4c
   Parameter[1]: 08c92148
   Parameter[2]: 800a0e7f
   Parameter[3]: 005f2d4c
   Parameter[4]: 0299ef64
   Parameter[5]: 0012f48c
   Parameter[6]: 0012f458

To confirm that this is a real Delphi exception check two things:

The ExceptionAddress should point to an address within the RaiseException function (that actual address may vary between versions of Windows).
It should have 7 parameters:

0: code address where the exception was raised
1: address of the Exception object
2-6: additional data relating to the exception type and stored registers

Let's examine the Exception object:

0:000> dd 08c92148
08c92148  004c2134 08cbb20c 0012ee54 800a0e7f
08c92158  08cf3f94 080cb690 0000002a 0054e1a4
08c92168  08c99fec 0000005e 08134330 00000000
08c92178  00000000 00000024 00000032 00000100
08c92188  0000001a 00000003 0000000b 4473624f
08c92198  54657461 00656d69 0000002a 0054e1a4
08c921a8  08c99fec 0000005d 08134314 00000000
08c921b8  00000000 00000023 00000032 00000100

From here, we can use the same spelunking techniques as in my previous WinDbg articles:

0:000> da poi(poi(8c92148)-38)+1
004c214f  "EOleException.!L"
0:000> du poi(8c92148+4)
08cbb20c  "Operation cannot be performed wh"
08cbb22c  "ile executing asynchronously"

You can also skip examining the exception record if you want, with shortcuts such as:

da poi(poi(poi(0012ef6c+18))-38)+1; du poi(poi(0012ef6c+18)+4)

How about working with nested exceptions? Take the following scratch program:

unit NestedExceptions;

interface

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
  Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

type
  EWhatAMess = class(Exception);

  EAnotherError = class(Exception);

  ESomeError = class(Exception)
  private
    FExtraData: string;
  public
    constructor Create(const Message, ExtraData: string);
  end;

procedure HandleThisOneToo;
begin
  raise EWhatAMess.Create('What a mess');
end;

procedure HandleIt;
begin
  try
    raise EAnotherError.Create('Another Error Message');
  except
    on E:EAnotherError do
      HandleThisOneToo;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  try
    raise ESomeError.Create('Some Error Message', 'Here''s some extra data');
  except
    on E:ESomeError do
      HandleIt;
  end;
end;

{ ESomeError }

constructor ESomeError.Create(const Message, ExtraData: string);
begin
  FExtraData := ExtraData;
  inherited Create(Message);
end;

end.

Build this program, then load it up in WinDbg. You'll need to enable the event filter for 0EEDFADE as per my previous blog. Click the bad, bad button and watch as the exceptions are thrown. For the first two exceptions, just g. On the third exception, we'll spelunk with the search technique.

0:000> ~*e s -d poi(@$teb+8) poi(@$teb+4) 0EEDFADE
0018e814  0eedfade 00000001 00000000 754ab9bc  ..............Ju
0018e86c  0eedfade 00000001 00000007 0018e87c  ............|...
0018e9ac  0eedfade 00000003 00000000 754ab9bc  ..............Ju
0018ee60  0eedfade 00000001 00000000 754ab9bc  ..............Ju
0018eeb8  0eedfade 00000001 00000007 0018eec8  ................
0018f014  0eedfade 00000003 00000000 754ab9bc  ..............Ju
0018f4c8  0eedfade 00000001 00000000 754ab9bc  ..............Ju
0018f520  0eedfade 00000001 00000007 0018f530  ............0...

The first instance of an exception in the stack will have the Exception Flag 00000001. This is the one we are interested in, in each case. Let's look at them:

0:000> da poi(poi(poi(0018e814+18))-38)+1; du poi(poi(0018e814+18)+4)
0051138f  "EWhatAMess"
02548654  "What a mess"
0:000> da poi(poi(poi(0018ee60+18))-38)+1; du poi(poi(0018ee60+18)+4)
00511437  "EAnotherErrorH.Q"
0256c36c  "Another Error Message"
0:000> da poi(poi(poi(0018f4c8+18))-38)+1; du poi(poi(0018f4c8+18)+4)
00511523  "ESomeErrorJ"
02581d3c  "Some Error Message"

You may find that some exception records are no longer valid as they can be overwritten over time. This happens for nested exceptions, unfortunately, if you don't actually break on the exception in WinDbg before it is handled in the application (which will typically be the case if you attach a debugger to a process with an exception dialog visible). In this situation, only the final exception record will point to a live Exception object. You may notice that innermost exception had a little bit of extra data. How do we pull that out? Let's look at the ESomeError object in memory:

0:000> dd poi(18f4c8+18)
02548698  005114d4 02581d3c 00000000 00000000
025486a8  00000000 00000000 0256c32c 00000000
025486b8  00000073 00000000 00000000 00000000
025486c8  00000000 00000000 00000000 00000000
025486d8  00000000 00000000 00000000 00000000
025486e8  00000000 00000000 00000000 00000000
025486f8  00000000 00000000 00000000 00000000
02548708  00000000 00000000 00000000 00000000

What have we got here? Breaking that data down, we have:

02548698 005114d4 Pointer to class
+0000004 02581d3c Pointer to Exception.FMessage Unicode string
+0000008 00000000 Exception.FHelpContext
+000000C 00000000 Exception.FInnerException
+0000010 00000000 Exception.FStackInfo
+0000014 00000000 Exception.FAcquireInnerException

That InnerException data would be useful — but it is not commonly used, yet. And then we have the data for ESomeError:

+0000018 0256c32c ESomeError.FExtraData

Examining that:

0:000> du 256c32c
0256c32c  "Here's some extra data"

And there you have it. Knock yourself out!

Mixing RTL and LTR: Plaintext vs HTML

August 29, 2012Bugs, Computing, Unicode, WindowsMarc Durdin

Just a very short post today. We still have some way to go in mixing RTL and LTR text. For example, the following image, snipped from Outlook 2010, shows the issue:

The subject and body both say the same thing, but the display order is different. Do you know why? It’s because the subject is plain text and is assuming that the text is primarily right-to-left, whereas the body is HTML, and is assuming that the text is primarily left-to-right.

Note how the full stop in the subject appears to the left of the English text. This is because the display renderer has assumed that the whole run of text is right-to-left, so punctuation is treated as right-to-left, and so displays after (in a right-to-left sense) the text.

The question is, of course, how do you determine directionality given an arbitrary plain text string? It’s not really possible to do so reliably in the absence of other metadata. The W3C article on directionality is helpful here: http://www.w3.org/TR/html4/struct/dirlang.html

Another view of the message:

Interestingly, Outlook Web Access does not do this, because its UI takes its directionality from the base HTML document:

When characters go astray: diagnosing missing characters when printing with IE9

August 22, 2012Bugs, Computing, Delphi, Development, WindowsMarc Durdin

I was recently landed with a real doozy of a support case: on some systems, under some circumstances, print jobs would be missing random characters. Like the following image:

As opposed to how it should appear:

This example is missing the following letters: Å ê Ö q Q

In this particular example, you may be misled into thinking the problem is just with non-ASCII characters. But note that both upper case Q and lower case q are missing as well. Other example documents that we collected had no non-ASCII characters in them at all. I just used this example because it was one of the few we captured with fake data as opposed to real customer data.

This problem was not particularly printer dependent, and could be replicated (thank goodness!) with a PDF print driver. But the circumstances behind replicating the problem were a little strange: it only occurred, in the second document printed, when more than one print job was sent to the printer in quick succession. Some combinations of documents did not reflect the issue. If we sent the documents in the opposite order, they would often succeed. We were unable to replicate the problem on the printer in our office.

After many tries, I finally managed to replicate the issue on my development machine. Now I could break out the debugging tools and start some tracing.

I first used Process Monitor to capture details about temporary files in the print process. A lot of temporary files were generated. I examined many of them, but the files appeared intact: the problem was not reflected at this point in the print job.

Next I attached a debugger to the process to try and figure out if I could spot any issues during the print itself. I put a breakpoint on GDI32!StartDocW and when that fired, enabled a breakpoint on GDI32!ExtTextOutW. This showed me all the strings that were being sent to the page during the print job.

Unfortunately, Internet Explorer 9 uses a process of printing to XPS and then converting that output to GDI commands for printing. The library which does the conversion from XPS to GDI used the flag ETO_GLYPH_INDEX when calling ExtTextOutW. This meant that examining the data in the debugger was pretty much a non-starter, without any easy to spot strings. So eventually I decided to log the whole lot to a text file, and pull the TrueType font glyph indices in the hope that the information in the TTF file would be sufficient to reconstruct the original text strings.

As it turned out, I was trivially able to match that data up to the XPS data by generating the report as an XPS. These are just a ZIP file with fonts, images and XML rendering data embedded. Easy to spelunk. It certainly seems that the correct glyph indices were being sent to the print engine. I matched these up between the data captured in the debugger as follows.

The text string:

Dr Åbrique T Ömbê

The XPS element:

Pulling out just the Glyph indices from the XPS element:

39;85;3;99;69;85;76;84;88;72;3;55;3;103;80;69;114

Converting those glyph indices to hexadecimal:

0027 0055 0003 0063 0045 0055 004c 0054
0058 0048 0003 0037 0003 0067 0050 0045
0072

Matching the data captured from ExtTextOutW:

233a4fe0  00550027 00630003 00550045 0054004c
233a4ff0  00480058 00370003 00670003 00450050
233a5000  abab0072

Note that the DWORDs are reversed because Intel uses little endian byte order (and abab is just garbage past the end of the array).

This suggested that the font data itself is being corrupted. Next, to check the font data in the .ps file as it is generated! Helpfully, PrimoPDF stores a .ps file in a temporary folder during the conversion to PDF. So it was just a matter of grabbing that file and looking at that. Well, of course PostScript and RTF are near the top of the list of painfully unreadable file formats, but a little trial and error (and GhostScript) found me the following:

779 1636 M [66 31 23 66 46 30 26 46 45 41 23 56 23 66 71 46  0]xS

Of which, 080B03380E0B1B3905090322033A0D0E3B should translate to Dr Åbrique T Ömbê:

08 0B 03 38 0E 0B 1B 39 05 09 03 22 03 3A 0D 0E 3B
D  r     Å  b  r  i  q  u  e     T     Ö  m  b  ê

Interestingly, the missing glyphs are all in the range 38-3B in this example. This line is identical in a version of the file that printed without errors, but the font data is quite different.

Right, so we know data corruption is occurring in the font data. But we knew that data corruption was the issue already. So where are we now? Have I accomplished anything by going down this track? Yes! I know now that it is happening in the conversion of an XPS document to GDI, in the print process. Given this and that it happens only when 2 documents are printed, it is almost certainly a threading issue in the XPS to GDI library.

Armed with that new-found knowledge, I set out to build a minimal test application. Here’s the very hacky test case I ended up with:

#include "stdafx.h"

int Print(LPWSTR inFile, LPWSTR outFile, LPWSTR printer);

int _tmain(int argc, _TCHAR* argv[]) {
  if(argc < 2) {
    printf("Usage: XpsThread Apartment|Multi [PathToXPSFiles] [Printer]\n");
    return 1;
  }

  if(_wcsicmp(argv[1], L"Apartment") == 0)
    CoInitializeEx(0, COINIT_APARTMENTTHREADED);
  else
    CoInitializeEx(0, COINIT_MULTITHREADED);

  /* This assumes 2 files named 0.xps and 1.xps.  No need to make more complex at this stage. */

  WCHAR inFile[MAX_PATH], outFile[MAX_PATH];

  PWCHAR printer = argc > 3 ? argv[3] : L"PrimoPDF";

  for(int i = 0; i < 2; i++)
  {
    wsprintf(inFile, L"%s%s%d.xps", argc > 2 ? argv[2] : L"", argc > 2 ? (*(wcschr(argv[2],0)-1) == '\\' ? L"" : L"\\") : L"", i);
    wsprintf(outFile, L"%s%s%d.ps", argc > 2 ? argv[2] : L"", argc > 2 ? (*(wcschr(argv[2],0)-1) == '\\' ? L"" : L"\\") : L"", i);

    Print(inFile, outFile, printer);
  }

  getchar();  // Manually wait for print jobs to finish.

  CoUninitialize();
  // Examine the .ps files in GSview or whatever takes your fancy.  The second file generated will often be corrupt if COINIT_MULTITHREADED is used.
  return 0;
}

int Print(LPWSTR inFile, LPWSTR outFile, LPWSTR printer) {
  IXpsPrintJobStream *docStream = NULL;
  HRESULT hr;

  hr = StartXpsPrintJob(printer, NULL, outFile, 0, 0, NULL, 0, NULL, &docStream, NULL);
  if(SUCCEEDED(hr)) {
    FILE *fp = _wfopen(inFile, L"rb");
    BYTE buf[512];
    int n = fread(buf, 1, 512, fp);
    ULONG sz;
    while(n > 0) {
      docStream->Write(buf, n, &sz);
      n = fread(buf, 1, 512, fp);
    }
    fclose(fp);

    hr = docStream->Close();
    if(SUCCEEDED(hr)) {
      wprintf(L"Succeeded: %s\n", inFile);
      return 0;
    }
  }

  wprintf(L"Failed to print %s: %x\n", inFile, hr);
  return 1;
}

It may have been a little hacky, but it proved the bug. It turns out, printing multiple documents at once with COM initialised with the flag COINIT_MULTITHREADED will consistently fail, even though that is the mode used in the example in MSDN.

Okay, so let’s take that information back to our application in Windbg, and see if we can find anything in MSHTML’s printing. I popped in a couple of breakpoints:

bp coinitializeex ".echo ---CoInitializeEx---; ~.; dd esp+8 L1; gc"
bp gdi32!startdocw ".echo ---StartDocW---; ~.;"

The first would show me the thread ID and the COINIT flags issued every time CoInitializeEx was called. The second would break when StartDoc was called to start a print job, from the XPS printing code (XpsGdiConverter.dll). This would tell me the threading mode for the thread. And as I was now expecting, MSHTML uses COINIT_MULTITHREADED (= 00000000).

---CoInitializeEx---
.  2  Id: 1464.f80 Suspend: 1 Teb: 7efd7000 Unfrozen
Start: mshtml!ShowModelessHTMLDialog+0x5460 (6dc0d50e)
Priority: 0  Priority class: 32  Affinity: f
189ef0b4  00000000
---StartDocW---
.  2  Id: 1464.f80 Suspend: 1 Teb: 7efd7000 Unfrozen
Start: mshtml!ShowModelessHTMLDialog+0x5460 (6dc0d50e)
Priority: 0  Priority class: 32  Affinity: f

Interestingly, when I was researching XPS printing in MSDN, I came across a page in the .NET Framework documentation which contains the following detail:

The three-parameter AddJob(String, String, Boolean) overload of AddJob must run in a single thread apartment whenever the Boolean parameter is false, which it must be when a non-XPSDrv printer is being used. However, the default apartment state for Microsoft .NET is multiple thread. This default must be reversed since the example assumes a non-XPSDrv printer

There are two ways to change the default. One way is to simply add the STAThreadAttribute (that is, “[System.STAThreadAttribute()]”) just above the first line of the application’s Main method (usually “static void Main(string[] args)”). However, many applications require that the Main method have a multi-threaded apartment state, so there is a second method: put the call to AddJob(String, String, Boolean) in a separate thread whose apartment state is set to STA with SetApartmentState. The example below uses this second technique.

This little detail is not in the Win32 XPS documentation. I can’t prove that this is the cause of the problem but it certainly seems suspicious that this goes wrong with MSHTML. I tried a very naughty test whereby I overrode the COINIT flags whenever CoInitializeEx was called (WinDbg: bp ole32!coinitializeex “ed esp+8 2; gc”), and the problem “went away” … it’s a nice finger-in-the-wind test but certainly not conclusive!

The problem does not occur on the printer in our office, because it is an XPSDrv printer. The problem only occurs with drivers that do not support XPS, because XPSDrv does not require apartment threading.

Anyway, I think it’s time to take this case to Microsoft. In the meantime, the safest workaround is to wait for the first print job to finish before starting a subsequent one. We’ll probably put that fix in for now, even though it makes the print tediously slow!

Second Rant: Why oh why does Adobe Reader need to restart?

August 20, 2012Bugs, Computing, Suggestions, WindowsMarc Durdin

Updated: 3:50pm — It gets worse — see below!

Have you ever restarted Windows (e.g. after installing Windows updates) and then been presented with the following dialog:

And of course, once you click that Install button, you eventually end up with the following:

(For the pedantic reader, yes, I captured different versions of Adobe Reader Updater on different machines. I’ve had to deal with this on 3 machines so far today. Get over it!)

Here’s why Adobe Reader needs to restart, even if you have nothing apparently running: the installer does not shut down the background processes AdobeARM.exe and reader_sl.exe (and possibly others).

For example on my Windows 7 x64 test machine, in Event Viewer, you can see the following events.

Product: Adobe Reader 9.5.2. The file C:\Program Files (x86)\Adobe\Reader 9.0\Reader\reader_sl.exe is being used by the following process: Name: reader_sl , Id 3084.

Product: Adobe Reader 9.5.2. The file C:\Program Files (x86)\Common Files\Adobe\ARM\1.0\AdobeARM.exe is being used by the following process: Name: AdobeARM , Id 3096.

Both these processes are part of Adobe Reader, and they should be shut down by the installer. But the updater does not do this. So, you are forced to restart. This happens even if you don’t open your web browser or any other applications!

So before clicking that fateful Install button, go and kill those Adobe processes hanging around so you can avoid that restart. Also, you will need to close your web browser(s), Office applications, and anything else that you can think of that may also be involved. It’s a pain, but it’s still better than restarting.

Update: Killing AdobeARM.exe at the wrong time is a bit of an issue. Because AdobeARM.exe presents the front end to the updater! This means that if you kill AdobeARM.exe, the update dialog also disappears. The tricky fix then, is to kill AdobeARM.exe just after clicking the Install button. You get no feedback on the install, but you can see msiexec.exe (up to 3 instances during the install) running in Task Manager or Process Explorer. When it finishes, two of those will close down (the last instance hangs around for a while). It will usually leave a happy message in the Event Log.

A little extra rant? How poor is that design? Pushing updates for the updater in such a way that it forces a restart… A little bit of forethought could have avoided that one, Adobe.

Rant of the day: The Big Green Download Button

August 20, 2012Computing, Malware, Pontificating, SuggestionsMarc Durdin

What is it with software download sites and their download links? Typically if you visit a website to download software, especially free or open source software, you’ll be presented with a page with several big green download buttons. Often the actual download will be a link buried to one side or somewhere halfway down the page. Nearly every site offering free software does this. And it sucks.

You may argue that it is the ad network providing these ads, but the developer of the site has chosen to put ads on their page. They know that the ads are misleading. And yet they persist.

Do you want to see some examples?

The actual download link is below the fold for most users. A subtle ‘Advertisement’ box which most users will simply miss.

Paint.net does it twice: on their download page, and then on the download host page.

Spot the real download link: it isn’t a big friendly green button, is it?

“Reputable” sites such as Sourceforge also do it. A little link for the real download. And big green Download buttons for the fake ones.

Four fake downloads at FileHippo. Can you see them all?

CNet manage to have two fake downlaods. Amazingly, slightly less offensive than most of the others, apart from the irritating animated ad.

Brothersoft is particularly bad. The actual download link is further down the page. I count ten download links above the fold.

Developers, leaving ads like this on your software download pages hurts your users. Using file distribution sites such as the ones illustrated above, that bury your software download link among all their ads, hurts your users.

Inexperienced users will download the wrong program and end up frustrated and angry that you tricked them into installing rubbish they didn’t want.

More experienced users will usually see past the ads but it will still be a frustrating experience for them, and gosh does it look like seedy hour. It feels like visiting a warez site. Your reputation is tarnished. It’s nearly as bad as bundling toolbars (and we all know how that makes you look!) You look cheap. And nasty.

Gurus use Adblock. They won’t see your ads. You won’t get revenue from them. And over time, they are teaching the inexperienced users to install Adblock so that the less experienced users get the software they want, as well.

Note how Adblock do not have any other green download buttons but their own. Yeah, the page is ugly, but at least they got their download button right!

Get rid of those low-life ads! Build a reputation of delivering your software yourself, from your site, without misleading advertisements. That’s my rant done.

Marc Durdin's Blog

Thoughts on computing, Bible, cycling, family, my stories and more