Category Archives: Development

The case of the hidden exception

I was building a new installer for an application today using WiX v3, and ran into a problem when trying to extract the COM SelfReg from a DLL using heat.

The error that heat returned was not very helpful.

heat.exe : warning HEAT5150 : Could not harvest data from a file that was expected to be a SelfReg DLL: PrinterBvr.dll. If this file does not support SelfReg you can ignore this warning. Otherwise, this error detail may be helpful to diagnose the failure: Exception has been thrown by the target of an invocation..

No tracing options were available in heat to figure out what actually caused the exception or even what the exception was! Time to pull out one of my favourite debugging tools: Procmon. Procmon traces activities on your system and records registry, file, network and similar activities to a log file.

So, I ran Procmon and added a filter to restrict the log to processes named “heat.exe”:

Then I started the trace and re-ran my command. It failed (as expected) with the same error. I came back to Procmon, stopped the trace, and scanned through looking for a failure point. Near the bottom of the trace, I found the issue. Part of the work heat does is to redirect registry actions to a special location in the registry so it can capture the work the DLL’s DllRegisterServer call does. Unfortunately, printerbvr.dll was depending on a specific key in HKLM already existing, which of course it did not in the redirected registry keys.

At this point I was able to understand the cause of the problem. But how could I work around this to get heat to do its work? I only needed to do this once to capture the registry settings. I could have used Procmon to watch all the activity and manually built the .wxs source file from that. But I was loath to do so much work.

Looking at the trace a bit closer I noted that the key that heat used as a temporary root for its registry redirection was based on its process ID. So I couldn’t create the keys beforehand because I wouldn’t know what the process ID was until after heat had already started — and the process only took a fraction of a second to run.

Windbg to the rescue! I fired up windbg, and started a new debug session with heat.exe and the appropriate command line parameters for heat. I didn’t want to create the registry key based on the PID before heat did that itself because I figured heat might complain if the registry key was already there. So I quickly peeked at the handy stack trace that Procmon had captured for the event in question:

From that I discerned that I could add a breakpoint when PrinterBvr.dll + 0x18BB (subtracting 6 bytes for the size of the ‘call’ opcode), and continued execution. I could probably have added the breakpoint at printerbvr.dll!DllRegisterServer, but this worked for me:

bu printerbvr.dll+0x18bb
g

At the right time, we hit the breakpoint and I then jumped into the registry (after quickly looking up the PID for this instance of heat.exe) and created my new registry key:

This time, everything completed happily, I had my .wxs output file, and I was able to get on with some real work (i.e. writing this blog post).

Spy++ in the case of the missing message

Yesterday, I wrote about WM_QUERYENDSESSION and the app that would not shut down. In that blog, I found a thread that never checked its message queue, and this meant that Windows thought the app was not responding.

So why didn’t Spy++ show the sent message in its log when it was sent? Spy++ uses a SetWindowsHookEx(WH_CALLWNDPROC) hook to log messages sent to windows. It turns out that in Windows NT 4.0 and later versions, the WH_CALLWNDPROC hook is called just before the message is sent to the window procedure. So our message never made it to the hook.

However, in Win9x and NT 3.51, the hook was called when the SendMessage function was called. Spy++ has its heritage in that era. Perhaps now the message delivery title in Spy++ should be changed from “sent” to “received” in order to clarify this difference.

A debug tool wish list item: MsgMon

After running through a painful message debugging session today, I realised again that at the top of my debug tool wishlist tool is MsgMon, a “Procmon for messages”. Spy++ and Winspector Spy are just not up to the task. Once you start using Procmon, with its caching of call stacks, filtering, phenomenal performance and more, life seems very hard each time you start Spy++.

I would love to write this tool, if only I had the time… It would be nice to be able to use the existing Procmon framework as it is so solid…

Debugging an application that would not behave with WM_QUERYENDSESSION

I was recently tasked with addressing a problem with an application that would not shut down when given a WM_QUERYENDSESSION message. On the surface, this seemed easy enough to fix — just make sure that the WM_QUERYENDSESSION message was appropriately handled by all the top level windows in the application.

Step 1 was to add a handler into the main top level window that avoided the normal shutdown dialog boxes that warned of backup and similar tasks. Where possible, Microsoft advises, you should not block the shutdown of Windows with dialog boxes, etc.

That done, I tested and found that the application was still not shutting down when it was supposed to. So I fired up Spy++ and got started on watching the WM_QUERYENDSESSION chatter in the application. Here I ran into my first little catch-22. I decided that I wanted to do a real shutdown, not a simulated one, to make sure that I was testing the correct scenario.

Problem was, Spy++ was always being shut down before I could capture the messages needed in my target application. Eventually I realised that I needed to start Spy++ after my application, because Windows would ask each application in turn to WM_QUERYENDSESSION, and as we know, my application would cancel the process because it wasn’t accepting a shutdown! Then I realised I could also make sure the shutdown was cancelled even when my application behaved properly, simply by loading Notepad after my application and typing a character or two into it. Then during the shutdown sequence, Iwould be asked if I want to save changes to the text document, and I could just click Cancel. Crude but effective.

With that out of the way, I ran through a Spy++ session and was able to spot an offending window that returned 0 to WM_QUERYENDSESSION. A little tracing and I found a utility window procedure that failed to call DefWindowProc. Naughty. Easily solved.

Well so I thought. Started testing again. The application was still failing to shutdown. I ran through Spy++, traced all the WM_QUERYENDSESSION messages for the process, and every single window procedure responded to the message, virtually instantly, with a nice clean 1, indicating that the window was happy to end the session. After 5 seconds, Windows would show its “Application is not responding” message, indicating that it had not received a response from one of the top level windows.

This was rather puzzling. I threw in a bit of logging into a dll using SetWindowsHookEx for WH_CALLWNDPROC and WH_CALLWNDPROCRET, just in case I was missing something with Spy++. No difference, except that it was somewhat easier to read.

So then I got thinking. This application runs quite a few background threads for loading data from a database. What if, I wondered, one of those threads had a window created but was not running a message loop? It sounded logical but I wasn’t very confident that that was the cause. Problem was, how to tell? I looked for ways to check the queue status for a thread in Windbg but could not find an easy solution. Eventually I decided to add some debugging into the common ThreadProc wrapper for the application, called GetQueueStatus(QS_SENDMESSAGE), and logged the response. Re-ran the application, and bingo, there was a thread closing down with a message in its queue. No prizes if you guess what that message was!

After that, it was simple. Reviewing the thread’s main function I quickly discovered a call to WaitForSingleObject. The thread did not appear to create any windows in any of our code, but it did use COM in its database connectivity, which meant that a helper window was created behind the scenes. I changed the WaitForSingleObject to a MsgWaitForMultipleObjects, and processed any outstanding messages when signalled with a PeekMessage loop.

And then it finally worked (and I forgot to start Notepad, so Windows restarted…) But it really stumped me for a bit today.

Feeling wasteful?

If you have the urge to be wasteful, here’s something fun you can do that may help.

Start Visual Studio’s “Create GUID” utility (Tools|Create GUID) and click New Guid a few hundred times.

As you wantonly create hordes of GUIDs that, unloved, instantly disappear into the ether, never to be seen again, you should take the opportunity to reflect on this task, being one of life’s more fruitless activities, and soon you will be feeling much better.

If this still doesn’t help, write a little program that calls CoCreateGuid, hundreds, millions, or even billions of times. You can even let the program run by itself, unmonitored and unchecked, gleefully consuming this precious resource, while you read War and Peace in its entirety. If that doesn’t fix it for you, nothing will.

PHP security updates are like malaria treatments

Applying PHP security updates is somewhat like taking a malaria treatment: they are, temporarily, worse than the disease itself. Let me explain.

Malaria is not a nice disease. I have had malaria a couple of times. We treated it with chloroquine (this was a few years ago). The treatment dose of chloroquine makes you feel worse than the malaria itself. But then you get better.

A PHP security hole is obviously a big issue for your average PHP site. I have had to apply patches to address these holes numerous times in the last few years. Unfortunately, it seems that each patch version for PHP introduces either new bugs or changes the published API. This causes all sorts of chaos and panic when the upgrade goes through, and lots of scrambling to fix a site that no longer works correctly. Sometimes you may not find the problem for several weeks in an infrequently used area of the site, so running a test server does not address this (besides, who wants to leave a known security hole online for several weeks?)

For example, PHP 5.2.7 was released to address a number of bugs and security holes but then was removed from distribution 3 days later because of an introduced bug changing the behaviour of magic quotes. That didn’t affect me because I did not use magic quotes… (Magic quotes were a majorly broken silly idea in the first place, but even worse is making it a configurable option so any code that I write has to test the setting… But let’s not get distracted.)

Or, to take an even more serious example, strtotime function return values changed in 5.1.0. As of 5.1.0, when strtotime is passed an invalid date, it returns FALSE instead of -1. This change was made without notice, and as far as I can tell, without any reference whatsoever in the huge changelog or even in bugs referenced in the changelog. That would have been better in the first place but this type of breaking change should never be made otherwise. I shouldn’t have to review all the changes to the PHP documentation, and then audit all 150,000+ lines of PHP code each time we update PHP!

That’s just two of the more obvious examples of the horrible PHP upgrade situation. Every time I have to upgrade, I just hold my breath and hope that no one has made any more silly breaking changes.