Category Archives: Windows

WinDBG and Delphi exceptions in x64

I recently was asked whether the Delphi exception event filter for WinDBG that I wrote about would also work with x64 Delphi applications.  The answer was no, it wouldn’t work, but that made me curious to find out what was different with x64.  I knew x64 exception handling was completely different to x86, being table based instead of stack based, but I wasn’t sure how much of this would be reflected in the event filter.

The original post contains the details about how the exception record was accessible at a known location on the stack, and how we could dig in from there.

Before firing up WinDBG, I had a look at System.pas, and found the x64 virtual method table offsets.  I have highlighted the key field we want to pull out:

{ Virtual method table entries }
{$IF defined(CPUX64)}
vmtSelfPtr           = -176;
vmtIntfTable         = -168;
vmtAutoTable         = -160;
vmtInitTable         = -152;
vmtTypeInfo          = -144;
vmtFieldTable        = -136;
vmtMethodTable       = -128;
vmtDynamicTable      = -120;
vmtClassName         = -112;
vmtInstanceSize      = -104;
vmtParent            = -96;
vmtEquals            = -88 deprecated 'Use VMTOFFSET in asm code';
vmtGetHashCode       = -80 deprecated 'Use VMTOFFSET in asm code';
vmtToString          = -72 deprecated 'Use VMTOFFSET in asm code';
vmtSafeCallException = -64 deprecated 'Use VMTOFFSET in asm code';
vmtAfterConstruction = -56 deprecated 'Use VMTOFFSET in asm code';
vmtBeforeDestruction = -48 deprecated 'Use VMTOFFSET in asm code';
vmtDispatch          = -40 deprecated 'Use VMTOFFSET in asm code';
vmtDefaultHandler    = -32 deprecated 'Use VMTOFFSET in asm code';
vmtNewInstance       = -24 deprecated 'Use VMTOFFSET in asm code';
vmtFreeInstance      = -16 deprecated 'Use VMTOFFSET in asm code';
vmtDestroy           =  -8 deprecated 'Use VMTOFFSET in asm code';

vmtQueryInterface    =  0 deprecated 'Use VMTOFFSET in asm code';
vmtAddRef            =  8 deprecated 'Use VMTOFFSET in asm code';
vmtRelease           = 16 deprecated 'Use VMTOFFSET in asm code';
vmtCreateObject      = 24 deprecated 'Use VMTOFFSET in asm code';
{$ELSE !CPUX64}
vmtSelfPtr           = -88;
vmtIntfTable         = -84;
vmtAutoTable         = -80;
vmtInitTable         = -76;
vmtTypeInfo          = -72;
vmtFieldTable        = -68;
vmtMethodTable       = -64;
vmtDynamicTable      = -60;
vmtClassName         = -56;
vmtInstanceSize      = -52;
vmtParent            = -48;
vmtEquals            = -44 deprecated 'Use VMTOFFSET in asm code';
vmtGetHashCode       = -40 deprecated 'Use VMTOFFSET in asm code';
vmtToString          = -36 deprecated 'Use VMTOFFSET in asm code';
vmtSafeCallException = -32 deprecated 'Use VMTOFFSET in asm code';
vmtAfterConstruction = -28 deprecated 'Use VMTOFFSET in asm code';
vmtBeforeDestruction = -24 deprecated 'Use VMTOFFSET in asm code';
vmtDispatch          = -20 deprecated 'Use VMTOFFSET in asm code';
vmtDefaultHandler    = -16 deprecated 'Use VMTOFFSET in asm code';
vmtNewInstance       = -12 deprecated 'Use VMTOFFSET in asm code';
vmtFreeInstance      = -8 deprecated 'Use VMTOFFSET in asm code';
vmtDestroy           = -4 deprecated 'Use VMTOFFSET in asm code';

vmtQueryInterface    = 0 deprecated 'Use VMTOFFSET in asm code';
vmtAddRef            = 4 deprecated 'Use VMTOFFSET in asm code';
vmtRelease           = 8 deprecated 'Use VMTOFFSET in asm code';
vmtCreateObject      = 12 deprecated 'Use VMTOFFSET in asm code';
{$IFEND !CPUX64}

I also noted that the exception code for Delphi x64 was the same as x86:

cDelphiException   = $0EEDFADE;

Given this, I put together a test x64 application in Delphi that would throw an exception, and loaded it into WinDBG.  I enabled the event filter for unknown exceptions, and triggered an exception in the test application.  This broke into WinDBG, where I was able to take a look at the raw stack:

(2ad4.2948): Unknown exception - code 0eedfade (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
KERNELBASE!RaiseException+0x39:
000007fe`fd6ccacd 4881c4c8000000  add     rsp,0C8h
0:000> dd rbp
00000000`0012eab0  00000008 00000000 00000021 00000000
00000000`0012eac0  0059e1f0 00000000 0059e1f0 00000000
00000000`0012ead0  0eedfade 00000001 00000000 00000000
00000000`0012eae0  0059e1dd 00000000 00000007 00000000
00000000`0012eaf0  0059e1dd 00000000 0256cff0 00000000
00000000`0012eb00  00000000 00000000 00000000 00000000
00000000`0012eb10  00000000 00000000 00000000 00000000
00000000`0012eb20  00000000 00000000 0256cff8 00000000

We can see at rbp+20 is the familiar looking 0EEDFADE value.  This is the start of the EXCEPTION_RECORD structure, which I’ve reproduced below from Delphi’s System.pas with a little annotation of my own:

  TExceptionRecord = record
    ExceptionCode: Cardinal;                 // +0
    ExceptionFlags: Cardinal;                // +4
    ExceptionRecord: PExceptionRecord;       // +8
    ExceptionAddress: Pointer;               // +10
    NumberParameters: Cardinal;              // +18
    case {IsOsException:} Boolean of
      True:  (ExceptionInformation : array [0..14] of NativeUInt);
      False: (ExceptAddr: Pointer;           // +20
              ExceptObject: Pointer);        // +28
  end;

We do have to watch out for member alignment with this structure — because it contains both 4 byte DWORDs and 8 byte pointers, there are 4 bytes of hidden padding after the NumberParameters member, as shown below (this is from MSDN, sorry to switch languages on you!):

typedef struct _EXCEPTION_RECORD64 {
  DWORD ExceptionCode;
  DWORD ExceptionFlags;
  DWORD64 ExceptionRecord;
  DWORD64 ExceptionAddress;
  DWORD NumberParameters;
  DWORD __unusedAlignment;
  DWORD64 ExceptionInformation[EXCEPTION_MAXIMUM_PARAMETERS];
} EXCEPTION_RECORD64, *PEXCEPTION_RECORD64;

But what we can see from TExceptionRecord is that at offset 0x28 in the record is a pointer to our ExceptObject.  Great!  That’s everything we need.  We can now put together our x64-modified event filter.

You may remember the x86 event filter:

sxe -c "da poi(poi(poi(ebp+1c))-38)+1 L16;du /c 100 poi(poi(ebp+1c)+4)" 0EEDFADE

So here is the x64 version:

sxe -c "da poi(poi(poi(rbp+48))-70)+1 L16;du /c 100 poi(poi(rbp+48)+8)" 0EEDFADE

And with this filter installed, here is how a Delphi exception is now displayed in WinDBG:

(2ad4.2948): Unknown exception - code 0eedfade (first chance)
00000000`0059e0cf  "MyException"
00000000`02573910  "My very own kind of error message"
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
KERNELBASE!RaiseException+0x39:
000007fe`fd6ccacd 4881c4c8000000  add     rsp,0C8h

I’ll dissect the pointer offsets a little more than I did in the previous blog, because they can be a bit confusing:

  • rbp+48 is a pointer to the exception object (usually a type that inherits from Exception).
  • poi(rbp+48) dereferences that, and at offset 0 right here, we have a pointer to the class type.
  • Before we look at the class type, poi(rbp+48)+8 is the first member of the object (don’t forget ancestor classes), which happens to be FMessage from the Exception class. That gives us our message.
  • Diving deeper, poi(poi(rbp+48)) is now looking at the class type.
  • And we know that the offset of vmtClassName is -112 (-0x70).  So poi(poi(poi(rbp+48))-70) gives us the the ShortString class name, of which the first byte is the length.
  • So we finish with poi(poi(poi(rbp+48))-70)+1, which lets us look at the string itself.

You will see that to access the exception message, I have opted to look directly at the Exception object rather than use the more direct pointer which is on the stack.  I did this to make it easier to see how it might be possible to pull out other members of descendent exception classes, such as ErrorCode from EOSError.

And one final note: looking back on that previous blog, I see that one thing I wrote was a little misleading: the string length of FMessage is indeed available at poi(poi(rbp+48)+8)-4, but the string is null-terminated, so we don’t need to use it — WinDBG understands null-terminated strings.  Where this is more of a problem is with the ShortString type, which is not null-terminated.  This is why sometimes exception class names displayed using this method will show a few garbage characters afterwards, because we don’t bother about accounting for that; the L16 parameter prevents us dumping memory until we reach a null byte.

Revisiting the VistaAltFixUnit.pas code

When Windows Vista was introduced, many moons ago now, many Delphi programs ran into a problem where buttons, checkboxes and the like would disappear when the user pressed the Alt key.  This was due to a change in how and when the underscores that displayed shortcuts on these buttons was displayed.

The problem was duly noted on Borland/CodeGear/Embarcadero’s QualityCentral forum, and several workarounds were documented.  The one that became most popular online seemed to be the one known as VistaAltFixUnit.pas.  Due to its popularity, we ended integrated that unit into our code, without spending a lot of time looking at it.  Surely many eyes had already reviewed it?

Now I don’t really want to be harsh but unfortunately there are a number of issues with the unit, ranging from performance to stability issues.  Here’s a rundown.

  1. The most egregious problem is the replacement of each TForm‘s WindowProc procedure.
    constructor TFormObj.Create(aForm: TForm; aRepaintAll: Boolean);
    begin
      inherited Create;
      Form := aForm;
      RepaintAll := aRepaintAll;
      Used := True;
      OrgProc := Form.WindowProc;
      Form.WindowProc := WndProc;
    end;
    
    procedure TFormObj.WndProc(var Message: TMessage);
    begin
      OrgProc(Message);
      if (Message.Msg = WM_UPDATEUISTATE) then
        NeedRepaint := True;
    end;

    Like any naively chained procedure hook, if another component or other code also attempts to replace the WindowProc, the two will end up interfering with each other at some point, as neither component will be aware of the other, and destruction of either component, or attempts to unhook, are likely to end in access violations and tears shortly thereafter.

    Now, we also make use of the TNT Unicode components (as we have not yet moved to a fully Unicode version of Delphi — that’s a big job!), and it turns out that these components also use this approach for some of their own jiggery-pokery.  The TNT Unicode components are more important to us than this one…  guess which component suite got to keep their hook?

  2. Note that, possibly due to instabilities that became evident, the TFormObj class does not attempt to restore the original WindowProc when it is destroyed.
  3. Performance-wise, this component does a lot of work checking form status at application idle time.  As other users noted, the TApplicationEvents approach only works if no other code in the project has already assigned TApplication.OnIdle.

    In the end, we rewrote the component from scratch, and used a Windows CallWndProc hook as this is clean, simple and robust.  We also opted to live with a very minor flicker when the user first presses Alt, as this reduced the complexity of the code significantly, relying on Windows’ existing repaint infrastructure instead of reimplementing it ourselves.

    Any thoughts on the changes?  What have we got wrong?  Feel free to use this code without restrictions, for what it is worth — at your own risk of course!

    unit VistaAltFixUnit2;
    
    interface
    
    uses
      Windows, Classes;
    
    type
      TVistaAltFix2 = class(TComponent)
      private
        FInstalled: Boolean;
        function VistaWithTheme: Boolean;
      public
        constructor Create(AOwner: TComponent); override;
        destructor Destroy; override;
      end;
    
    procedure Register;
    
    implementation
    
    uses
      Messages, Themes;
    
    procedure Register;
    begin
      RegisterComponents('MEP', [TVistaAltFix2]);
    end;
    
    var
      FInstallCount: Integer = 0;
      FCallWndProcHook: Cardinal = 0;
    
    { TVistaAltFix2 }
    
    function CallWndProcFunc(
      nCode: Integer;
      wParam: WPARAM;
      lParam: LPARAM): LRESULT; stdcall;
    var
      p: PCWPSTRUCT;
    begin
      if nCode = HC_ACTION then
      begin
        p := PCWPSTRUCT(lParam);
        if p.message = WM_UPDATEUISTATE then
        begin
          InvalidateRect(p.hwnd, nil, False);
        end;
      end;
      Result := CallNextHookEx(FCallWndProcHook, nCode, wParam, lParam);
    end;
    
    constructor TVistaAltFix2.Create(AOwner: TComponent);
    begin
      inherited;
      if VistaWithTheme and not (csDesigning in ComponentState) then
      begin
        Inc(FInstallCount); // Allow more than 1 instance, assume single threaded as VCL is not thread safe anyway
        if FInstallCount = 1 then
          FCallWndProcHook := SetWindowsHookEx(WH_CALLWNDPROC, CallWndProcFunc, 0, GetCurrentThreadID);
        FInstalled := True;
      end;
    end;
    
    destructor TVistaAltFix2.Destroy;
    begin
      if FInstalled then
      begin
        Dec(FInstallCount);
        if FInstallCount = 0 then
        begin
          UnhookWindowsHookEx(FCallWndProcHook);
          FCallWndProcHook := 0;
        end;
      end;
      inherited Destroy;
    end;
    
    function TVistaAltFix2.VistaWithTheme: Boolean;
    var
      OSVersionInfo: TOSVersionInfo;
    begin
      OSVersionInfo.dwOSVersionInfoSize := SizeOf(OSVersionInfo);
      if GetVersionEx(OSVersionInfo) and
         (OSVersionInfo.dwMajorVersion >= 6) and
         ThemeServices.ThemesEnabled then
        Result := True
      else
        Result := False;
    end;
    
    end.

Weird filename globbing in Windows

Can anyone explain this result?  (note: folder name has been obscured)

D:\...\exe\tds>dir *337*
 Volume in drive D has no label.
Volume Serial Number is 4279-DECE

Directory of D:\...\exe\tds

05/12/2011  02:39 PM        12,432,384 tds-8.0.324.0.exe
11/04/2011  02:41 PM        13,268,480 tds-8.0.337.0.exe
2 File(s)     25,700,864 bytes
0 Dir(s)  129,676,939,264 bytes free
D:\...\exe\tds>dir *337.0.exe
Volume in drive D has no label.
Volume Serial Number is 4279-DECE

Directory of D:\...\exe\tds

11/04/2011  02:41 PM        13,268,480 tds-8.0.337.0.exe
1 File(s)     13,268,480 bytes
0 Dir(s)  129,676,324,864 bytes free

I can’t figure out why tds-8.0.324.0.exe is matching *337*, in this folder.  There are lots of other files in the folder with similar names that don’t match.  chkdsk returned no errors.  Any ideas?

Update 4:45pm: I did a little more investigation.  The error can be reproduced on other computers, so it is not related to the state of the filesystem, or the path name.  The following command narrowed down the match a little:

c:\temp\tds>dir *d337*
Volume in drive C is OS
Volume Serial Number is 8036-7CEB

Directory of c:\temp\tds

12/05/2011  02:39 PM        12,432,384 tds-8.0.324.0.exe
1 File(s)     12,432,384 bytes
0 Dir(s)  12,700,200,960 bytes free

But the following did not match:

c:\temp\tds>dir td337*
Volume in drive C is OS
Volume Serial Number is 8036-7CEB

Directory of c:\temp\tds

File Not Found

Curiouser and curiouser.  Procmon did not provide much insight into the issue:

procmon

Where to now?  It’s Sunday afternoon, and I don’t feel like breaking out a kernel debugger to trace that any further right now.

Update 7:30pm: Well, I was puzzled so I did a little more research.  And ran across a mention of 8.3 filenames.  All of a sudden everything clicked into place.

D:\...\exe\tds>dir /x *337*
Volume in drive D has no label.
Volume Serial Number is 4279-DECE

Directory of D:\...\exe\tds

05/12/2011  02:39 PM        12,432,384 TDD337~1.EXE tds-8.0.324.0.exe
11/04/2011  02:41 PM        13,268,480 TDD938~1.EXE tds-8.0.337.0.exe
2 File(s)     25,700,864 bytes
0 Dir(s)  129,670,885,376 bytes free

Yes, even today DOS comes back to bite us.  So just beware when doing wildcard matches — especially with that old favourite del.

The case of the terribly slow PayPal emails

Every now and then I receive a payment via PayPal.  That’s not unusual, right?  PayPal would send me an email notifying me of the payment, and I’d open up Outlook to take a look at it.  All well and good.  In the last week, however, something changed.  When I clicked on any PayPal email, Outlook would take the best part of a minute to open the email, and what’s more, would freeze its user interface entirely while doing this.  But this was only happening with emails from PayPal — everything else was fine.

Not good.  At first I suspected an addin was mucking things up, so I disabled all the Outlook addins and restarted Outlook for good measure.  No difference.  Now I was getting worried — what if this was some unintended side-effect of some malware that had somehow got onto my machine, and it was targeting PayPal emails?

So I decided to do some research.  I fired up SysInternals’ Process Monitor, set it up to show only Outlook in its filtering, and turned on the Process Monitor trace.

Process Monitor filter window – filtering for OUTLOOK.EXE

Then I went and clicked on a PayPal email.  Waited the requisite time for the email to display, then went back to Process Monitor and turned off the trace.  I added the Duration column to make it easier to spot an anomalous entry.  This doesn’t always help but given the long delay, I was expecting some file or network activity to be taking a long time to run.

Adding the Duration column

Then scrolling up the log I quickly spotted the following entry.  It had a duration of nearly 3 seconds which stood out like a sore thumb.

The first offending entry

This entry was a Windows Networking connection to connect to a share on the remote host \\102.112.2o7.net.  This came back, nearly 3 seconds later, with ACCESS DENIED.  Then there were a bunch of follow up entries that related to this, all in all taking over 30 seconds to complete.  A quick web search revealed that this domain with its dubious looking name 102.112.2o7.net belongs to a well known web statistics company called Omniture.  That took some of the load off my mind, but now I was wondering how on earth opening a PayPal email could result in Internet access when I didn’t have automatic downloads of pictures switched on.

One of the emails that caused the problem, redacted of course 🙂

I opened the source of the PayPal HTML email and searched for “102.112“.  And there it was.

The HTML email in notepad

That’s a classic web bug.  Retrieving that image of a 1×1 pixel size, no doubt transparent, with some details encoded in the URL to record my visit to the web page (or in this case, opening of the email):

What was curious about this web bug was the use of the “//” shorthand to imply the same protocol (e.g. HTTP or HTTPS) as the base page.  That’s all well and good in a web browser, but in Outlook, the email is not being retrieved over HTTP.  So Outlook interprets this as a Windows Networking address, and attempts a Windows Networking connection to the host instead, \\102.112.2o7.net\b….

At this point I realised this could be viewed as a security flaw in Outlook.  So I wrote to Microsoft instead of publishing this post immediately.  Microsoft responded that they did not view this as a vulnerability (as it does not result in system compromise), but that they’d pass it on to the Outlook team as a bug.

Nevertheless, this definitely has the potential of being exploited for tracking purposes.  One important reason that external images are not loaded by default is to prevent third parties from being notified that you are reading a particular email.  While this little issue does not actually cause the image to load (it is still classified as an external image), it does cause a network connection to the third party server which could easily be logged for tracking purposes.  This network connection should not be happening.

So what was my fix?  Well, I don’t really care about Omniture or whether PayPal get their statistics, so I added an entry in my hosts file to block this domain.  Using an invalid IP address made it fail faster than the traditional use of 127.0.0.1:

0.0.0.0    102.112.2o7.net

And now my PayPal emails open quickly again.

Notes on MSHTML editing and empty elements

We ran into a problem recently with the MSHTML editor where empty paragraphs would collapse when the user saved or printed the document.  If  the user loaded the document again, the empty elements seem to disappear entirely.  Logically, this makes sense: an empty element has 0x0 dimensions, so will take up no space.  But if the user adds a blank line, they would not expect it to disappear after saving: MSHTML gives these new empty elements dimensions as if they contained a non breaking space, until the document is saved and reloaded.

Let’s look at that visually.  We’d type ONETWOTHREE into the editor, and all would be fine, as shown on the left.  After reloading, it would display as shown on the right:

Source HTML editor and collapsing HTML result

So what is going on, and what’s the solution?

The basic solution is to add an   entity (non-breaking space) to empty elements to give them a non-zero width and height.  MSHTML does this.  But there’s a heap of complexity around this.  From my analysis of the problem, it seems that the core issue is that element.innerHTML or domnode.childNodes.length is returning “” or 0 respectively when viewing the full source shows that there is actually an   in the element.  This happens only when the MSHTML editor is active.

The complexity arises when one looks at the different edit modes and methods of loading documents, because each mode has slightly different symptoms.  Whilst juggling the tangle of symptoms that these issues present, we also need to consider the following requirements:

  1. A document may contain both empty

    elements and

     

    elements, and the editor must not conflate the two when loading.

  2. When the user inserts a blank line, it must not collapse.  Nevertheless, the user does not want to learn about non-breaking spaces, so the editor must transparently manage this. Ideally, the non-breaking space would be hidden from the user but managed in the back end.  I must say that MSHTML is very close to this ideal.

A diversion: this is not a new problem, and many solutions have been proposed. One solution suggested in various forums online is to use
to break lines instead of the paragraph model with

or

.  This is not a great answer: it means the whole document has a single paragraph style.  I do note however that this is what Blogger and some other blog editors do, but then they do dynamically insert a

when the user changes the paragraph style.  Still not pretty.

Now where I talk about

elements, feel free to imagine that I am talking about

elements.  The behaviour appears to be much the same, and there’s just a flag to switch between the two elements. Also, I’ll be using Delphi for the code samples because it hides a lot of the necessary COM guff and makes the examples much easier to read.

When a new blank line is inserted into the editor, behind the scenes MSHTML will add an   entity to prevent the element from collapsing.  When you type the first letter, the   is deleted.  MSHTML also makes the   itself invisible to the user.  This is great.  It’s exactly what we want.

So let’s look at the activation of the editor and what is happening there. It turns out that there are three ways of making a document editable — four if you include the undocumented IDM_EDITMODE command that some editor component wrappers use.  So what are these four methods?

  1. Set document.designMode to “On”.

      D := WebBrowser.Document as IHTMLDocument2;
      D.designMode := ‘On’;
     

  2. Set document.body.contentEditable to “true” (or anyElement.contentEditable).

      D := WebBrowser.Document as IHTMLDocument2;
      (D.body as IHTMLElement3).contentEditable := ‘true’;
     

  3. The DISPID_AMBIENT_USERMODE ambient property.  See the link for an example.
     
  4. The aforementioned IDM_EDITMODE command ID.  I’m not condoning this method, just documenting it because some editor wrappers use it.

      D := WebBrowser.Document as IHTMLDocument2;
      (D as IOleWindow).GetWindow(hwnd_);

      SendMessage(hwnd_, WM_COMMAND, IDM_EDITMODE, 0);

To make things even more complicated, there are different ways of loading content into the HTML editor, and different methods have different outcomes.  The three methods we explored were using Navigate, document.write, and IPersistFile.

  1. Using editor.Navigate to load either a local or remote document.

      WebBrowser.Navigate(DocFileName);
     

  2. Using document.write to write a complete document.

      D := WebBrowser.Document as IHTMLDocument2;
      VarArray := VarArrayCreate([0, 0], varVariant);
      VarArray[0] := DocText;
      D.write(PSafeArray(TVarData(VarArray).VArray));
      D.close;
     

  3. Accessing the editor’s IPersistFile interface to load a document.

      D := WebBrowser.Document;
      PersistFile := D as IPersistFile;
      PersistFile.Load(PWideChar(DocFileName), 0);

It turns out that if you use either Navigate or contentEditable, then MSHTML will not hide   from the end user for elements already in the document.  New empty elements typed by the user will still have the default behaviour described previously.  This is inconsistent and confusing to both me (the poor developer) and the end user.

The following table shows how   are treated in otherwise empty elements when loaded in the various ways:

Navigate document.write IPersistFile
designMode
visible
invisible
invisible
contentEditable
visible
visible
visible
IDM_EDITMODE
visible
invisible
invisible
DISPID_AMBIENT_USERMODE
not tested

Now, it turns out that the visible results on that matrix are not going to do what we are looking for.  That’s because, as mentioned, any new empty elements typed by the user will still have the invisible behaviour for that empty element  .

Here’s a final example to clarify the situation.  Given the following document loaded into the MSHTML editor using document.write, and designMode = “On”.

 

 

With this result, we get the following results.

Code Result
doc.getElementById(“a”).innerHTML
doc.getElementById(“a”).outerHTML
doc.getElementById(“a”).parentElement.innerHTML
 
doc.getElementById(“a”).parentElement.outerHTML
 
doc.documentElement.outerHTML


 

So. How do we determine if an element is empty and collapsed, or is a blank line?  MSHTML isn’t consistent with its innerHTML, outerHTML or DOM text node properties of the element.

But wait, all hope is not lost!  It turns out that IHTMLElement3 has a little buried property called inflateBlock.  This property tells you whether or not an empty element will be ‘inflated’ to appear as though it has content.  This little known property (I found no discussions or blogs about it!) should solve our problem neatly:

isElementTrulyEmpty := (element.innerHTML = ”) and not (element as IHTMLElement3).inflateBlock;

isElementJustABlankLine := (element.innerHTML = ‘ ‘) or ((element.innerHTML = ”) and (element as IHTMLElement3).inflateBlock);

Now I just have to push this fix into the HTML editor component wrapper we are using.  At least I’ve already written the documentation around the fix!

Final note: even the Blogger editor that I’m using to write this post has trouble with consistency with new lines.  Here’s an example — look at the spacing around the code samples.

A screenshot of this blog post, in the editor (Firefox)

A screenshot of this blog post, previewing (Firefox)

Why you should not use MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT

A common problem that you, as a developer, may run into on Windows is the need to replace a file that is in use. This commonly happens with installations and upgrades, but can of course also happen in general use.

In earlier versions of Windows, when most users worked in full administrator mode, the MoveFileEx function with the MOVEFILE_DELAY_UNTIL_REBOOT flag was suggested by Microsoft as a great approach for updating files that were in use.  This flag would, as it sounds, allow you to schedule the move or deletion of a file at a time when it was (pretty much) guaranteed to succeed.

For example:

// This will delete c:\temp\coolcorelibrary.dll on the next reboot
MoveFileEx(“c:\\temp\\coolcorelibrary.dll”, NULL, MOVEFILE_DELAY_UNTIL_REBOOT);

Nowadays, of course, this flag does not work unless you are running in the context of an administrative user.  That’s great, you think, this will still work for my install or upgrade program.

But don’t trust that feeling of security.  Things are never as easy as they seem.  I first realised that this was a problem when researching issues occasionally reported by some of our Keyman Desktop users.

Take this scenario:

  1. A user, Joe, decides to uninstall your awesome app CoolCoreProgram.
  2. The uninstaller finds that a critical file (let’s call it coolcorelibrary.dll) is in use and can’t delete it
  3. Installer calls a MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT to schedule deletion of coolcorelibrary.dll.
  4. Would you click Restart Now?  Why not leave it till later?

    The uninstall completes and presents Joe with the dreaded “Hey, you need to restart Windows now” dialog.

  5. Poor unhappy Joe swears and cancels the restart, and continues his work.  He can’t see any good reason to restart Windows…
  6. A short while later, Joe realises that he actually loves CoolCoreProgram and so he downloads and reinstalls the latest, greatest version (now with extra shiny!)
  7. Shortly thereafter, Joe finishes up for the day and turns off his computer.
  8. The next morning, after Joe starts up his computer, Windows notes its instructions from the previous day, and obediently deletes coolcorelibrary.dll.
  9. And now Joe is now really, really unhappy when he tries to start CoolCoreProgram and he gets a bizarre coolcorelibrary.dll missing error.

Who is Joe going to blame?  Is it his fault for not restarting?  Is it yours for using cool APIs such as MoveFileEx?  Or is it a woeful confluence of unintended consequences?

This is probably one of the simplest scenarios in which this problem can crop up.  Things get much worse when you talk about shared libraries, fonts, or other resources which may be in use by the system, or multi-user systems.

Some reasons I have encountered for files in use:

  1. Program is still running (duh!)
  2. System has locked the file (common with fonts, hook libraries)
  3. Antivirus or security software is scanning the file
  4. Another application is trying to update the file (i.e. don’t run multiple installers at once)

One possible fix would be to check and update the registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations for listed file before creating or updating them.  But that’s a big performance and reliability hit.  I can’t say that solution sits well with me.

Another idea is to block further install and upgrade tasks until Joe actually does restart his computer, for example with a registry setting or a RunOnce entry.  But that’s going to make Joe hate me even more than he already does!

Some scenarios can be fixed with the Windows Installer automatic repair functionality.  But that assumes access to the original source files is always possible.  So that’s not a general solution.

I don’t really have a good general solution to this problem.  Any ideas?

Problems with Internet Explorer 8, print templates and standards compliance

The print engine that we use for one product I work with is based on Internet Explorer’s custom print templates functionality.  This actually works really well, and gives us lots of flexibility and generally is pretty straightforward to use.  Unfortunately, we have run into a bit of a problem recently when trying to print documents in IE8 standards compliance mode: multiple page documents print blank pages, are missing content or sometimes even fail to start printing.

I have been able to reproduce this issue using Microsoft’s own printtemplates.exe example (link in Introduction of article), with only a couple of minor changes to trigger the problem.

Symptoms
The issue arises when all of the following specific conditions are met:

  1. The document to be printed is in IE8 standards compliance mode.  This is mostly easily forced with the X-UA-Compatible META element:
    <META HTTP-EQUIV='X-UA-Compatible' CONTENT='IE=8' />
    
  2. The document to be printed is greater than 1 page long.
  3. The print template uses LAYOUTRECT elements of differing sizes.  This is a common requirement for letters which may have an address section or expanded letterhead at the top of the first page, but will have a much smaller letterhead on subsequent pages.

The Baseline
We’ll use the printtemplates.exe sample provided by Microsoft.  The sample Template3.htm in the example shows how to dynamically create LAYOUTRECT elements.  Here’s what happens before we make any changes to the example:

Print Template sample application, working with defaults
Print Preview, defaults, page 1
Print Preview, defaults, page 2

As shown in the screen shots, the print preview displays as expected.

Reproducing the Problem

With Internet Explorer 8, the problem can be reproduced as follows:

  1. Start printtemplates.exe, and select Template3.htm.
  2. Click the Page Source button.  In the HTML file that is opened, add the META element as the first child of the HEAD element.  This forces the page into IE8 standards compliant mode.  Save the file as sample.htm:
    <HTML>
      <HEAD>
        <META HTTP-EQUIV='X-UA-Compatible' CONTENT='IE=8' />
        <TITLE>Print Template Samples</TITLE>
      </HEAD>
    
  3. Back in printtemplates, click the Template Source button.  In the HTML file that is opened, make the highlighted change to the OnRectComplete function.  This small change means pages other than the first page have a smaller LAYOUTRECT than the initial page.  Save this file as template.htm.
    function OnRectComplete()
    {
        if (event.contentOverflow == true)
        {
            document.all("layoutrect" + (iNextPageToCreate - 1)).onlayoutcomplete = null;
    
            newHTML  = "";
            newHTML += "STYLE='height:5.5in'/>";
            newHTML += "";
    
            pagecontainer.insertAdjacentHTML("beforeEnd", newHTML);
            iNextPageToCreate++;
        }
    }
  4. Enter the full paths of sample.htm and template.htm into the respective fields in the print template sample, then press ENTER in the page field to load the modified sample.htm.

When you press Print Preview now, the print preview window will show blank pages instead of the expected content, and will have only 2 pages instead of the 5 or more that were shown previously:

The print template application with modified page and template ready to roll
Failing print preview, page 1
Failing print preview, page 2

If either of the changes are removed from the sample.htm and template.htm, the problem does not occur.  In Internet Explorer 9, the problem also occurs but appears to be resolved when using IE=9 in the META tag.  However, as Internet Explorer 9 is not available for Windows XP, this is not a viable solution for us.

Workarounds
We’ve found a few workarounds.  None of these are very viable for us but I list them for completeness:

  1. Don’t use LAYOUTRECT elements of differing sizes.  This is a non-starter for us.
  2. Don’t use IE8 standards compliant mode.  This would mean rewriting a number of reports and some complicated CSS to workaround bugs in IE7’s standards compliant mode, but may be a way forward.  Or use quirks mode with all the joy that brings.
  3. Use IE9.

Demystifying printing with the Microsoft WebBrowser control and ShowHTMLDialogEx

I’m writing up these notes in order to document what has been a long and painful process, involving much spelunking through MSHTML.DLL and IEFRAME.DLL to try and understand what Internet Explorer (or more accurately, the WebBrowser control) is doing and how to correctly use the semi-documented interfaces to provide full control over a print job.

The original requirement for this mini-project was to provide tray, collation, and duplex control for a HTML print job using IHTMLDocument2.execCommand(IDM_PRINT), with a custom print template.  These functions had been supported through a 3rd party ActiveX component, but this component proved to be incompatible with Internet Explorer 9 (causing a blue screen would you believe!), and the company providing the component was defunct, so it fell to me to re-engineer the solution.

After considerable research, I found some sparse documentation on MSDN suggesting that one could pass a HTMLDLG_PRINT_TEMPLATE flag to ShowHTMLDialogExand thereby duplicate and extend the functionality of the print template.  In particular, the __IE_CMD_Printer_Devmode property that could somehow be passed into this function would give us the ability to control anything we liked in terms of the printer settings.

Too easy.  Much too easy.  The first stumbling block was trying to discover the type of the pvarArgIn parameter to ShowHTMLDialogEx. A variant array seemed sensible but did not work.  It turns out that this needs to be an IHTMLEventObj, which can be created with IHTMLDocument4.CreateEventObject.  You can then use IHTMLEventObj2.setAttribute to set the various attributes for the object.

Then there were questions about what IMoniker magic was needed for the pMk parameter.  And more questions about the most appropriate set of flags.  Diving into the debugger to examine what Microsoft did answered both of these questions — it was a simple CreateURLMonikerEx call, no need to bind the moniker or other magic, and the flags that Microsoft used were HTMLDLG_ALLOW_UNKNOWN_THREAD or HTMLDLG_NOUI or HTMLDLG_MODELESS or HTMLDLG_PRINT_TEMPLATE for a print job, or HTMLDLG_ALLOW_UNKNOWN_THREAD or HTMLDLG_MODAL or HTMLDLG_MODELESS or HTMLDLG_PRINT_TEMPLATE for a print preview job.  Yes, that is both HTMLDLG_MODAL and HTMLDLG_MODELESS!

Next, what variant type should the __IE_BrowseDocument attribute be?  VT_DISPATCH or VT_UNKNOWN?  The answer is VT_UNKNOWN — things just won’t work if you pass a VT_DISPATCH.  I also came unstuck on the __IE_PrinterCmd_DevMode and __IE_PrinterCmd_DevNames attributes.  These need to be a VT_I4 containing an unlocked HGLOBAL that references a DEVMODEW structure.  I’ll leave the setup of the DEVMODEW structure to you: there are a lot of examples of that online.

However, even after overcoming these hurdles (with copious debugging to understand what MSHTML.DLL and IEFRAME.DLL were doing), there were other issues.  First, the print template was unable to access the dialogArguments.__IE_BrowseDocument property, with an Access Denied error thrown.  Also, HTC behaviors would fail to load as the WebBrowser component believed that they were being referenced in an insecure, cross-domain manner.  And finally, JavaScript in the page being printed was failing to run — and this JavaScript was required to render some of the details of the page.

I knew that Microsoft actually pass a reference to a temporary file for printing in the __IE_ContentDocumentURL attribute.  So I saved the file to a temporary file, which also required adding a BASE element to the header so that relative URLs in the document would resolve.  But the problems had not gone away.

All three of these problems in reality stemmed from the same root cause.  The security IDs for the various elements — the print template, the document being printed, and the HTC components — were not matching.  So I embarked on an attempt to find out why.  At first I wondered if we needed to bind the moniker to a bind context or storage.  That was a no-go.  Then I looked at the IInternetSecurityManager interface, which a developer can implement to provide custom security IDs, zones and more.  Sounds logical, right?  Only problem is that the ShowHTMLDialogEx function provides its own IInternetSecurityManager implementation, which you cannot override (and its GetSecurityID just returns INET_E_DEFAULTACTION for the relevant URLs).  Yikes.

I was starting to run out of options.  As far as I could tell, we were duplicating Microsoft’s functionality essentially identically, and I could not see any calls which changed the security for the document so that it would match security contexts.

Finally I noticed an undocumented attribute had been added to the HTML element in the temporary copy of the page: __IE_DisplayURL.  And as soon as I added that to my file, referencing the original URL of the document, everything worked!

Now, this is all fun (and sounds straightforward in hindsight), but without some code it’s probably not terribly helpful.  So here’s some code (in Delphi, translate to your favourite language as required).  It all looks pretty straightforward now(!), but nearly every line involved blood, sweat and tears!  This is really not a complete example and hence does not compile but just covers the bits necessary to complement the better documented aspects of custom printing with MSHTML.  Please note that this example uses the TEmbeddedWB component for Delphi, and that temporary file cleanup has been excluded.

procedure THTMLPrintController.StartPrint(FPrint: Boolean);
var
  FDeviceW, FDriverW, FPortW: WideString;
  FDevModeHandle, FDevNamesHandle: HGLOBAL;
  pEventObj2: IHTMLEventObj2;
    procedure SetTempFileName;
    begin
      FTempFileName := GetTempFileName('', '.htm');
    end;
    { SaveToFile: Saves the current web document to a temporary file, adding the required BASE and HTML properties }
    procedure SaveToFile;
    var
      FElementCollection: IHTMLElementCollection;
      FHTMLElement: IHTMLElement;
      FBaseElement: IHTMLBaseElement;
      FString: WideString;
    begin
      FElementCollection := webBrowser.Doc3.getElementsByTagName('base');
      if FElementCollection.length = 0 then
      begin
        FBaseElement := webBrowser.Doc2.createElement('base') as IHTMLBaseElement;
        FBaseElement.href := webBrowser.LocationURL;
        (webBrowser.Doc3.getElementsByTagName('head').item(0,0) as IHTMLElement2).insertAdjacentElement('afterBegin', FBaseElement as IHTMLElement);
      end
      else
      begin
        FBaseElement := FElementCollection.item(0,0) as IHTMLBaseElement;
        if FBaseElement.href = '' then FBaseElement.href := webBrowser.LocationURL;
      end;
      FElementCollection := webBrowser.Doc3.getElementsByTagName('html');
      if FElementCollection.length > 0 then
      begin
        FHTMLElement := FElementCollection.item(0,0) as IHTMLElement;
        FHTMLElement.setAttribute( '__IE_DisplayURL', webBrowser.LocationURL, 0);
      end;
      with TFileStream.Create(FTempFileName, fmCreate) do
      try
        if webBrowser.Doc5.compatMode = 'CSS1Compat' then
        begin
          FString := '';
          Write(PWideChar(FString)^, Length(FString)*2);
        end;
        FString := webBrowser.Doc3.documentElement.outerHTML;
        Write(PWideChar(FString)^, Length(FString)*2);
      finally
        Free;
      end;
    end;
    { Configured the printer, assuming we've already been passed an ANSI handle }
    procedure ConfigurePrinter;
    var
      FDevice, FDriver, FPort: array[0..255] of char;
      FDevModeHandle_Ansi: HGLOBAL;
      FPrinterHandle: THandle;
      FDevMode: PDeviceModeW;
      FDevNames: PDevNames;
      FSucceeded: Boolean;
      sz: Integer;
      Offset: PChar;
    begin
      Printer.GetPrinter(FDevice, FDriver, FPort, FDevModeHandle_Ansi);
      if FDevModeHandle_Ansi = 0 then
        RaiseLastOSError;
      FDeviceW := FDevice;
      FDriverW := FDriver;
      FPortW := FPort;
      { Setup the DEVMODE structure }
      FSucceeded := False;
      if not OpenPrinterW(PWideChar(FDeviceW), FPrinterHandle, nil) then
        RaiseLastOSError;
      try
        sz := DocumentPropertiesW(0, FPrinterHandle, PWideChar(FDeviceW), nil, nil, 0);
        if sz < 0 then RaiseLastOSError;
        FDevModeHandle := GlobalAlloc(GHND, sz);
        if FDevModeHandle = 0 then RaiseLastOSError;
        try
          FDevMode := GlobalLock(FDevModeHandle);
          if FDevMode = nil then
            RaiseLastOSError;
          try
            if DocumentPropertiesW(0, FPrinterHandle, PWidechar(FDeviceW), FDevMode, nil, DM_OUT_BUFFER) < 0 then
              RaiseLastOSError;
            FDevMode.dmFields := FDevMode.dmFields or DM_DEFAULTSOURCE or DM_DUPLEX or DM_COLLATE;
            FDevMode.dmDefaultSource := FTrayNumber;
            if FDuplex
              then FDevMode.dmDuplex := DMDUP_VERTICAL
              else FDevMode.dmDuplex := DMDUP_SIMPLEX;
            if FCollate
              then FDevMode.dmCollate := DMCOLLATE_TRUE
              else FDevMode.dmCollate := DMCOLLATE_FALSE;
            if DocumentPropertiesW(0, FPrinterHandle, PWideChar(FDeviceW), FDevMode, FDevMode, DM_OUT_BUFFER or DM_IN_BUFFER) < 0 then
              RaiseLastOSError;
            FSucceeded := True;
          finally
            GlobalUnlock(FDevModeHandle);
          end;
        finally
          if not FSucceeded then GlobalFree(FDevModeHandle);
        end;
      finally
        ClosePrinter(FPrinterHandle);
      end;
      Assert(FSucceeded);
      { Setup up the DEVNAMES structure }
      FSucceeded := False;
      FDevNamesHandle := GlobalAlloc(GHND, SizeOf(TDevNames) +
       (Length(FDeviceW) + Length(FDriverW) + Length(FPortW) + 3) * 2);
      if FDevNamesHandle = 0 then RaiseLastOSError;
      try
        FDevNames := PDevNames(GlobalLock(FDevNamesHandle));
        if FDevNames = nil then RaiseLastOSError;
        try
          Offset := PChar(FDevNames) + SizeOf(TDevnames);
          with FDevNames^ do
          begin
            wDriverOffset := (Longint(Offset) - Longint(FDevNames)) div 2;
            Move(PWideChar(FDriverW)^, Offset^, Length(FDriverW) * 2 + 2);
            Inc(Offset, Length(FDriverW) * 2 + 2);
            wDeviceOffset := (Longint(Offset) - Longint(FDevNames)) div 2;
            Move(PWideChar(FDeviceW)^, Offset^, Length(FDeviceW) * 2 + 2);
            Inc(Offset, Length(FDeviceW) * 2 + 2);
            wOutputOffset := (Longint(Offset) - Longint(FDevNames)) div 2;
            Move(PWideChar(FPortW)^, Offset^, Length(FPortW) * 2 + 2);
          end;
          FSucceeded := True;
        finally
          GlobalUnlock(FDevNamesHandle);
        end;
      finally
        if not FSucceeded then GlobalFree(FDevNamesHandle);
      end;
      Assert(FSucceeded);
    end;
  { Creates the IHTMLEventObj2 and populates the attributes for printing }
  procedure CreateEventObject;
  var
    v: OleVariant;
    FShortFileName: WideString;
    FShortFileNameBuf: array[0..260] of widechar;
  begin
    v := EmptyParam;
    pEventObj2 := webBrowser.Doc4.CreateEventObject(v) as IHTMLEventObj2;
    pEventObj2.setAttribute('__IE_BaseLineScale', 2, 0);
    GetShortPathNameW(PWideChar(FTempFileName), FShortFileNameBuf, 260); FShortFileName := FShortFileNameBuf;

    v := webBrowser.Document as IUnknown;
    pEventObj2.setAttribute('__IE_BrowseDocument', v, 0);
    pEventObj2.setAttribute('__IE_ContentDocumentUrl', FShortFileName, 0);
    pEventObj2.setAttribute('__IE_ContentSelectionUrl', '', 0);  // Empty as we never print selections
    pEventObj2.setAttribute('__IE_FooterString', '', 0);
    pEventObj2.setAttribute('__IE_HeaderString', '', 0);
    pEventObj2.setAttribute('__IE_ActiveFrame', 0, 0);
    pEventObj2.setAttribute('__IE_OutlookHeader', '', 0);
    pEventObj2.setAttribute('__IE_PrinterCMD_Device', FDeviceW, 0);
    pEventObj2.setAttribute('__IE_PrinterCMD_Port', FPortW, 0);
    pEventObj2.setAttribute('__IE_PrinterCMD_Printer', FDriverW, 0);
    pEventObj2.setAttribute('__IE_PrinterCmd_DevMode', FDevModeHandle, 0);
    pEventObj2.setAttribute('__IE_PrinterCmd_DevNames', FDevNamesHandle, 0);
    if FPrint
      then pEventObj2.setAttribute('__IE_PrintType', 'NoPrompt', 0)
      else pEventObj2.setAttribute('__IE_PrintType', 'Preview', 0);
    pEventObj2.setAttribute('__IE_TemplateUrl', GetPrintTemplateURL, 0);
    pEventObj2.setAttribute('__IE_uPrintFlags', 0, 0);
    v := VarArrayOf([FShortFileName]);
    pEventObj2.setAttribute('__IE_TemporaryFiles', v, 0);
    pEventObj2.setAttribute('__IE_ParentHWND', 0, 0);
    pEventObj2.setAttribute('__IE_HeaderString', webBrowser.Doc2.title, 0);
    pEventObj2.setAttribute('__IE_DisplayURL', webBrowser.LocationURL, 0);
  end;
  procedure InstantiateDialog;
  var
    FWindowParams, FMonikerURL: WideString;
    FMoniker: IMoniker;
    FDialogFlags: DWord;
    varArgIn, varArgOut: OleVariant;
    res: HRESULT;
  begin
    varArgIn := pEventObj2 as IUnknown;
    varArgOut := Null;
    FMonikerURL := GetPrintTemplateURL;
    OleCheck(CreateURLMonikerEx(nil, PWideChar(FMonikerURL), FMoniker, URL_MK_UNIFORM));
    if FPrint then
    begin
      FWindowParams := '';
      FDialogFlags := HTMLDLG_ALLOW_UNKNOWN_THREAD or HTMLDLG_NOUI or HTMLDLG_MODELESS or HTMLDLG_PRINT_TEMPLATE;
    end
    else
    begin
      FWindowParams := 'resizable=yes;';
      FDialogFlags := HTMLDLG_ALLOW_UNKNOWN_THREAD or HTMLDLG_MODAL or HTMLDLG_MODELESS or HTMLDLG_PRINT_TEMPLATE;
    end;
    res := ShowHTMLDialogEx(0, FMoniker, FDialogFlags, varArgIn, PWideChar(FWindowParams), varArgOut);
    if res <> S_OK then raise EOSError.Create(SysErrorMessage(res));
  end;
begin
  SetTempFileName;
  SaveToFile;
  ConfigurePrinter;
  CreateEventObject;
  InstantiateDialog;
end;

Update 14 July: This code is not our production code: I’ve stripped out bits and pieces and tried to keep the bits that are somewhat relevant. Don’t worry too much about the ConfigurePrinter details — the takeaway is the HGLOBAL. I must also apologise for the atrocity that is the SaveToFile function. That’s what you get when working with legacy versions of software. Internet Explorer also won’t reliably work with non-ASCII content there unless you toss a BOM into the start of the stream.

Fixing Windows font scaling without restarting

Windows 7 and Windows Server 2008 include the ability for each user to set their font scale. This is fantastic, except for a legacy complication: the old bitmap fonts MS Sans Serif, MS Serif and Courier have specific versions for each font scale, but these are never changed after Windows is installed. In previous versions of Windows, the fonts were replaced with the correct versions for the selected font scale, which is why a system restart was required

This means that these bitmap fonts can be out of sync with the currently selected font scale. This is typically only a problem for legacy applications, but it is ugly in those cases!

More background is available at the MSDN blog http://blogs.msdn.com/b/developingfordynamicsgp/archive/2009/11/25/windows-7-bitmap-fonts-and-microsoft-dynamics-gp.aspx and the follow-up post http://blogs.msdn.com/b/developingfordynamicsgp/archive/2009/12/02/more-on-windows-7-bitmap-fonts-and-dpi-settings.aspx

In our situation, it was even worse: the client was running a Remote Desktop Services environment, where restarting the server was really out of the question.

So I wrote a little fix-it app that dynamically adjusts all the font scaling registry settings and installs the correct fonts for the selected font scale.  You may need to log off and log on again, but in most cases, no restart is required.  It is setup for 100% and 125% only, and I provide this app here only as a useful tool.  No support or warranties, etc, etc.  Use at your own risk!

Download Fontsizefix.zip.

Update 1 Jul: As I discussed this blog with Peter Constable, I realised that I didn’t really describe what the tool did.  So: fontsizefix updates the various metrics in HKCU\Control Panel\Desktop, and a couple of LogPixels registry settings in HKLM\SYSTEM\CurrentControlSet\Hardware Profiles\CurrentSoftware\Fonts and HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontDPI\LogPixels, updates the fonts key in the registry to point to the correct versions of MS Sans Serif, MS Serif and Courier, and then RemoveFontResource and AddFontResource in order to get the correct version of the font loaded.  I’m sure it’s not 100% but it got us over a hurdle with the terminal services environment.  For purposes of support, it was easiest to make a tool that did the whole lot rather than document a bunch of registry tweaks which are easy to trip over on, and then we figured we might as well make it available to other users as well…

The case of the hidden exception

I was building a new installer for an application today using WiX v3, and ran into a problem when trying to extract the COM SelfReg from a DLL using heat.

The error that heat returned was not very helpful.

heat.exe : warning HEAT5150 : Could not harvest data from a file that was expected to be a SelfReg DLL: PrinterBvr.dll. If this file does not support SelfReg you can ignore this warning. Otherwise, this error detail may be helpful to diagnose the failure: Exception has been thrown by the target of an invocation..

No tracing options were available in heat to figure out what actually caused the exception or even what the exception was! Time to pull out one of my favourite debugging tools: Procmon. Procmon traces activities on your system and records registry, file, network and similar activities to a log file.

So, I ran Procmon and added a filter to restrict the log to processes named “heat.exe”:

Then I started the trace and re-ran my command. It failed (as expected) with the same error. I came back to Procmon, stopped the trace, and scanned through looking for a failure point. Near the bottom of the trace, I found the issue. Part of the work heat does is to redirect registry actions to a special location in the registry so it can capture the work the DLL’s DllRegisterServer call does. Unfortunately, printerbvr.dll was depending on a specific key in HKLM already existing, which of course it did not in the redirected registry keys.

At this point I was able to understand the cause of the problem. But how could I work around this to get heat to do its work? I only needed to do this once to capture the registry settings. I could have used Procmon to watch all the activity and manually built the .wxs source file from that. But I was loath to do so much work.

Looking at the trace a bit closer I noted that the key that heat used as a temporary root for its registry redirection was based on its process ID. So I couldn’t create the keys beforehand because I wouldn’t know what the process ID was until after heat had already started — and the process only took a fraction of a second to run.

Windbg to the rescue! I fired up windbg, and started a new debug session with heat.exe and the appropriate command line parameters for heat. I didn’t want to create the registry key based on the PID before heat did that itself because I figured heat might complain if the registry key was already there. So I quickly peeked at the handy stack trace that Procmon had captured for the event in question:

From that I discerned that I could add a breakpoint when PrinterBvr.dll + 0x18BB (subtracting 6 bytes for the size of the ‘call’ opcode), and continued execution. I could probably have added the breakpoint at printerbvr.dll!DllRegisterServer, but this worked for me:

bu printerbvr.dll+0x18bb
g

At the right time, we hit the breakpoint and I then jumped into the registry (after quickly looking up the PID for this instance of heat.exe) and created my new registry key:

This time, everything completed happily, I had my .wxs output file, and I was able to get on with some real work (i.e. writing this blog post).