Category Archives: Development

Comparing TStringStream vs TStringList for writing Unicode strings to streams

Two approaches are widely used in Delphi code for reading and writing strings to and from streams, and they initially seem pretty similar in their behaviour.  For writing, these are TStrings.SaveToStream and TStringStream.SaveToStream (or SaveToFile in either case):

procedure TForm1.SaveToFile;
const
  AString = 'This is some Unicode text'#13+
            'Test Unicode © Δ א';
begin
  with TStringList.Create do
  try
    Text := AString;
    SaveToFile('TStringList UTF8.txt', TEncoding.UTF8);
  finally
    Free;
  end;

  with TStringStream.Create(AString, TEncoding.UTF8) do
  try
    SaveToFile('TStringStream UTF8.txt');
  finally
    Free;
  end;
end;

But there are several crucial differences in what is written to the stream between these two methods:

  1. TStringList prepends the preamble bytes for the encoding (in this case, #$EF#$BB#$BF)
  2. TStringList appends a new line #$0D#$0A to the file, if your text does not already end in a new line.
  3. TStringList converts any single line breaking characters in the text (e.g. #$0D or #$0A) into #$0D#$0A.

The following hex dumps may show this more clearly:

EF BB BF 54 68 69 73 20 69 73 20 73 6F 6D 65 20
55 6E 69 63 6F 64 65 20 74 65 78 74 0D 0A 54 65
73 74 20 55 6E 69 63 6F 64 65 20 C2 A9 20 CE 94
20 D7 90 0D 0A 

TStringList UTF8.txt

54 68 69 73 20 69 73 20 73 6F 6D 65 20 55 6E 69
63 6F 64 65 20 74 65 78 74 0D 54 65 73 74 20 55
6E 69 63 6F 64 65 20 C2 A9 20 CE 94 20 D7 90

TStringStream UTF8.txt
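
If you want to check the dumps on your own Delphi version, something like the following console program should do it.  It is only a sketch: the HexDump helper and the program name are made up for this example, it writes to memory streams rather than files, and it assumes Delphi 2009 or later (where TStringStream descends from TMemoryStream and TEncoding is available).

```pascal
program CompareStringOutput;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Same text as the sample above; the non-ASCII characters are written
  // as code points so the source file encoding does not matter
  AString = 'This is some Unicode text'#13 +
            'Test Unicode '#$00A9' '#$0394' '#$05D0;

// Returns the stream contents as space-separated hex bytes
function HexDump(AStream: TMemoryStream): string;
var
  P: PByte;
  I: Integer;
begin
  Result := '';
  P := AStream.Memory;
  for I := 1 to Integer(AStream.Size) do
  begin
    Result := Result + IntToHex(P^, 2) + ' ';
    Inc(P);
  end;
end;

var
  Lines: TStringList;
  ListStream: TMemoryStream;
  StringStream: TStringStream;
begin
  Lines := TStringList.Create;
  ListStream := TMemoryStream.Create;
  StringStream := TStringStream.Create(AString, TEncoding.UTF8);
  try
    // TStringList route: preamble, normalised line breaks, trailing CRLF
    Lines.Text := AString;
    Lines.SaveToStream(ListStream, TEncoding.UTF8);
    Writeln('TStringList:   ', HexDump(ListStream));

    // TStringStream route: just the encoded bytes of the string
    Writeln('TStringStream: ', HexDump(StringStream));
  finally
    StringStream.Free;
    ListStream.Free;
    Lines.Free;
  end;
end.
```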

Make sure you know how your files will be read and whether these differences are important to the target application.

Basically, TStringList is typically not appropriate for streaming strings without modification.  TStringStream is your friend here.  But if you need the preamble, and just the preamble, then you’ll have to do a little more work; you won’t be able to use TStringStream.SaveToFile.
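
One way to get the preamble and nothing else is to write the preamble bytes yourself, followed by the encoded text.  The following is a minimal sketch rather than a polished implementation: SaveStringWithPreamble and the output file name are invented for this example, and it assumes Delphi 2009 or later for TEncoding.

```pascal
program PreambleOnly;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Writes the encoding's preamble (BOM) followed by the encoded text.
// No trailing line break is added and existing line breaks are untouched.
procedure SaveStringWithPreamble(const AText, AFileName: string;
  AEncoding: TEncoding);
var
  Preamble, Data: TBytes;
  Stream: TFileStream;
begin
  Preamble := AEncoding.GetPreamble;
  Data := AEncoding.GetBytes(AText);
  Stream := TFileStream.Create(AFileName, fmCreate);
  try
    // Some encodings have no preamble, so guard against empty arrays
    if Length(Preamble) > 0 then
      Stream.WriteBuffer(Preamble[0], Length(Preamble));
    if Length(Data) > 0 then
      Stream.WriteBuffer(Data[0], Length(Data));
  finally
    Stream.Free;
  end;
end;

begin
  SaveStringWithPreamble(
    'This is some Unicode text'#13 +
    'Test Unicode '#$00A9' '#$0394' '#$05D0,
    'Preamble UTF8.txt', TEncoding.UTF8);
end.
```

With TEncoding.UTF8 the file should start with EF BB BF and then contain exactly the bytes that TStringStream would have written.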

IXMLDocument.SaveToStream does not always use UTF-16 encoding

Delphi’s documentation on IXMLDocument.SaveToStream has the following important caveat:

Regardless of the encoding system of the original XML document, SaveToStream always saves the stream in UTF-16.

It’s helpful to have notes like this.  Mind you, forcing UTF-16 output is definitely horrible; what if we need our document in UTF-8 or (God forbid) some non-Unicode encoding?

Now Kris and I were looking at a Unicode corruption issue with an XML document in a Delphi application and struggling to understand what was going wrong given this statement in the documentation.  Our results didn’t add up, so we wrote a little test app to test that statement:

procedure TForm1.OutputEncodingIsUTF8;
const
  UTF8XMLDoc: string =
    '<?xml version="1.0" encoding="utf-8"?>'#13#10+
    '';
var
  XMLDocument: IXMLDocument;
  InStream: TStringStream;
  OutStream: TFileStream;
begin
  // stream format is UTF8, input string is converted to UTF8
  // and saved to the stream
  InStream := TStringStream.Create(UTF8XMLDoc, TEncoding.UTF8);
  // we'll write to this output file
  OutStream := TFileStream.Create('file_should_be_utf16_but_is_utf8.xml',
    fmCreate);
  try
    XMLDocument := TXMLDocument.Create(nil);
    XMLDocument.LoadFromStream(InStream);
    XMLDocument.SaveToStream(OutStream);
    // IXMLDocument.SaveToStream docs state will always be UTF-16
  finally
    FreeAndNil(InStream);
    FreeAndNil(OutStream);
  end;

  with TStringList.Create do
  try    // we want to load it as a UTF16 doc given the documentation
    LoadFromFile('file_should_be_utf16_but_is_utf8.xml', TEncoding.Unicode);
    ShowMessage('This should be displayed as an XML document '+
                'but instead is corrupted: '+#13#10+Text);
  finally
    Free;
  end;
end;

When I run this, I’m expecting the following dialog:

But instead I get the following dialog:

Note, this is run on Delphi 2010.  Haven’t run this test on Delphi XE2, but the documentation hasn’t changed.

The moral of the story is that the output encoding is the same as the input encoding, unless you change the output encoding with the Encoding property.  For example, adding the XMLDocument.Encoding line below fixes the code sample:

    XMLDocument := TXMLDocument.Create(nil);
    XMLDocument.LoadFromStream(InStream);
    XMLDocument.Encoding := 'UTF-16';
    XMLDocument.SaveToStream(OutStream);

The same documentation issue exists for TXMLDocument.SaveToStream.  I’ve reported the issue in QualityCentral.

Notes on MSHTML editing and empty elements

We ran into a problem recently with the MSHTML editor where empty paragraphs would collapse when the user saved or printed the document.  If the user loaded the document again, the empty elements would seem to disappear entirely.  Logically, this makes sense: an empty element has 0x0 dimensions, so will take up no space.  But if the user adds a blank line, they would not expect it to disappear after saving: MSHTML gives these new empty elements dimensions as if they contained a non-breaking space, until the document is saved and reloaded.

Let’s look at that visually.  We’d type ONETWOTHREE into the editor, and all would be fine, as shown on the left.  After reloading, it would display as shown on the right:

Source HTML editor and collapsing HTML result

So what is going on, and what’s the solution?

The basic solution is to add an &nbsp; entity (non-breaking space) to empty elements to give them a non-zero width and height.  MSHTML does this.  But there’s a heap of complexity around this.  From my analysis of the problem, it seems that the core issue is that element.innerHTML or domnode.childNodes.length is returning “” or 0 respectively when viewing the full source shows that there is actually an &nbsp; in the element.  This happens only when the MSHTML editor is active.

The complexity arises when one looks at the different edit modes and methods of loading documents, because each mode has slightly different symptoms.  Whilst juggling the tangle of symptoms that these issues present, we also need to consider the following requirements:

  1. A document may contain both empty <p></p> elements and <p>&nbsp;</p> elements, and the editor must not conflate the two when loading.

  2. When the user inserts a blank line, it must not collapse.  Nevertheless, the user does not want to learn about non-breaking spaces, so the editor must transparently manage this. Ideally, the non-breaking space would be hidden from the user but managed in the back end.  I must say that MSHTML is very close to this ideal.

A diversion: this is not a new problem, and many solutions have been proposed. One solution suggested in various forums online is to use <br> to break lines instead of the paragraph model with <p> or <div>.  This is not a great answer: it means the whole document has a single paragraph style.  I do note however that this is what Blogger and some other blog editors do, but then they do dynamically insert a block element when the user changes the paragraph style.  Still not pretty.

Now where I talk about <p> elements, feel free to imagine that I am talking about <div> elements.  The behaviour appears to be much the same, and there’s just a flag to switch between the two elements. Also, I’ll be using Delphi for the code samples because it hides a lot of the necessary COM guff and makes the examples much easier to read.

When a new blank line is inserted into the editor, behind the scenes MSHTML will add an &nbsp; entity to prevent the element from collapsing.  When you type the first letter, the &nbsp; is deleted.  MSHTML also makes the &nbsp; itself invisible to the user.  This is great.  It’s exactly what we want.

So let’s look at the activation of the editor and what is happening there. It turns out that there are three ways of making a document editable — four if you include the undocumented IDM_EDITMODE command that some editor component wrappers use.  So what are these four methods?

  1. Set document.designMode to “On”.

      D := WebBrowser.Document as IHTMLDocument2;
      D.designMode := 'On';
     

  2. Set document.body.contentEditable to “true” (or anyElement.contentEditable).

      D := WebBrowser.Document as IHTMLDocument2;
      (D.body as IHTMLElement3).contentEditable := 'true';
     

  3. The DISPID_AMBIENT_USERMODE ambient property.  See the link for an example.
     
  4. The aforementioned IDM_EDITMODE command ID.  I’m not condoning this method, just documenting it because some editor wrappers use it.

      D := WebBrowser.Document as IHTMLDocument2;
      (D as IOleWindow).GetWindow(hwnd_);

      SendMessage(hwnd_, WM_COMMAND, IDM_EDITMODE, 0);

To make things even more complicated, there are different ways of loading content into the HTML editor, and different methods have different outcomes.  The three methods we explored were using Navigate, document.write, and IPersistFile.

  1. Using editor.Navigate to load either a local or remote document.

      WebBrowser.Navigate(DocFileName);
     

  2. Using document.write to write a complete document.

      D := WebBrowser.Document as IHTMLDocument2;
      VarArray := VarArrayCreate([0, 0], varVariant);
      VarArray[0] := DocText;
      D.write(PSafeArray(TVarData(VarArray).VArray));
      D.close;
     

  3. Accessing the editor’s IPersistFile interface to load a document.

      D := WebBrowser.Document;
      PersistFile := D as IPersistFile;
      PersistFile.Load(PWideChar(DocFileName), 0);

It turns out that if you use either Navigate or contentEditable, then MSHTML will not hide &nbsp; from the end user for elements already in the document.  New empty elements typed by the user will still have the default behaviour described previously.  This is inconsistent and confusing to both me (the poor developer) and the end user.

The following table shows how &nbsp; entities are treated in otherwise empty elements when loaded in the various ways:

                          Navigate     document.write   IPersistFile
designMode                visible      invisible        invisible
contentEditable           visible      visible          visible
IDM_EDITMODE              visible      invisible        invisible
DISPID_AMBIENT_USERMODE   not tested

Now, it turns out that the visible results on that matrix are not going to do what we are looking for.  That’s because, as mentioned, any new empty elements typed by the user will still have the invisible behaviour for that empty element’s &nbsp;.

Here’s a final example to clarify the situation.  Given the following document loaded into the MSHTML editor using document.write, and designMode = “On”.

 

 

With this document, we get the following results.

Code Result
doc.getElementById(“a”).innerHTML
doc.getElementById(“a”).outerHTML
doc.getElementById(“a”).parentElement.innerHTML
 
doc.getElementById(“a”).parentElement.outerHTML
 
doc.documentElement.outerHTML


 

So. How do we determine if an element is empty and collapsed, or is a blank line?  MSHTML isn’t consistent with its innerHTML, outerHTML or DOM text node properties of the element.

But wait, all hope is not lost!  It turns out that IHTMLElement3 has a little buried property called inflateBlock.  This property tells you whether or not an empty element will be ‘inflated’ to appear as though it has content.  This little known property (I found no discussions or blogs about it!) should solve our problem neatly:

isElementTrulyEmpty := (element.innerHTML = '') and not (element as IHTMLElement3).inflateBlock;

isElementJustABlankLine := (element.innerHTML = '&nbsp;') or ((element.innerHTML = '') and (element as IHTMLElement3).inflateBlock);
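
The two tests can be folded into a single helper, sketched below as a small unit.  This is illustrative only and not the fix that went into the component wrapper: TBlockState, ClassifyBlock and CountBlankLines are names invented here, and the sketch assumes that innerHTML reports a lone non-breaking space as the text '&nbsp;'.

```pascal
unit BlankLineCheck;

interface

uses
  MSHTML;

type
  // bsCollapsed: truly empty, takes up no space
  // bsBlankLine: looks empty but is inflated, or holds only an &nbsp;
  TBlockState = (bsHasContent, bsBlankLine, bsCollapsed);

function ClassifyBlock(const AElement: IHTMLElement): TBlockState;
function CountBlankLines(const ADocument: IHTMLDocument3;
  const ATagName: WideString): Integer;

implementation

function ClassifyBlock(const AElement: IHTMLElement): TBlockState;
var
  Html: WideString;
begin
  Html := AElement.innerHTML;
  if Html = '&nbsp;' then
    Result := bsBlankLine
  else if Html <> '' then
    Result := bsHasContent
  else if (AElement as IHTMLElement3).inflateBlock then
    Result := bsBlankLine   // empty, but MSHTML renders it as a blank line
  else
    Result := bsCollapsed;
end;

// Counts the blank-line elements with the given tag name, e.g. 'p' or 'div'
function CountBlankLines(const ADocument: IHTMLDocument3;
  const ATagName: WideString): Integer;
var
  Elements: IHTMLElementCollection;
  I: Integer;
begin
  Result := 0;
  Elements := ADocument.getElementsByTagName(ATagName);
  for I := 0 to Elements.length - 1 do
    if ClassifyBlock(Elements.item(I, 0) as IHTMLElement) = bsBlankLine then
      Inc(Result);
end;

end.
```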

Now I just have to push this fix into the HTML editor component wrapper we are using.  At least I’ve already written the documentation around the fix!

Final note: even the Blogger editor that I’m using to write this post has trouble with consistency with new lines.  Here’s an example — look at the spacing around the code samples.

A screenshot of this blog post, in the editor (Firefox)

A screenshot of this blog post, previewing (Firefox)

Why you should not use MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT

A common problem that you, as a developer, may run into on Windows is the need to replace a file that is in use. This commonly happens with installations and upgrades, but can of course also happen in general use.

In earlier versions of Windows, when most users worked in full administrator mode, the MoveFileEx function with the MOVEFILE_DELAY_UNTIL_REBOOT flag was suggested by Microsoft as a great approach for updating files that were in use.  This flag would, as it sounds, allow you to schedule the move or deletion of a file at a time when it was (pretty much) guaranteed to succeed.

For example:

// This will delete c:\temp\coolcorelibrary.dll on the next reboot
MoveFileEx("c:\\temp\\coolcorelibrary.dll", NULL, MOVEFILE_DELAY_UNTIL_REBOOT);

Nowadays, of course, this flag does not work unless you are running in the context of an administrative user.  That’s great, you think, this will still work for my install or upgrade program.

But don’t trust that feeling of security.  Things are never as easy as they seem.  I first realised that this was a problem when researching issues occasionally reported by some of our Keyman Desktop users.

Take this scenario:

  1. A user, Joe, decides to uninstall your awesome app CoolCoreProgram.
  2. The uninstaller finds that a critical file (let’s call it coolcorelibrary.dll) is in use and can’t delete it.
  3. The uninstaller calls MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT to schedule deletion of coolcorelibrary.dll.
  4. The uninstall completes and presents Joe with the dreaded “Hey, you need to restart Windows now” dialog.

    Would you click Restart Now?  Why not leave it till later?

  5. Poor unhappy Joe swears and cancels the restart, and continues his work.  He can’t see any good reason to restart Windows…
  6. A short while later, Joe realises that he actually loves CoolCoreProgram and so he downloads and reinstalls the latest, greatest version (now with extra shiny!)
  7. Shortly thereafter, Joe finishes up for the day and turns off his computer.
  8. The next morning, after Joe starts up his computer, Windows notes its instructions from the previous day, and obediently deletes coolcorelibrary.dll.
  9. And Joe is now really, really unhappy when he tries to start CoolCoreProgram and he gets a bizarre coolcorelibrary.dll missing error.

Who is Joe going to blame?  Is it his fault for not restarting?  Is it yours for using cool APIs such as MoveFileEx?  Or is it a woeful confluence of unintended consequences?

This is probably one of the simplest scenarios in which this problem can crop up.  Things get much worse when you talk about shared libraries, fonts, or other resources which may be in use by the system, or multi-user systems.

Some reasons I have encountered for files in use:

  1. Program is still running (duh!)
  2. System has locked the file (common with fonts, hook libraries)
  3. Antivirus or security software is scanning the file
  4. Another application is trying to update the file (i.e. don’t run multiple installers at once)

One possible fix would be to check the registry value
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations for the files you are about to create or update, and remove any stale entries for them first.  But that’s a big performance and reliability hit.  I can’t say that solution sits well with me.
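
For what it’s worth, the checking half of that idea might look something like the sketch below.  It only reads the value and never modifies it.  It rests on a few assumptions about the layout: a REG_MULTI_SZ holding source/destination pairs, sources stored as \??\ paths, an empty destination meaning “delete”, and an optional leading ! on the destination.  ReadPendingFileRenames and HasPendingOperation are names invented for this example.

```pascal
program PendingRenameCheck;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

const
  SessionManagerKey = 'SYSTEM\CurrentControlSet\Control\Session Manager';
  PendingValueName = 'PendingFileRenameOperations';

// Fills APairs with alternating source and destination entries.
// An empty destination means the source is scheduled for deletion.
procedure ReadPendingFileRenames(APairs: TStrings);
var
  Key: HKEY;
  ValueType, Size: DWORD;
  Buffer: array of WideChar;
  Count, Index: Integer;
  Source, Dest: WideString;

  // Returns the next null-terminated entry and steps past its terminator
  function NextEntry: WideString;
  var
    Start: Integer;
  begin
    Start := Index;
    while (Index < Count) and (Buffer[Index] <> #0) do
      Inc(Index);
    if Index > Start then
      SetString(Result, PWideChar(@Buffer[Start]), Index - Start)
    else
      Result := '';
    Inc(Index);
  end;

begin
  APairs.Clear;
  if RegOpenKeyExW(HKEY_LOCAL_MACHINE, SessionManagerKey, 0, KEY_READ,
    Key) <> ERROR_SUCCESS then
    Exit;
  try
    Size := 0;
    // The value does not exist when nothing is pending
    if RegQueryValueExW(Key, PendingValueName, nil, @ValueType, nil,
      @Size) <> ERROR_SUCCESS then
      Exit;
    if (ValueType <> REG_MULTI_SZ) or (Size < SizeOf(WideChar)) then
      Exit;
    SetLength(Buffer, Size div SizeOf(WideChar));
    if RegQueryValueExW(Key, PendingValueName, nil, nil,
      PByte(@Buffer[0]), @Size) <> ERROR_SUCCESS then
      Exit;
  finally
    RegCloseKey(Key);
  end;

  // Empty strings are valid destinations, so walk the data in pairs
  // instead of stopping at the first double null
  Count := Size div SizeOf(WideChar);
  Index := 0;
  while Index < Count do
  begin
    Source := NextEntry;
    if Source = '' then
      Break;
    Dest := NextEntry;
    APairs.Add(Source);
    APairs.Add(Dest);
  end;
end;

// True if AFileName is the source or the target of a pending operation
function HasPendingOperation(const AFileName: string): Boolean;
var
  Pairs: TStringList;
  I: Integer;
  Dest: string;
begin
  Result := False;
  Pairs := TStringList.Create;
  try
    ReadPendingFileRenames(Pairs);
    I := 0;
    while I + 1 < Pairs.Count do
    begin
      Dest := Pairs[I + 1];
      if (Dest <> '') and (Dest[1] = '!') then
        Delete(Dest, 1, 1);
      if AnsiSameText(Pairs[I], '\??\' + AFileName) or
         AnsiSameText(Dest, '\??\' + AFileName) then
      begin
        Result := True;
        Break;
      end;
      Inc(I, 2);
    end;
  finally
    Pairs.Free;
  end;
end;

begin
  if HasPendingOperation('c:\temp\coolcorelibrary.dll') then
    Writeln('A delayed rename or delete is already queued for this file')
  else
    Writeln('No pending operation found');
end.
```

Even this read-only check has to run before every file an installer touches, which is where the performance concern comes from.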

Another idea is to block further install and upgrade tasks until Joe actually does restart his computer, for example with a registry setting or a RunOnce entry.  But that’s going to make Joe hate me even more than he already does!

Some scenarios can be fixed with the Windows Installer automatic repair functionality.  But that assumes access to the original source files is always possible.  So that’s not a general solution.

I don’t really have a good general solution to this problem.  Any ideas?

Indy Sockets: an example of how to not distribute software libraries

Indy Sockets is a library of network components for Embarcadero Delphi, which has been included in the Delphi distribution for many years now.  Like all libraries, bug fixes and patches are regularly made to the source and thus it is typically a good idea to update the library periodically.
Unfortunately the method of distribution that the Indy Sockets developers have chosen could be a classic example of how to not distribute software libraries.  Here’s why.
When you visit the Indy download page, you are presented with the following blurb:

Installation

Development Snapshot – Instructions to obtain live source and compile manually.

Alternatively you can download the Source Code for Version 10.0.52. This is however a rather old version and is no longer recommended.

So naturally, one follows the Development Snapshot link as the rather old version is no longer recommended, right?  Once you get to the Development Snapshot page, you are then presented with the following scary message:

“You are being warned. This will provide you with a direct link into our current development files. At various times the files may not compile, or in some cases may cause strange errors. Use at your own risk! However please see the version specific notes below. If you are unlucky enough to download a bad version please try again a little later. We apologise for any inconvenience caused.”

This means that when you download Indy, you have no guarantee of getting a stable or even a compiling version.  There is no version information, and you have to rely on their SVN commit logs to figure out what the status of the libraries is.  Oh yes, and the “version specific notes” are missing.

Poor show.

Problems with Internet Explorer 8, print templates and standards compliance

The print engine that we use for one product I work with is based on Internet Explorer’s custom print templates functionality.  This actually works really well, and gives us lots of flexibility and generally is pretty straightforward to use.  Unfortunately, we have run into a bit of a problem recently when trying to print documents in IE8 standards compliance mode: multiple page documents print blank pages, are missing content or sometimes even fail to start printing.

I have been able to reproduce this issue using Microsoft’s own printtemplates.exe example (link in Introduction of article), with only a couple of minor changes to trigger the problem.

Symptoms
The issue arises when all of the following specific conditions are met:

  1. The document to be printed is in IE8 standards compliance mode.  This is most easily forced with the X-UA-Compatible META element:
    <META HTTP-EQUIV='X-UA-Compatible' CONTENT='IE=8' />
    
  2. The document to be printed is greater than 1 page long.
  3. The print template uses LAYOUTRECT elements of differing sizes.  This is a common requirement for letters which may have an address section or expanded letterhead at the top of the first page, but will have a much smaller letterhead on subsequent pages.

The Baseline
We’ll use the printtemplates.exe sample provided by Microsoft.  The sample Template3.htm in the example shows how to dynamically create LAYOUTRECT elements.  Here’s what happens before we make any changes to the example:

Print Template sample application, working with defaults
Print Preview, defaults, page 1
Print Preview, defaults, page 2

As shown in the screen shots, the print preview displays as expected.

Reproducing the Problem

With Internet Explorer 8, the problem can be reproduced as follows:

  1. Start printtemplates.exe, and select Template3.htm.
  2. Click the Page Source button.  In the HTML file that is opened, add the META element as the first child of the HEAD element.  This forces the page into IE8 standards compliant mode.  Save the file as sample.htm:
    <HTML>
      <HEAD>
        <META HTTP-EQUIV='X-UA-Compatible' CONTENT='IE=8' />
        <TITLE>Print Template Samples</TITLE>
      </HEAD>
    
  3. Back in printtemplates, click the Template Source button.  In the HTML file that is opened, change the OnRectComplete function so that the LAYOUTRECT element it creates has a STYLE='height:5.5in' attribute, as shown below.  This small change means pages other than the first page have a smaller LAYOUTRECT than the initial page.  Save this file as template.htm.
    function OnRectComplete()
    {
        if (event.contentOverflow == true)
        {
            document.all("layoutrect" + (iNextPageToCreate - 1)).onlayoutcomplete = null;
    
            newHTML  = "";
            newHTML += "STYLE='height:5.5in'/>";
            newHTML += "";
    
            pagecontainer.insertAdjacentHTML("beforeEnd", newHTML);
            iNextPageToCreate++;
        }
    }
  4. Enter the full paths of sample.htm and template.htm into the respective fields in the print template sample, then press ENTER in the page field to load the modified sample.htm.

When you press Print Preview now, the print preview window will show blank pages instead of the expected content, and will have only 2 pages instead of the 5 or more that were shown previously:

The print template application with modified page and template ready to roll
Failing print preview, page 1
Failing print preview, page 2

If either of the changes is removed from sample.htm or template.htm, the problem does not occur.  In Internet Explorer 9, the problem also occurs but appears to be resolved when using IE=9 in the META tag.  However, as Internet Explorer 9 is not available for Windows XP, this is not a viable solution for us.

Workarounds
We’ve found a few workarounds.  None of these are very viable for us but I list them for completeness:

  1. Don’t use LAYOUTRECT elements of differing sizes.  This is a non-starter for us.
  2. Don’t use IE8 standards compliant mode.  This would mean rewriting a number of reports and some complicated CSS to work around bugs in IE7’s standards compliant mode, but may be a way forward.  Or use quirks mode with all the joy that brings.
  3. Use IE9.

Adding overlay notification icons to Google+’s taskbar icon in IE9

In the same way as I did for Twitter in an earlier post, I have now created a bookmarklet that adds notification overlay icons to Google+’s Taskbar icon, when pinned in IE9.  Currently Google+ only has a small favicon, and I have not found a way to tweak this once the page has loaded, so it’s not quite as pretty as Twitter’s icon, but it’s better than nothing!

Enjoy.

Right click Google+ Overlay Icon (IE9 only!) and click Add to Favorites to create the bookmarklet.

Formatted code for the bookmarklet:

(function() {
    var fLastCount = -1;

    window.setInterval(function()
    {
      try
      {
       var res = document.getElementById('gbi1');
       if(res) {
          var nNewCount = parseInt(res.innerHTML,10);
          if(isNaN(nNewCount)) nNewCount = 0;  // NaN never equals anything, so == NaN is always false
          if(nNewCount != fLastCount) {
            var ico = nNewCount > 5 ? 'error' : 'num_'+nNewCount;
            window.external.msSiteModeSetIconOverlay('http://ie.microsoft.com/testdrive/Browser/tweetfeed/Images/'+ico+'.ico',
              nNewCount+' notifications');
            fLastCount = nNewCount;
          }
        }
        else if(fLastCount != 0) {
          window.external.msSiteModeClearIconOverlay();
          fLastCount = 0;
        }
      } catch(e) { }
    }, 1000);

    })();

Demystifying printing with the Microsoft WebBrowser control and ShowHTMLDialogEx

I’m writing up these notes in order to document what has been a long and painful process, involving much spelunking through MSHTML.DLL and IEFRAME.DLL to try and understand what Internet Explorer (or more accurately, the WebBrowser control) is doing and how to correctly use the semi-documented interfaces to provide full control over a print job.

The original requirement for this mini-project was to provide tray, collation, and duplex control for a HTML print job using IHTMLDocument2.execCommand(IDM_PRINT), with a custom print template.  These functions had been supported through a 3rd party ActiveX component, but this component proved to be incompatible with Internet Explorer 9 (causing a blue screen would you believe!), and the company providing the component was defunct, so it fell to me to re-engineer the solution.

After considerable research, I found some sparse documentation on MSDN suggesting that one could pass a HTMLDLG_PRINT_TEMPLATE flag to ShowHTMLDialogEx and thereby duplicate and extend the functionality of the print template.  In particular, the __IE_PrinterCmd_DevMode property that could somehow be passed into this function would give us the ability to control anything we liked in terms of the printer settings.

Too easy.  Much too easy.  The first stumbling block was trying to discover the type of the pvarArgIn parameter to ShowHTMLDialogEx. A variant array seemed sensible but did not work.  It turns out that this needs to be an IHTMLEventObj, which can be created with IHTMLDocument4.CreateEventObject.  You can then use IHTMLEventObj2.setAttribute to set the various attributes for the object.

Then there were questions about what IMoniker magic was needed for the pMk parameter.  And more questions about the most appropriate set of flags.  Diving into the debugger to examine what Microsoft did answered both of these questions — it was a simple CreateURLMonikerEx call, no need to bind the moniker or other magic, and the flags that Microsoft used were HTMLDLG_ALLOW_UNKNOWN_THREAD or HTMLDLG_NOUI or HTMLDLG_MODELESS or HTMLDLG_PRINT_TEMPLATE for a print job, or HTMLDLG_ALLOW_UNKNOWN_THREAD or HTMLDLG_MODAL or HTMLDLG_MODELESS or HTMLDLG_PRINT_TEMPLATE for a print preview job.  Yes, that is both HTMLDLG_MODAL and HTMLDLG_MODELESS!

Next, what variant type should the __IE_BrowseDocument attribute be?  VT_DISPATCH or VT_UNKNOWN?  The answer is VT_UNKNOWN: things just won’t work if you pass a VT_DISPATCH.  I also came unstuck on the __IE_PrinterCmd_DevMode and __IE_PrinterCmd_DevNames attributes.  Each of these needs to be a VT_I4 containing an unlocked HGLOBAL, referencing a DEVMODEW and a DEVNAMES structure respectively.  I’ll leave the setup of the DEVMODEW structure to you: there are a lot of examples of that online.

However, even after overcoming these hurdles (with copious debugging to understand what MSHTML.DLL and IEFRAME.DLL were doing), there were other issues.  First, the print template was unable to access the dialogArguments.__IE_BrowseDocument property, with an Access Denied error thrown.  Also, HTC behaviors would fail to load as the WebBrowser component believed that they were being referenced in an insecure, cross-domain manner.  And finally, JavaScript in the page being printed was failing to run — and this JavaScript was required to render some of the details of the page.

I knew that Microsoft actually pass a reference to a temporary file for printing in the __IE_ContentDocumentURL attribute.  So I saved the file to a temporary file, which also required adding a BASE element to the header so that relative URLs in the document would resolve.  But the problems had not gone away.

All three of these problems in reality stemmed from the same root cause.  The security IDs for the various elements — the print template, the document being printed, and the HTC components — were not matching.  So I embarked on an attempt to find out why.  At first I wondered if we needed to bind the moniker to a bind context or storage.  That was a no-go.  Then I looked at the IInternetSecurityManager interface, which a developer can implement to provide custom security IDs, zones and more.  Sounds logical, right?  Only problem is that the ShowHTMLDialogEx function provides its own IInternetSecurityManager implementation, which you cannot override (and its GetSecurityID just returns INET_E_DEFAULTACTION for the relevant URLs).  Yikes.

I was starting to run out of options.  As far as I could tell, we were duplicating Microsoft’s functionality essentially identically, and I could not see any calls which changed the security for the document so that it would match security contexts.

Finally I noticed an undocumented attribute had been added to the HTML element in the temporary copy of the page: __IE_DisplayURL.  And as soon as I added that to my file, referencing the original URL of the document, everything worked!

Now, this is all fun (and sounds straightforward in hindsight), but without some code it’s probably not terribly helpful.  So here’s some code (in Delphi, translate to your favourite language as required).  It all looks pretty straightforward now(!), but nearly every line involved blood, sweat and tears!  This is really not a complete example and hence does not compile but just covers the bits necessary to complement the better documented aspects of custom printing with MSHTML.  Please note that this example uses the TEmbeddedWB component for Delphi, and that temporary file cleanup has been excluded.

procedure THTMLPrintController.StartPrint(FPrint: Boolean);
var
  FDeviceW, FDriverW, FPortW: WideString;
  FDevModeHandle, FDevNamesHandle: HGLOBAL;
  pEventObj2: IHTMLEventObj2;
    procedure SetTempFileName;
    begin
      FTempFileName := GetTempFileName('', '.htm');
    end;
    { SaveToFile: Saves the current web document to a temporary file, adding the required BASE and HTML properties }
    procedure SaveToFile;
    var
      FElementCollection: IHTMLElementCollection;
      FHTMLElement: IHTMLElement;
      FBaseElement: IHTMLBaseElement;
      FString: WideString;
    begin
      FElementCollection := webBrowser.Doc3.getElementsByTagName('base');
      if FElementCollection.length = 0 then
      begin
        FBaseElement := webBrowser.Doc2.createElement('base') as IHTMLBaseElement;
        FBaseElement.href := webBrowser.LocationURL;
        (webBrowser.Doc3.getElementsByTagName('head').item(0,0) as IHTMLElement2).insertAdjacentElement('afterBegin', FBaseElement as IHTMLElement);
      end
      else
      begin
        FBaseElement := FElementCollection.item(0,0) as IHTMLBaseElement;
        if FBaseElement.href = '' then FBaseElement.href := webBrowser.LocationURL;
      end;
      FElementCollection := webBrowser.Doc3.getElementsByTagName('html');
      if FElementCollection.length > 0 then
      begin
        FHTMLElement := FElementCollection.item(0,0) as IHTMLElement;
        FHTMLElement.setAttribute( '__IE_DisplayURL', webBrowser.LocationURL, 0);
      end;
      with TFileStream.Create(FTempFileName, fmCreate) do
      try
        if webBrowser.Doc5.compatMode = 'CSS1Compat' then
        begin
          FString := '';
          Write(PWideChar(FString)^, Length(FString)*2);
        end;
        FString := webBrowser.Doc3.documentElement.outerHTML;
        Write(PWideChar(FString)^, Length(FString)*2);
      finally
        Free;
      end;
    end;
    { Configures the printer, assuming we've already been passed an ANSI handle }
    procedure ConfigurePrinter;
    var
      FDevice, FDriver, FPort: array[0..255] of char;
      FDevModeHandle_Ansi: HGLOBAL;
      FPrinterHandle: THandle;
      FDevMode: PDeviceModeW;
      FDevNames: PDevNames;
      FSucceeded: Boolean;
      sz: Integer;
      Offset: PChar;
    begin
      Printer.GetPrinter(FDevice, FDriver, FPort, FDevModeHandle_Ansi);
      if FDevModeHandle_Ansi = 0 then
        RaiseLastOSError;
      FDeviceW := FDevice;
      FDriverW := FDriver;
      FPortW := FPort;
      { Set up the DEVMODE structure }
      FSucceeded := False;
      if not OpenPrinterW(PWideChar(FDeviceW), FPrinterHandle, nil) then
        RaiseLastOSError;
      try
        sz := DocumentPropertiesW(0, FPrinterHandle, PWideChar(FDeviceW), nil, nil, 0);
        if sz < 0 then RaiseLastOSError;
        FDevModeHandle := GlobalAlloc(GHND, sz);
        if FDevModeHandle = 0 then RaiseLastOSError;
        try
          FDevMode := GlobalLock(FDevModeHandle);
          if FDevMode = nil then
            RaiseLastOSError;
          try
            if DocumentPropertiesW(0, FPrinterHandle, PWidechar(FDeviceW), FDevMode, nil, DM_OUT_BUFFER) < 0 then
              RaiseLastOSError;
            FDevMode.dmFields := FDevMode.dmFields or DM_DEFAULTSOURCE or DM_DUPLEX or DM_COLLATE;
            FDevMode.dmDefaultSource := FTrayNumber;
            if FDuplex
              then FDevMode.dmDuplex := DMDUP_VERTICAL
              else FDevMode.dmDuplex := DMDUP_SIMPLEX;
            if FCollate
              then FDevMode.dmCollate := DMCOLLATE_TRUE
              else FDevMode.dmCollate := DMCOLLATE_FALSE;
            if DocumentPropertiesW(0, FPrinterHandle, PWideChar(FDeviceW), FDevMode, FDevMode, DM_OUT_BUFFER or DM_IN_BUFFER) < 0 then
              RaiseLastOSError;
            FSucceeded := True;
          finally
            GlobalUnlock(FDevModeHandle);
          end;
        finally
          if not FSucceeded then GlobalFree(FDevModeHandle);
        end;
      finally
        ClosePrinter(FPrinterHandle);
      end;
      Assert(FSucceeded);
      { Set up the DEVNAMES structure }
      FSucceeded := False;
      FDevNamesHandle := GlobalAlloc(GHND, SizeOf(TDevNames) +
       (Length(FDeviceW) + Length(FDriverW) + Length(FPortW) + 3) * 2);
      if FDevNamesHandle = 0 then RaiseLastOSError;
      try
        FDevNames := PDevNames(GlobalLock(FDevNamesHandle));
        if FDevNames = nil then RaiseLastOSError;
        try
          Offset := PChar(FDevNames) + SizeOf(TDevnames);
          with FDevNames^ do
          begin
            wDriverOffset := (Longint(Offset) - Longint(FDevNames)) div 2;
            Move(PWideChar(FDriverW)^, Offset^, Length(FDriverW) * 2 + 2);
            Inc(Offset, Length(FDriverW) * 2 + 2);
            wDeviceOffset := (Longint(Offset) - Longint(FDevNames)) div 2;
            Move(PWideChar(FDeviceW)^, Offset^, Length(FDeviceW) * 2 + 2);
            Inc(Offset, Length(FDeviceW) * 2 + 2);
            wOutputOffset := (Longint(Offset) - Longint(FDevNames)) div 2;
            Move(PWideChar(FPortW)^, Offset^, Length(FPortW) * 2 + 2);
          end;
          FSucceeded := True;
        finally
          GlobalUnlock(FDevNamesHandle);
        end;
      finally
        if not FSucceeded then GlobalFree(FDevNamesHandle);
      end;
      Assert(FSucceeded);
    end;
  { Creates the IHTMLEventObj2 and populates the attributes for printing }
  procedure CreateEventObject;
  var
    v: OleVariant;
    FShortFileName: WideString;
    FShortFileNameBuf: array[0..260] of widechar;
  begin
    v := EmptyParam;
    pEventObj2 := webBrowser.Doc4.CreateEventObject(v) as IHTMLEventObj2;
    pEventObj2.setAttribute('__IE_BaseLineScale', 2, 0);
    GetShortPathNameW(PWideChar(FTempFileName), FShortFileNameBuf, 260);
    FShortFileName := FShortFileNameBuf;

    v := webBrowser.Document as IUnknown;
    pEventObj2.setAttribute('__IE_BrowseDocument', v, 0);
    pEventObj2.setAttribute('__IE_ContentDocumentUrl', FShortFileName, 0);
    pEventObj2.setAttribute('__IE_ContentSelectionUrl', '', 0);  // Empty as we never print selections
    pEventObj2.setAttribute('__IE_FooterString', '', 0);
    pEventObj2.setAttribute('__IE_HeaderString', '', 0);
    pEventObj2.setAttribute('__IE_ActiveFrame', 0, 0);
    pEventObj2.setAttribute('__IE_OutlookHeader', '', 0);
    pEventObj2.setAttribute('__IE_PrinterCMD_Device', FDeviceW, 0);
    pEventObj2.setAttribute('__IE_PrinterCMD_Port', FPortW, 0);
    pEventObj2.setAttribute('__IE_PrinterCMD_Printer', FDriverW, 0);
    pEventObj2.setAttribute('__IE_PrinterCmd_DevMode', FDevModeHandle, 0);
    pEventObj2.setAttribute('__IE_PrinterCmd_DevNames', FDevNamesHandle, 0);
    if FPrint
      then pEventObj2.setAttribute('__IE_PrintType', 'NoPrompt', 0)
      else pEventObj2.setAttribute('__IE_PrintType', 'Preview', 0);
    pEventObj2.setAttribute('__IE_TemplateUrl', GetPrintTemplateURL, 0);
    pEventObj2.setAttribute('__IE_uPrintFlags', 0, 0);
    v := VarArrayOf([FShortFileName]);
    pEventObj2.setAttribute('__IE_TemporaryFiles', v, 0);
    pEventObj2.setAttribute('__IE_ParentHWND', 0, 0);
    pEventObj2.setAttribute('__IE_HeaderString', webBrowser.Doc2.title, 0);
    pEventObj2.setAttribute('__IE_DisplayURL', webBrowser.LocationURL, 0);
  end;
  procedure InstantiateDialog;
  var
    FWindowParams, FMonikerURL: WideString;
    FMoniker: IMoniker;
    FDialogFlags: DWord;
    varArgIn, varArgOut: OleVariant;
    res: HRESULT;
  begin
    varArgIn := pEventObj2 as IUnknown;
    varArgOut := Null;
    FMonikerURL := GetPrintTemplateURL;
    OleCheck(CreateURLMonikerEx(nil, PWideChar(FMonikerURL), FMoniker, URL_MK_UNIFORM));
    if FPrint then
    begin
      FWindowParams := '';
      FDialogFlags := HTMLDLG_ALLOW_UNKNOWN_THREAD or HTMLDLG_NOUI or HTMLDLG_MODELESS or HTMLDLG_PRINT_TEMPLATE;
    end
    else
    begin
      FWindowParams := 'resizable=yes;';
      FDialogFlags := HTMLDLG_ALLOW_UNKNOWN_THREAD or HTMLDLG_MODAL or HTMLDLG_MODELESS or HTMLDLG_PRINT_TEMPLATE;
    end;
    res := ShowHTMLDialogEx(0, FMoniker, FDialogFlags, varArgIn, PWideChar(FWindowParams), varArgOut);
    if res <> S_OK then raise EOSError.Create(SysErrorMessage(res));
  end;
begin
  SetTempFileName;
  SaveToFile;
  ConfigurePrinter;
  CreateEventObject;
  InstantiateDialog;
end;

Update 14 July: This code is not our production code: I’ve stripped out bits and pieces and tried to keep the bits that are somewhat relevant. Don’t worry too much about the ConfigurePrinter details — the takeaway is the HGLOBAL. I must also apologise for the atrocity that is the SaveToFile function. That’s what you get when working with legacy versions of software. Internet Explorer also won’t reliably work with non-ASCII content there unless you toss a BOM into the start of the stream.

Fixing Windows font scaling without restarting

Windows 7 and Windows Server 2008 include the ability for each user to set their font scale. This is fantastic, except for a legacy complication: the old bitmap fonts MS Sans Serif, MS Serif and Courier have specific versions for each font scale, but these are never changed after Windows is installed. In previous versions of Windows, the fonts were replaced with the correct versions for the selected font scale, which is why a system restart was required.

This means that these bitmap fonts can be out of sync with the currently selected font scale. This is typically only a problem for legacy applications, but it is ugly in those cases!

More background is available at the MSDN blog http://blogs.msdn.com/b/developingfordynamicsgp/archive/2009/11/25/windows-7-bitmap-fonts-and-microsoft-dynamics-gp.aspx and the follow-up post http://blogs.msdn.com/b/developingfordynamicsgp/archive/2009/12/02/more-on-windows-7-bitmap-fonts-and-dpi-settings.aspx

In our situation, it was even worse: the client was running a Remote Desktop Services environment, where restarting the server was really out of the question.

So I wrote a little fix-it app that dynamically adjusts all the font scaling registry settings and installs the correct fonts for the selected font scale.  You may need to log off and log on again, but in most cases, no restart is required.  It is set up for 100% and 125% only, and I provide this app here only as a useful tool.  No support or warranties, etc, etc.  Use at your own risk!

Download Fontsizefix.zip.

Update 1 Jul: As I discussed this blog with Peter Constable, I realised that I didn’t really describe what the tool did.  So: fontsizefix updates the various metrics in HKCU\Control Panel\Desktop, and a couple of LogPixels registry settings in HKLM\SYSTEM\CurrentControlSet\Hardware Profiles\Current\Software\Fonts and HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontDPI\LogPixels, updates the fonts key in the registry to point to the correct versions of MS Sans Serif, MS Serif and Courier, and then calls RemoveFontResource and AddFontResource in order to get the correct version of each font loaded.  I’m sure it’s not 100% but it got us over a hurdle with the terminal services environment.  For purposes of support, it was easiest to make a tool that did the whole lot rather than document a bunch of registry tweaks which are easy to trip over, and then we figured we might as well make it available to other users as well…

UPDATE STATISTICS and hints in SQL Server

I was working today on a slow query in SQL Server — it was a simple query on a well-indexed table, and I
could not initially see why it would be so slow. So I did some tests, and was surprised by the results.

I have reproduced the situation I was working through with a test table, with the following structure:

CREATE TABLE HintTest (
 HintTestID INT NOT NULL IDENTITY(1,1),
 Col1 INT NULL,
 Col2 INT NULL,
 Col3 INT NULL,
 CONSTRAINT PK_HintTest PRIMARY KEY (HintTestID)
);

CREATE NONCLUSTERED INDEX HintTest_Col1 ON HintTest (Col1)
-- column 2 was not indexed
CREATE NONCLUSTERED INDEX HintTest_Col3 ON HintTest (Col3)

Then I copied a set of records from the original data I was working with. The following code will
not reproduce this (and you’ll see the reason later) but does very roughly mimic the distribution of the data:

I used a quick and dirty test harness:

DECLARE @v INT, @StartTime DATETIME
SET @StartTime = GETDATE()
SET @v = 0

DECLARE @prmRecordFound BIT, @prmCol1 INT, @prmCol2 INT, @prmCol3 INT

SET @prmCol1 = 50
SET @prmCol2 = 3
SET @prmCol3 = 1750

WHILE @v < 10000
BEGIN
 SET @v = @v + 1

 -- Test Case Here
END

PRINT DATEDIFF(ms, @StartTime, GETDATE())

And here are the tests I wrote:

  -------------------------------------------
  -- TEST CASE 1: SELECT primary key and separate IF
  -------------------------------------------

  SET @prmRecordFound = 0

  DECLARE @HintTestID INT

  SELECT TOP 1 @HintTestID = HintTestID
  FROM
    HintTest
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3

  IF @HintTestID IS NOT NULL
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 2: SELECT COUNT and separate IF
  -------------------------------------------

  SET @prmRecordFound = 0

  DECLARE @Count INT

  SELECT @Count = COUNT(*)
  FROM
    HintTest
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3

  IF @Count > 0
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 3: SELECT COUNT nested in IF
  -------------------------------------------

  SET @prmRecordFound = 0

  IF (SELECT COUNT(*)
  FROM
    HintTest
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3) > 0
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 4: SELECT COUNT with hint nested in IF
  -------------------------------------------

  SET @prmRecordFound = 0

  IF (SELECT COUNT(*)
  FROM
    HintTest WITH(INDEX(HintTest_Col1, HintTest_Col3))
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3) > 0
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 5: EXISTS SELECT * in IF
  -------------------------------------------

  SET @prmRecordFound = 0

  IF EXISTS(SELECT *
  FROM
    HintTest
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3)
    SET @prmRecordFound = 1

  -------------------------------------------
  -- TEST CASE 6: EXISTS SELECT * with hint in IF
  -------------------------------------------

  SET @prmRecordFound = 0

  IF EXISTS(SELECT *
  FROM
    HintTest WITH(INDEX(HintTest_Col1, HintTest_Col3))
  WHERE
    Col1 = @prmCol1 AND
    Col2 = @prmCol2 AND
    Col3 = @prmCol3)
    SET @prmRecordFound = 1

The first run results reproduced the situation quite well, and returned the following surprising statistics (10,000 iterations):

1 (SELECT primary key and separate IF)  846 ms
2 (SELECT COUNT and separate IF)        203 ms
3 (SELECT COUNT nested in IF)           3523 ms
4 (SELECT COUNT with hint nested in IF) 226 ms
5 (EXISTS SELECT * in IF)               3460 ms
6 (EXISTS SELECT * with hint in IF)     263 ms

I was puzzled why there would be such a difference between cases 2 and 3, given that they were so similar,
so I looked at the execution plans for the two queries. It turns out that the query optimizer was selecting a
far from optimal plan (a clustered index scan) when the SELECT statement was nested in an IF statement, whereas
for case 2 it was using two index seeks followed by a merge join. Adding hints overrode the query optimizer, and
the results can be seen in cases 4 and 6.
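
One way to compare the two plans without the graphical plan viewer is to ask SQL Server for the estimated plan as text. The following is only a sketch, assuming the HintTest table above and the same parameter values as the harness. SET SHOWPLAN_TEXT has to be the only statement in its batch, so the GO separators (a convention of client tools such as SSMS and sqlcmd) matter, and while it is on the statements are compiled but not executed:

```sql
SET SHOWPLAN_TEXT ON;
GO

DECLARE @prmRecordFound BIT, @prmCol1 INT, @prmCol2 INT, @prmCol3 INT, @Count INT;
SET @prmCol1 = 50;
SET @prmCol2 = 3;
SET @prmCol3 = 1750;

-- Case 2: COUNT assigned to a variable
SELECT @Count = COUNT(*)
FROM HintTest
WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3;

-- Case 3: the same COUNT nested in an IF
IF (SELECT COUNT(*)
    FROM HintTest
    WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3) > 0
  SET @prmRecordFound = 1;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

If the situation reproduces, the plan text for case 3 should mention a clustered index scan on PK_HintTest, while case 2 should show seeks on HintTest_Col1 and HintTest_Col3.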

I ran the following statement to refresh the data the query optimizer uses:

UPDATE STATISTICS HintTest

UPDATE STATISTICS will update information about the distribution of keys for indexes on the table, which is then used by the query optimizer. Then I re-ran the test (100,000 iterations shown below). Dramatic difference. Now the hinted queries were among the slowest:

1 (SELECT primary key and separate IF)  2266 ms
2 (SELECT COUNT and separate IF)        2313 ms
3 (SELECT COUNT nested in IF)           2500 ms
4 (SELECT COUNT with hint nested in IF) 2546 ms
5 (EXISTS SELECT * in IF)               2656 ms
6 (EXISTS SELECT * with hint in IF)     2706 ms
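
To check whether stale statistics are a likely culprit before reaching for hints, the age of each statistics object can be inspected first. A rough sketch, assuming SQL Server 2005 or later (for the sys.stats catalog view) and the HintTest table above; the FULLSCAN option is my own choice for a small test table, not something the test above used:

```sql
-- When was each statistics object on HintTest last updated?
SELECT s.name AS StatsName,
       STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('HintTest');

-- Histogram and density details for one index
DBCC SHOW_STATISTICS ('HintTest', HintTest_Col1);

-- Refresh all statistics on the table, reading every row
UPDATE STATISTICS HintTest WITH FULLSCAN;
```

A NULL or very old LastUpdated value on a table that has just been bulk-loaded is a hint that the optimizer is working from an out-of-date picture of the data.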

For me, the big lesson from this was:

Always run UPDATE STATISTICS before trying to optimize a query!

The second thing I learned was that the fastest query is often unexpected. I would have expected the
EXISTS condition (case 5) to be optimal for a simple boolean result but instead the fastest query
was consistently case 1 – the SELECT primary key method.
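
When comparing variants like these, logical reads can be a useful cross-check on elapsed milliseconds, since they are less sensitive to whatever else the server is doing. A minimal sketch, assuming the same table and parameter values as the harness, that runs case 1 once with I/O and timing statistics switched on:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @prmCol1 INT, @prmCol2 INT, @prmCol3 INT, @HintTestID INT;
SET @prmCol1 = 50;
SET @prmCol2 = 3;
SET @prmCol3 = 1750;

-- Case 1, run once; the Messages output reports logical reads
-- and CPU/elapsed time for this statement
SELECT TOP 1 @HintTestID = HintTestID
FROM HintTest
WHERE Col1 = @prmCol1 AND Col2 = @prmCol2 AND Col3 = @prmCol3;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Swapping in each of the other test cases in turn gives a per-statement view that the 10,000-iteration loop averages away.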