Category Archives: Computing

Strava segment statistics site updates

I made some small tweaks to my Strava segment stats website recently. These include:

  • Fixes for rendering issues in the lists
  • Ability to hide bogus segments, similar to the Strava flagging function
  • Dynamically sortable tables
  • Underpinnings for anyone to be able to have their stats updated automatically (but the UI for this is not yet complete)

Currently I also have an issue with the Strava API, where segments with more than 50 efforts cannot have all efforts populated. Once I have a resolution for this, I’ll publish another update with the ability for anyone to view their stats.

Loading a Unicode string from a file with Delphi functions

In my previous post, I described differences in saving text with TStringStream and TStringList.  TStringList helpfully adds a preamble.  TStringStream doesn’t.  Now when loading text from a stream, you’ll typically want to strip off the preamble.  But if you also want the text to be otherwise unmodified, you’re stuck: TStringList modifies the text as it loads, and TStringStream doesn’t strip off the preamble.

Here’s a helper function that does strip the preamble.  If you don’t pass an encoding, it will guess on the basis of the preamble (but won’t otherwise sniff the stream content to guess the encoding heuristically). If the content does not have a preamble, it assumes the current code page (TEncoding.Default).  If you do pass an encoding, the preamble will be stripped if it is there but no encoding detection will take place.

function LoadStringFromFile(const filename: string; encoding: TEncoding = nil): string;
var
  FPreambleLength: Integer;
begin
  with TBytesStream.Create do
  try
    LoadFromFile(filename);
    FPreambleLength := TEncoding.GetBufferEncoding(Bytes, encoding);
    Result := encoding.GetString(Bytes, FPreambleLength, Size - FPreambleLength);
  finally
    Free;
  end;
end;

Obviously this is not fantastic for very large files (you can solve that one yourself), but for your plain old bite-sized files it’s a quick and easy solution.
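Calling it is then a one-liner.  For example (the file names here are purely illustrative):

var
  s: string;
begin
  // No encoding passed: detect it from the preamble, falling back to TEncoding.Default
  s := LoadStringFromFile('C:\temp\notes.txt');

  // Encoding passed explicitly: no detection, but a matching preamble is still stripped
  s := LoadStringFromFile('C:\temp\notes-utf8.txt', TEncoding.UTF8);
end;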

Comparing TStringStream vs TStringList for writing Unicode strings to streams

There are two methods widely used in Delphi code for reading and writing strings to and from streams, which initially seem pretty similar in their behaviour.  These are TStrings.SaveToStream and TStringStream.SaveToStream (or SaveToFile in either case):

procedure TForm1.SaveToFile;
const
  AString = 'This is some Unicode text'#13+
            'Test Unicode © Δ א';
begin
  with TStringList.Create do
  try
    Text := AString;
    SaveToFile('TStringList UTF8.txt', TEncoding.UTF8);
  finally
    Free;
  end;

  with TStringStream.Create(AString, TEncoding.UTF8) do
  try
    SaveToFile('TStringStream UTF8.txt');
  finally
    Free;
  end;
end;

But there are several crucial differences in what is written to the stream between these two methods:

  1. TStringList prepends the preamble bytes for the encoding (in this case, #$EF#$BB#$BF)
  2. TStringList appends a new line #$0D#$0A to the file, if your text does not already end in a new line.
  3. TStringList converts any single line breaking characters in the text (e.g. #$0D or #$0A) into #$0D#$0A.

The following hex dumps may show this more clearly:

EF BB BF 54 68 69 73 20 69 73 20 73 6F 6D 65 20
55 6E 69 63 6F 64 65 20 74 65 78 74 0D 0A 54 65
73 74 20 55 6E 69 63 6F 64 65 20 C2 A9 20 CE 94
20 D7 90 0D 0A 

TStringList UTF8.txt

54 68 69 73 20 69 73 20 73 6F 6D 65 20 55 6E 69
63 6F 64 65 20 74 65 78 74 0D 54 65 73 74 20 55
6E 69 63 6F 64 65 20 C2 A9 20 CE 94 20 D7 90

TStringStream UTF8.txt

Make sure you know how your files will be read and whether these differences are important to the target application.

Basically, TStringList is typically not appropriate for streaming strings without modification.  TStringStream is your friend here.  But if you need the preamble, and just the preamble, then you’ll have to do a little more work; you won’t be able to use TStringStream.SaveToFile.
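Here is one way to do that extra work, as a minimal sketch of my own (the procedure name is just illustrative): write the encoding’s preamble yourself, followed by the unmodified bytes of the string, with nothing appended or converted:

procedure SaveStringWithPreamble(const filename, value: string;
  encoding: TEncoding);
var
  stream: TFileStream;
  preamble, buffer: TBytes;
begin
  stream := TFileStream.Create(filename, fmCreate);
  try
    preamble := encoding.GetPreamble;    // e.g. EF BB BF for UTF-8
    buffer := encoding.GetBytes(value);  // the string itself, byte for byte
    if Length(preamble) > 0 then
      stream.WriteBuffer(preamble[0], Length(preamble));
    if Length(buffer) > 0 then
      stream.WriteBuffer(buffer[0], Length(buffer));
  finally
    stream.Free;
  end;
end;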

IXMLDocument.SaveToStream does not always use UTF-16 encoding

Delphi’s documentation on IXMLDocument.SaveToStream has the following important caveat:

Regardless of the encoding system of the original XML document, SaveToStream always saves the stream in UTF-16.

It’s helpful to have notes like this.  Mind you, forcing UTF-16 output is definitely horrible; what if we need our document in UTF-8 or (God forbid) some non-Unicode encoding?

Now Kris and I were looking at a Unicode corruption issue with an XML document in a Delphi application and struggling to understand what was going wrong given this statement in the documentation.  Our results didn’t add up, so we wrote a little test app to test that statement:

procedure TForm1.OutputEncodingIsUTF8;
const
  UTF8XMLDoc: string =
    '<?xml version="1.0" encoding="utf-8"?>'#13#10+
    '<test/>';  // simple placeholder root element; any well-formed element will do
var
  XMLDocument: IXMLDocument;
  InStream: TStringStream;
  OutStream: TFileStream;
begin
  // stream format is UTF8, input string is converted to UTF8
  // and saved to the stream
  InStream := TStringStream.Create(UTF8XMLDoc, TEncoding.UTF8);
  // we'll write to this output file
  OutStream := TFileStream.Create('file_should_be_utf16_but_is_utf8.xml',
    fmCreate);
  try
    XMLDocument := TXMLDocument.Create(nil);
    XMLDocument.LoadFromStream(InStream);
    XMLDocument.SaveToStream(OutStream);
    // IXMLDocument.SaveToStream docs state will always be UTF-16
  finally
    FreeAndNil(InStream);
    FreeAndNil(OutStream);
  end;

  with TStringList.Create do
  try
    // we want to load it as a UTF-16 doc, given the documentation
    LoadFromFile('file_should_be_utf16_but_is_utf8.xml', TEncoding.Unicode);
    ShowMessage('This should be displayed as an XML document '+
                'but instead is corrupted: '+#13#10+Text);
  finally
    Free;
  end;
end;

When I run this, I’m expecting a dialog showing the XML as readable text.

But instead I get a dialog full of corrupted text.

Note: this was run on Delphi 2010.  I haven’t run this test on Delphi XE2, but the documentation hasn’t changed.

The moral of the story is that the output encoding is the same as the input encoding, unless you change it with the Encoding property.  For example, adding the Encoding assignment below fixes the code sample:

    XMLDocument := TXMLDocument.Create(nil);
    XMLDocument.LoadFromStream(InStream);
    XMLDocument.Encoding := 'UTF-16';
    XMLDocument.SaveToStream(OutStream);

The same documentation issue exists for TXMLDocument.SaveToStream.  I’ve reported the issue in QualityCentral.
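If you need this in more than one place, a tiny wrapper (my own sketch, the name is illustrative) keeps the intent obvious:

procedure SaveXMLWithEncoding(const XMLDocument: IXMLDocument;
  Stream: TStream; const TargetEncoding: string);
begin
  // Without this assignment, SaveToStream keeps the input encoding,
  // whatever the documentation says
  XMLDocument.Encoding := TargetEncoding;
  XMLDocument.SaveToStream(Stream);
end;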

The case of the terribly slow PayPal emails

Every now and then I receive a payment via PayPal.  That’s not unusual, right?  PayPal would send me an email notifying me of the payment, and I’d open up Outlook to take a look at it.  All well and good.  In the last week, however, something changed.  When I clicked on any PayPal email, Outlook would take the best part of a minute to open the email, and what’s more, would freeze its user interface entirely while doing this.  But this was only happening with emails from PayPal — everything else was fine.

Not good.  At first I suspected an addin was mucking things up, so I disabled all the Outlook addins and restarted Outlook for good measure.  No difference.  Now I was getting worried — what if this was some unintended side-effect of some malware that had somehow got onto my machine, and it was targeting PayPal emails?

So I decided to do some research.  I fired up SysInternals’ Process Monitor, set it up to show only Outlook in its filtering, and turned on the Process Monitor trace.

Process Monitor filter window – filtering for OUTLOOK.EXE

Then I went and clicked on a PayPal email.  Waited the requisite time for the email to display, then went back to Process Monitor and turned off the trace.  I added the Duration column to make it easier to spot an anomalous entry.  This doesn’t always help but given the long delay, I was expecting some file or network activity to be taking a long time to run.

Adding the Duration column

Then scrolling up the log I quickly spotted the following entry.  It had a duration of nearly 3 seconds which stood out like a sore thumb.

The first offending entry

This entry was an attempt to connect to a Windows Networking share on the remote host \\102.112.2o7.net.  This came back, nearly 3 seconds later, with ACCESS DENIED.  Then there were a bunch of follow-up entries that related to this, all in all taking over 30 seconds to complete.  A quick web search revealed that this domain with its dubious-looking name 102.112.2o7.net belongs to a well-known web statistics company called Omniture.  That took some of the load off my mind, but now I was wondering how on earth opening a PayPal email could result in Internet access when I didn’t have automatic downloads of pictures switched on.

One of the emails that caused the problem, redacted of course 🙂

I opened the source of the PayPal HTML email and searched for “102.112”.  And there it was.

The HTML email in notepad

That’s a classic web bug: an image of 1×1 pixel size, no doubt transparent, with some details encoded in the URL to record my visit to the web page (or in this case, my opening of the email).

What was curious about this web bug was the use of the “//” shorthand to imply the same protocol (e.g. HTTP or HTTPS) as the base page.  That’s all well and good in a web browser, but in Outlook, the email is not being retrieved over HTTP.  So Outlook interprets this as a Windows Networking address, and attempts a Windows Networking connection to the host instead, \\102.112.2o7.net\b….

At this point I realised this could be viewed as a security flaw in Outlook.  So I wrote to Microsoft instead of publishing this post immediately.  Microsoft responded that they did not view this as a vulnerability (as it does not result in system compromise), but that they’d pass it on to the Outlook team as a bug.

Nevertheless, this definitely has the potential of being exploited for tracking purposes.  One important reason that external images are not loaded by default is to prevent third parties from being notified that you are reading a particular email.  While this little issue does not actually cause the image to load (it is still classified as an external image), it does cause a network connection to the third party server which could easily be logged for tracking purposes.  This network connection should not be happening.

So what was my fix?  Well, I don’t really care about Omniture or whether PayPal get their statistics, so I added an entry in my hosts file to block this domain.  Using an invalid IP address made it fail faster than the traditional use of 127.0.0.1:

0.0.0.0    102.112.2o7.net

And now my PayPal emails open quickly again.

Notes on MSHTML editing and empty elements

We ran into a problem recently with the MSHTML editor where empty paragraphs would collapse when the user saved or printed the document.  If the user loaded the document again, the empty elements seemed to disappear entirely.  Logically, this makes sense: an empty element has 0x0 dimensions, so will take up no space.  But if the user adds a blank line, they would not expect it to disappear after saving: MSHTML gives these new empty elements dimensions as if they contained a non-breaking space, until the document is saved and reloaded.

Let’s look at that visually.  We’d type ONE, TWO and THREE on separate lines, with blank lines between them, into the editor, and all would be fine, as shown on the left.  After reloading, it would display as shown on the right:

Source HTML editor and collapsing HTML result

So what is going on, and what’s the solution?

The basic solution is to add an &nbsp; entity (non-breaking space) to empty elements to give them a non-zero width and height.  MSHTML does this.  But there’s a heap of complexity around this.  From my analysis of the problem, it seems that the core issue is that element.innerHTML or domnode.childNodes.length returns "" or 0 respectively, even though viewing the full source shows that there is actually an &nbsp; in the element.  This happens only when the MSHTML editor is active.

The complexity arises when one looks at the different edit modes and methods of loading documents, because each mode has slightly different symptoms.  Whilst juggling the tangle of symptoms that these issues present, we also need to consider the following requirements:

  1. A document may contain both empty <p></p> elements and <p>&nbsp;</p> elements, and the editor must not conflate the two when loading.

  2. When the user inserts a blank line, it must not collapse.  Nevertheless, the user does not want to learn about non-breaking spaces, so the editor must transparently manage this. Ideally, the non-breaking space would be hidden from the user but managed in the back end.  I must say that MSHTML is very close to this ideal.

A diversion: this is not a new problem, and many solutions have been proposed. One solution suggested in various forums online is to use <br> to break lines instead of the paragraph model with <p> or <div>.  This is not a great answer: it means the whole document has a single paragraph style.  I do note however that this is what Blogger and some other blog editors do, but then they do dynamically insert a <div> when the user changes the paragraph style.  Still not pretty.

Now where I talk about <p> elements, feel free to imagine that I am talking about <div> elements.  The behaviour appears to be much the same, and there’s just a flag to switch between the two elements. Also, I’ll be using Delphi for the code samples because it hides a lot of the necessary COM guff and makes the examples much easier to read.

When a new blank line is inserted into the editor, behind the scenes MSHTML will add an &nbsp; entity to prevent the element from collapsing.  When you type the first letter, the &nbsp; is deleted.  MSHTML also makes the &nbsp; itself invisible to the user.  This is great.  It’s exactly what we want.

So let’s look at the activation of the editor and what is happening there. It turns out that there are three ways of making a document editable — four if you include the undocumented IDM_EDITMODE command that some editor component wrappers use.  So what are these four methods?

  1. Set document.designMode to “On”.

      D := WebBrowser.Document as IHTMLDocument2;
      D.designMode := 'On';
     

  2. Set document.body.contentEditable to “true” (or anyElement.contentEditable).

      D := WebBrowser.Document as IHTMLDocument2;
      (D.body as IHTMLElement3).contentEditable := 'true';
     

  3. The DISPID_AMBIENT_USERMODE ambient property.  See the link for an example.
     
  4. The aforementioned IDM_EDITMODE command ID.  I’m not condoning this method, just documenting it because some editor wrappers use it.

      D := WebBrowser.Document as IHTMLDocument2;
      (D as IOleWindow).GetWindow(hwnd_);

      SendMessage(hwnd_, WM_COMMAND, IDM_EDITMODE, 0);

To make things even more complicated, there are different ways of loading content into the HTML editor, and different methods have different outcomes.  The three methods we explored were using Navigate, document.write, and IPersistFile.

  1. Using editor.Navigate to load either a local or remote document.

      WebBrowser.Navigate(DocFileName);
     

  2. Using document.write to write a complete document.

      D := WebBrowser.Document as IHTMLDocument2;
      VarArray := VarArrayCreate([0, 0], varVariant);
      VarArray[0] := DocText;
      D.write(PSafeArray(TVarData(VarArray).VArray));
      D.close;
     

  3. Accessing the editor’s IPersistFile interface to load a document.

      D := WebBrowser.Document;
      PersistFile := D as IPersistFile;
      PersistFile.Load(PWideChar(DocFileName), 0);

It turns out that if you use either Navigate or contentEditable, then MSHTML will not hide the &nbsp; from the end user for elements already in the document.  New empty elements typed by the user will still have the default behaviour described previously.  This is inconsistent and confusing to both me (the poor developer) and the end user.

The following table shows how &nbsp; entities are treated in otherwise empty elements when loaded in the various ways:

                            Navigate      document.write   IPersistFile
  designMode                visible       invisible        invisible
  contentEditable           visible       visible          visible
  IDM_EDITMODE              visible       invisible        invisible
  DISPID_AMBIENT_USERMODE   not tested    not tested       not tested

Now, it turns out that the visible results in that matrix are not going to do what we are looking for.  That’s because, as mentioned, any new empty elements typed by the user will still have the invisible behaviour for that empty element’s &nbsp;.

Here’s a final example to clarify the situation.  Take a document whose body contains a single <p id="a">&nbsp;</p> element, loaded into the MSHTML editor using document.write, and designMode = “On”.

With this document, we get the following results:

  Code                                                   Result
  doc.getElementById("a").innerHTML                      empty string
  doc.getElementById("a").outerHTML                      the element, without the &nbsp;
  doc.getElementById("a").parentElement.innerHTML        the element, with the &nbsp;
  doc.getElementById("a").parentElement.outerHTML        the element, with the &nbsp;
  doc.documentElement.outerHTML                          the full document, with the &nbsp;

So. How do we determine if an element is empty and collapsed, or is a blank line?  MSHTML isn’t consistent with its innerHTML, outerHTML or DOM text node properties of the element.

But wait, all hope is not lost!  It turns out that IHTMLElement3 has a little buried property called inflateBlock.  This property tells you whether or not an empty element will be ‘inflated’ to appear as though it has content.  This little known property (I found no discussions or blogs about it!) should solve our problem neatly:

isElementTrulyEmpty := (element.innerHTML = '') and not (element as IHTMLElement3).inflateBlock;

isElementJustABlankLine := (element.innerHTML = '&nbsp;') or ((element.innerHTML = '') and (element as IHTMLElement3).inflateBlock);

Now I just have to push this fix into the HTML editor component wrapper we are using.  At least I’ve already written the documentation around the fix!

Final note: even the Blogger editor that I’m using to write this post has trouble with consistency with new lines.  Here’s an example — look at the spacing around the code samples.

A screenshot of this blog post, in the editor (Firefox)

A screenshot of this blog post, previewing (Firefox)

Why you should not use MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT

A common problem that you, as a developer, may run into on Windows is the need to replace a file that is in use. This commonly happens with installations and upgrades, but can of course also happen in general use.

In earlier versions of Windows, when most users worked in full administrator mode, the MoveFileEx function with the MOVEFILE_DELAY_UNTIL_REBOOT flag was suggested by Microsoft as a great approach for updating files that were in use.  This flag would, as it sounds, allow you to schedule the move or deletion of a file at a time when it was (pretty much) guaranteed to succeed.

For example:

// This will delete c:\temp\coolcorelibrary.dll on the next reboot
MoveFileEx("c:\\temp\\coolcorelibrary.dll", NULL, MOVEFILE_DELAY_UNTIL_REBOOT);

Nowadays, of course, this flag does not work unless you are running in the context of an administrative user.  That’s great, you think, this will still work for my install or upgrade program.

But don’t trust that feeling of security.  Things are never as easy as they seem.  I first realised that this was a problem when researching issues occasionally reported by some of our Keyman Desktop users.

Take this scenario:

  1. A user, Joe, decides to uninstall your awesome app CoolCoreProgram.
  2. The uninstaller finds that a critical file (let’s call it coolcorelibrary.dll) is in use and can’t delete it.
  3. The uninstaller calls MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT to schedule deletion of coolcorelibrary.dll.
  4. The uninstall completes and presents Joe with the dreaded “Hey, you need to restart Windows now” dialog.  Would you click Restart Now?  Why not leave it till later?

  5. Poor unhappy Joe swears and cancels the restart, and continues his work.  He can’t see any good reason to restart Windows…
  6. A short while later, Joe realises that he actually loves CoolCoreProgram and so he downloads and reinstalls the latest, greatest version (now with extra shiny!)
  7. Shortly thereafter, Joe finishes up for the day and turns off his computer.
  8. The next morning, after Joe starts up his computer, Windows notes its instructions from the previous day, and obediently deletes coolcorelibrary.dll.
  9. And now Joe is really, really unhappy when he tries to start CoolCoreProgram and gets a bizarre coolcorelibrary.dll missing error.

Who is Joe going to blame?  Is it his fault for not restarting?  Is it yours for using cool APIs such as MoveFileEx?  Or is it a woeful confluence of unintended consequences?

This is probably one of the simplest scenarios in which this problem can crop up.  Things get much worse when you talk about shared libraries, fonts, or other resources which may be in use by the system, or multi-user systems.

Some reasons I have encountered for files in use:

  1. Program is still running (duh!)
  2. System has locked the file (common with fonts, hook libraries)
  3. Antivirus or security software is scanning the file
  4. Another application is trying to update the file (i.e. don’t run multiple installers at once)

One possible fix would be to check and update the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations for listed files before creating or updating them.  But that’s a big performance and reliability hit.  I can’t say that solution sits well with me.
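For what it’s worth, here is a rough sketch (mine, not production code; the function name is illustrative) of reading that value.  It is a REG_MULTI_SZ, which TRegistry does not support directly, so the raw data is read and the null-separated strings are walked manually:

uses Registry, Windows, SysUtils, Classes;

function GetPendingFileRenameOperations(operations: TStrings): Boolean;
var
  reg: TRegistry;
  size: Integer;
  buffer: array of Char;
  p: PChar;
begin
  Result := False;
  reg := TRegistry.Create(KEY_READ);
  try
    reg.RootKey := HKEY_LOCAL_MACHINE;
    if reg.OpenKeyReadOnly('SYSTEM\CurrentControlSet\Control\Session Manager') and
       reg.ValueExists('PendingFileRenameOperations') then
    begin
      size := reg.GetDataSize('PendingFileRenameOperations');
      if size <= 0 then
        Exit;
      SetLength(buffer, size div SizeOf(Char));
      reg.ReadBinaryData('PendingFileRenameOperations', buffer[0], size);
      p := PChar(@buffer[0]);
      while p^ <> #0 do
      begin
        // Entries come in pairs: source path, then target path
        // (the target is empty for a scheduled delete)
        operations.Add(p);
        Inc(p, StrLen(p) + 1);
      end;
      Result := True;
    end;
  finally
    reg.Free;
  end;
end;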

Another idea is to block further install and upgrade tasks until Joe actually does restart his computer, for example with a registry setting or a RunOnce entry.  But that’s going to make Joe hate me even more than he already does!

Some scenarios can be fixed with the Windows Installer automatic repair functionality.  But that assumes access to the original source files is always possible.  So that’s not a general solution.

I don’t really have a good general solution to this problem.  Any ideas?

Indy Sockets: an example of how to not distribute software libraries

Indy Sockets is a library of network components for Embarcadero Delphi, which has been included in the Delphi distribution for many years now.  As with all libraries, bug fixes and patches are regularly made to the source, so it is typically a good idea to update the library periodically.

Unfortunately the method of distribution that the Indy Sockets developers have chosen could be a classic example of how to not distribute software libraries.  Here’s why.

When you visit the Indy download page, you are presented with the following blurb:

Installation

Development Snapshot – Instructions to obtain live source and compile manually.

Alternatively you can download the Source Code for Version 10.0.52. This is however a rather old version and is no longer recommended.

So naturally, one follows the Development Snapshot link as the rather old version is no longer recommended, right?  Once you get to the Development Snapshot page, you are then presented with the following scary message:

“You are being warned. This will provide you with a direct link into our current development files. At various times the files may not compile, or in some cases may cause strange errors. Use at your own risk! However please see the version specific notes below. If you are unlucky enough to download a bad version please try again a little later. We apologise for any inconvenience caused.”

This means that when you download Indy, you have no guarantee of getting a stable or even a compiling version.  There is no version information, and you have to rely on their SVN commit logs to figure out what the status of the libraries is.  Oh yes, and the “version specific notes” are missing.

Poor show.

Reproducing the TrustedInstaller.exe leak

Earlier today I blogged about the extraordinary amount of time it was taking to install updates on a Windows 7 virtual machine.  I believed that the issue came down to TrustedInstaller.exe taking what seemed to be an ever increasing amount of RAM and opening an inordinate number of handles.

After writing this blog post, I wondered who at Microsoft might be interested in the issue.  I ended up tweeting a link to Mark Russinovich in the hope that it would pique his interest (the blog did have pictures of Process Explorer, one of Mark’s great Sysinternals utilities ☺):

And verily, Mark did reply:

As Mark was kind enough to respond I knew I had to go and reproduce the issue!  So I built a new VM of Windows 7 Ultimate x64 with stock standard settings, but only 1GB of RAM (oops):

Stock Windows 7 Ultimate x64 install on a VM

I configured Performance Monitor to trace Working Set and Handle counters on the TrustedInstaller.exe process, and saved the settings into a Data Collector Set so that I could capture the whole update process.
 
Then I started the Windows Update:

Windows Update: I selected the additional important update, but none of the optional updates

Three hours later (I did say oops about the RAM, didn’t I?) Windows finished installing updates, and I was able to generate a report from the Data Collector Set that showed the following graph:

Graph of Working set and Handle Count + summary of Working Set detail

and:

Handle Count detail

At the end of the process we see 34,500 open handles, and a peak of 600MB memory usage.  As one would expect, the RAM usage went up and down quite a bit as updates were installed.  More disturbingly, the Handle Count really did not decrease significantly through the process.  So the real message of the graph is the trend for both the green Handle Count, and the red Working Set, which are both clearly heading the wrong way: up.

There were fewer updates to install than with the other VM, which was also installing Microsoft Office Updates; I’m guessing this had an impact on the overall numbers.

Minor note: I didn’t fix the time zone for the VM.  I wasn’t really doing this at 3AM!

Does Microsoft’s TrustedInstaller.exe have a leak?

When building a virtual machine today, I was curious as to why the Windows Update install process was taking such a long time.  The initial run had over 100 different updates so I expected it to take a while, but it was even slower than expected.  So I fired up Process Explorer and discovered that TrustedInstaller.exe had allocated over 1GB of RAM to do its updates.  As I watched, this number kept ticking up until all the updates were installed, with a maximum memory usage of roughly 2GB.  The handle count tended to increase over time as well, hitting a maximum of 50,500 handles.

This VM was set up with 2GB RAM, so this was causing lots of swap activity, which I guess is what was slowing down the update process so much.

I grabbed a screenshot at Update 94 when I decided that something was not right with this:

And another shot with the final update, Update 103.  50,484 handles is a whole lotta handles.

As each update finished, both the handle count and the memory usage would drop, but never back to the level before the update.

This really does suggest that TrustedInstaller.exe is leaking memory.  So is this intentional?  Whether by design, or by accident, it’s horrible.  Microsoft?

Update: Mark asked me why one would leak by design.  I suggested that one possibility might be that TrustedInstaller locks files or objects to prevent other processes from making changes to them before Windows is restarted, as the update process does not complete until Windows restarts.