Category Archives: Computing

Debugging a stalled Delphi process with Windbg and memory searches

Today I’ve got a process on my machine that is supposed to be exiting, but it has hung. Let’s load it up in Windbg and find what’s up. The program in question was built in Delphi XE2, and symbols were generated by our internal tds2dbg tool (but there are other tools online which create similar .dbg files). As usual, I am writing this up for my own benefit as much as anyone else’s, but if I put it on my blog, it forces me to put in enough detail that even I can understand it when I come back to it!

Looking at the main thread, we can see unit finalizations are currently being called, but the specific unit finalization section and functions which are being called are not immediately visible in the call stack, between InterlockedCompareExchange and FinalizeUnits:

0:000> kb
ChildEBP RetAddr  Args to Child              
0018ff3c 0040908c 0018ff78 0040909a 0018ff5c audit4_patient!System.Sysutils.InterlockedCompareExchange+0x5 [C:\Program Files (x86)\Embarcadero\RAD Studio\9.0\source\rtl\sys\InterlockedAPIs.inc @ 23]
0018ff5c 004094b2 0018ff88 00000000 00000000 audit4_patient!System.FinalizeUnits+0x40 [C:\Program Files (x86)\Embarcadero\RAD Studio\9.0\source\rtl\sys\System.pas @ 17473]
0018ff88 74b7336a 7efde000 0018ffd4 76f99f72 audit4_patient!System..Halt0+0xa2 [C:\Program Files (x86)\Embarcadero\RAD Studio\9.0\source\rtl\sys\System.pas @ 18599]
0018ff94 76f99f72 7efde000 35648d3c 00000000 kernel32!BaseThreadInitThunk+0xe
0018ffd4 76f99f45 01a5216c 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70
0018ffec 00000000 01a5216c 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b

So, the simplest way to find out where we were was to step out of the InterlockedCompareExchange call. I found myself in System.SysUtils.DoneMonitorSupport (specifically, the CleanEventList subprocedure):

0:000> p
eax=01a8ee70 ebx=01a8ee70 ecx=01a8ee70 edx=00000001 esi=00000020 edi=01a26e80
eip=0042dcb1 esp=0018ff20 ebp=0018ff3c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200202
audit4_patient!CleanEventList+0xd:
0042dcb1 33c9            xor     ecx,ecx

After a little more spelunking, and a review of the Delphi source around this function, I found that this was a part of the System.TMonitor support. Specifically, there was a locked TMonitor somewhere that had not been destroyed. I stepped through a loop that was spinning, waiting for the object to be unlocked so its handle could be destroyed, and found a reference to the data in question here:

0:000> p
eax=00000001 ebx=01a8ee70 ecx=01a8ee70 edx=00000001 esi=00000020 edi=01a26e80
eip=0042dcaf esp=0018ff20 ebp=0018ff3c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200202
audit4_patient!CleanEventList+0xb:
0042dcaf 8bc3            mov     eax,ebx

Looking at the record pointed to by ebx, we had a reference to an event handle handy:

0:000> dd ebx L2
01a8ee70  00000001 00000928

  TSyncEventItem = record
    Lock: Integer;
    Event: Pointer;
  end;

Although Event is a Pointer, internally it’s just cast from an event handle. So I guess that we can probably find another reference to that handle somewhere in memory, corresponding to a TMonitor record:

  TMonitor = record
  strict private
    // ... snip ...
    var
      FLockCount: Integer;
      FRecursionCount: Integer;
      FOwningThread: TThreadID;
      FLockEvent: Pointer;
      FSpinCount: Integer;
      FWaitQueue: PWaitingThread;
      FQueueLock: TSpinLock;

And if we search for that event handle:

0:000> s -[w]d 00400000 L?F000000 00000928
01a8ee74  00000928 00000000 0000092c 00000000  (.......,.......
07649334  00000928 002c0127 0012ccb6 0000002e  (...'.,.........
0764aa14  00000928 01200125 004b8472 000037ce  (...%. .r.K..7..
07651e24  00000928 05f60125 01101abc 00000e86  (...%...........
08a47544  00000928 00000000 00000000 00000000  (...............

Now one of these should correspond to a TMonitor record. The first entry (01a8ee74) is just part of our TSyncEventItem record, and the next three don’t make sense given that the FSpinCount (the next value in the memory dump) would be invalid. So let’s look at the last one. Counting quickly on all my fingers and toes, I establish that that makes 08a47538 the start of the TMonitor record. And… so we search for a pointer to that.

0:000> s -[w]d 00400000 L?F5687000 08a47538
076b1d24  08a47538 08aa3e40 076b1db1 0122fe50  8u..@>....k.P.".

Just one! But here it gets a little tricky, because the PMonitor pointer is in a ‘hidden’ field at the end of the object. So we need to locate the start of the object.

0:000> dd 076b1d00
076b1d00  0122fe50 00000000 00000000 076b1df1
076b1d10  0122fe50 00000000 00000000 076b0ce0
076b1d20  004015c8 08a47538 08aa3e40 076b1db1
076b1d30  0122fe50 00000000 00000000 076b1d51
... snip ...

I’m just stabbing in the dark here, but that 004015c8 that’s just four bytes back smells suspiciously like an object class pointer. Let’s see:

0:000> da poi(4015c8-38)+1
004016d7  "TObject&"

Ta da! That all fits. A TObject has no data members, so the next 4 bytes should be the TMonitor (search for hfMonitorOffset in the Delphi source to learn more). So we have a TObject being used as a TMonitor lock reference. (Learn about that poi(address-38)+1 magic). But what other naughty object is hanging about, using this TObject as its lock?

0:000> s -[w]d 00400000 L?F5687000 076b1d20
098194b0  076b1d20 00000000 00000000 09819449   .k.........I...

Just one. Let’s trawl back in memory just a little bit and have a look at this one.

0:000> dd 09819480  
09819480  00f0f0f0 00ffffff 00000000 09819641
09819490  00447b7c 00000000 00000000 00000000
098194a0  00000000 098150c0 00448338 098195c8
098194b0  076b1d20 00000000 00000000 09819449
... snip ...

0:000> da poi(00448338-38)+1
004483c0  "TThreadList&"

And what does a TThreadList look like?

  TThreadList = class
  private
    FList: TList;
    FLock: TObject;
    FDuplicates: TDuplicates;

Yes, that definitely looks hopeful! That FLock is pointing to our lock TObject… I believe that’s called a Quality Match.

This is still a little bit too generic for me, though. TThreadList is a standard Delphi class used by the bucketload. Let’s try and identify who is using this list and leaving it lying about. First, we’ll quickly have a look at that TThreadList.FList to see if it has anything of interest — that’s the first data member in the object == object+4.

0:000> dd poi(098194ac)
098195c8  00447b7c 00000000 00000000 00000000
... snip ...
0:000> da poi(447b7c-38)+1
00447cad  "TList'"

Yep, it’s a TList. Just making sure. It’s empty, what a shame (TList.FCount is the second data member in the object == 00000000, as is the list pointer itself).

So how else can we find the usage of that TThreadList? Is it TThreadList referenced anywhere then? Break out the search tool again!

0:000> s -[w]d 00400000 L?F5687000 098194a8
076d3410  098194a8 00000000 09819580 00000000  ................

Yes. Just once, again. Again, we scroll back in memory to find the base of that object.

0:000> dd 076d33c0  
076d33c0  08abfe60 00000000 00000000 076d4c39
076d33d0  01616964 0000091c 00001850 00000100
076d33e0  00000001 00000000 00000000 00000000
076d33f0  076d33d0 00000000 01616d38 076d33d0
076d3400  00000000 00000000 00000000 00000000
076d3410  098194a8 00000000 09819580 00000000
076d3420  00000000 076d2ea9 0075e518 00000000
076d3430  00401ecc 00000000 00000000 00000000

There was a false positive at 076d33f8, but then magic happened at 076d33d0:

0:000> da poi(01616964-38)+1
016169e8  "TAnatomyDiagramTileLoaderThread&"

Wow! Something real! Let’s dissect this a bit. Looking at the definition of TThread, we have the following data:

076d33d0  01616964  class pointer TAnatomyDiagramTileLoaderThread
076d33d4  0000091c  TThread.FHandle
076d33d8  00001850  TThread.FThreadID
076d33dc  00000100  TThread.FCreateSuspended, .FTerminated, .FSuspended, .FFreeOnTerminate (watch that endianness!)
076d33e0  00000001  TThread.FFinished
076d33e4  00000000  TThread.FReturnValue
076d33e8  00000000  TThread.FOnTerminate.Code
076d33ec  00000000  TThread.FOnTerminate.Data
076d33f0  076d33d0  TThread.FSynchronize.TThread (= Self)
076d33f4  00000000  padding
076d33f8  01616d38  TThread.FSynchronize.FMethod.Code (= last Synchronize target method)
076d33fc  076d33d0  TThread.FSynchronize.FMethod.Data (= Self)
076d3400  00000000  TThread.FSynchronize.FProcedure
076d3404  00000000  TThread.FSynchronize.FProcedure
076d3408  00000000  TThread.FFatalException
076d340c  00000000  TThread.FExternalThread

Then, into TAnatomyDiagramTileLoaderThread:

076d3410  098194a8  TAnatomyDiagramTileLoaderThread.FTiles: TThreadList

So… we can tell that the thread has exited (FFinished == 1), and verify that by looking at running threads, looking for thread id 1850:

0:000> ~
.  0  Id: 1c98.1408 Suspend: 1 Teb: 7efdd000 Unfrozen
   1  Id: 1c98.16c8 Suspend: 1 Teb: 7efda000 Unfrozen
   2  Id: 1c98.11a0 Suspend: 1 Teb: 7ee1c000 Unfrozen
   3  Id: 1c98.1bbc Suspend: 1 Teb: 7ee19000 Unfrozen
   4  Id: 1c98.10dc Suspend: 1 Teb: 7ee04000 Unfrozen
   5  Id: 1c98.12f0 Suspend: 1 Teb: 7edfb000 Unfrozen
   6  Id: 1c98.1d38 Suspend: 1 Teb: 7efaf000 Unfrozen
   7  Id: 1c98.1770 Suspend: 1 Teb: 7edec000 Unfrozen
   8  Id: 1c98.1044 Suspend: 1 Teb: 7ede3000 Unfrozen
   9  Id: 1c98.bf4 Suspend: 1 Teb: 7ede0000 Unfrozen
  10  Id: 1c98.b3c Suspend: 1 Teb: 7efd7000 Unfrozen

Furthermore, the handle is invalid:

0:000> !handle 91c
Could not duplicate handle 91c, error 6

That suggests that the object has already been destroyed. But that the TThreadList hasn’t.

And sure enough, when I looked at the destructor for TAnatomyDiagramTileLoadThread, we clear the TThreadList, but we never free it!

Now, another way we could have caught this was to turn on leak detection. But leak detection is not always perfect, especially when you get some libraries that *cough* have a lot of false positives. And of course while we could have switched on heap leak detection, that involves rebuilding and restarting the process and losing the context along the way, with no guarantee we’ll be able to reproduce it again!

While this approach does feel a little tedious, and we did have some luck in this instance with freed objects not being overwritten, and the values we were searching for being relatively unique, it does nevertheless feel pretty deterministic, which must be better than the old “try-it-and-hope” debugging technique.

Introducing Mesmeride

So I recently had some holidays. Weird, I know. I took two whole weeks off and only had to go into the office twice during that time. My first week had unseasonably nice weather, so I spent some time on my bike making the most of it.

In the second week, the weather soured, so I took the opportunity to learn something of Ruby on Rails with the great Rails tutorial. I am not generally a big fan of tutorials but this particular one covered a lot of bases, and was well organised. Equally excellent were Railscasts.

After working through the first few chapters of the tutorial, I was comfortable enough to start on my own project to test my newly acquired knowledge.

Enter Mesmeride. With this project, I had two objectives:

  • Get a functional and “useful” Ruby on Rails site live in a week.
  • Get my Strava gradient rendering code running again with the new v3 Strava API.

mesmeride-1

Mesmeride allows you to take any Strava activity or segment, and graph it out in a number of different styles. You can add waypoints and control the length, height and size of the presentation, making it suitable for print or web. After tweaking the style of the graph to perfection, you can share the result on Twitter or Facebook, embed the image on your blog, or save it for printing or offline sharing.

Waypoints

Any ride of a reasonable length will have points of interest. The Giro renderer will draw these onto the profile. You can add and delete waypoints, move them along the ride, and change their names in the left hand box in the controls section.

Mountains or Molehills?

The most popular or remarked-upon feature is the ability to make any of your rides, even the most flat and featureless, look like a day attacking the biggest climbs of the Alps. You can control the mountainosity of your ride with the Netherlands-Switzerlands slider (also called the Molehills-Mountain slider).

Size and Length

To help you adjust the dimensions of the graphic, for print or for web, you can rescale the entire ride graphic with the “Teensy – Ginormous slider”, or make the ride appear longer or shorter with the “Shopping Trip – Grand Tour” slider.

Sharing

What good is a graphic without eyes to look at it? Mesmeride has tools to share any of the graphics you create on Twitter, Facebook or even by embedding them in your blog. Or of course you can save the image and download it. The images are stored on Amazon S3, and you can save up to 3 for any given route.

Sharing your ride
Sharing your ride

I even drew the logo myself. Can you tell?

mesmeride-2

Mesmeride will save the design you create as well, and you can come back later and change it round into many other styles.

mesmeride-3

In the future I may add mapping, additional gradient styles, and more controls and waypoint types to existing styles.

Here are a few examples from my race last weekend, via Strava. No, I didn’t do well, but never mind 😉 The screenshots above show the editor in action; what you see below are the resulting files.  I even fixed a bug in Mesmeride when preparing this…

Hell of the South
Hell of the South, full route profile, with the Mesmeride “Giro” Renderer. The waypoints are fully customisable!
Hell of the South Climb 1
The Gardiner’s Bay Climb at the start of Hell of the South. Presented with the Mesmeride “Le Tour” segment renderer
Hell of the South Kettering Climb
The climb out of Kettering, presented in the “Le Tour” rendering style. This is the climb I came unstuck on…
Nicholl's Rivulet Climb
The Nicholl’s Rivulet Climb, a lovely, smooth winding climb which I suffered greatly on. Off the back… 🙂

To finish with, the whole ride again, in another style.

HotS "Hobart 10,000 Banner" Style
HotS “Hobart 10,000 Banner” Style

The case of the stack trace that wouldn’t (trace)

One of my favourite Windows tools is Procmon.  I pull it out regularly, often as a first port of call when diagnosing complicated and opaque problems in the software I develop.  Or in anyone’s software, really.

Procmon captures a trace of key I/O activities on your computer, including file, registry and network activity, and makes it really easy to spot operations that have failed or that may be causing problems.  It’s great for spotting authentication problems, sharing violations, missing files and more (… malware).  Procmon goes as far as recording a stack trace for nearly every operation it captures!

Today, we were trying to diagnose a problem with a process that was taking 15 seconds or longer to start on a Windows XP computer.  The normal start time for that process should have been 1-2 seconds.  None of the usual culprits came forward and admitted fault, so it was time to pull out Procmon again.

We quickly spotted a big fat delay in the trace.  Note the time stamps in the two selected rows  in the screen capture below.

A big time gap between 4:10:04 and 4:10:17

Now it was time to try and find out what was causing this.  So we examined the stack trace for each entry, except … there were no symbols.  Easy enough to fix — copy dbghelp.dll from a version of Microsoft’s Debugging Tools for Windows onto the system temporarily, fixup the symbol path in Procmon’s options, and … nope, still no symbols.  Now this is one area where Procmon falls down a little bit.  If symbol loading fails, it just silently fails.  No warnings, errors or hints as to what might be going on.

This issue was occurring on a client’s computer, so it was time to take the investigation elsewhere for examination.  Before we could really examine the captured trace, we needed to get symbols going.  But how?

Procmon to the rescue!

That’s right, we realised we can use Procmon to diagnose itself!  I booted up a clean new Windows XP virtual machine, loaded Procmon onto it, ran a basic capture of some random events.

A trace on XP
A trace on XP
No symbols showing, only exports
No symbols showing, only exports
Configuring symbols for Procmon
Configuring symbols for Procmon

Even after configuring symbols, they still silently failed to load.  So I stopped the capture, saved it and immediately opened the saved capture, to stop this instance of Procmon from capturing events on the local computer.  I then started a second instance of Procmon, removed the Procmon exclusion from the filtering, and instead, added a filter to include Procmon (I also filtered specifically for the PID of the original Procmon, later):

Configuring Procmon to watch itself
Configuring Procmon to watch itself

Then I started the trace, switched back to the first Procmon, and tried to examine the stack.  Of course, still no symbols, but now it was time to switch back to the second, active Procmon process and see what we found.

And what we found was that dbghelp.dll was looking for symsrv.dll in order to download its symbols.  So we copied that also into the folder with procmon.exe  and suddenly everything worked!

dbghelp wants symsrv to help it as well
dbghelp wants symsrv to help it as well
The undecorated stack for the symsrv load request
The undecorated stack for the symsrv load request

Update 19 Sep 2013: Oops.  Forgot to attach the decorated stack (sorry!):

The call stack with symbols
The call stack with symbols loaded.  Note how the function names differ from the original stack.

So that’s the first takeaway from this story: when you want symbols, copy both dbghelp.dll and symsrv.dll from your copy of Debugging Tools for Windows.  We found no other dependencies, even with the latest version of these files.

A diversion

One curious anomaly we spotted: Procmon (or possibly Dbghelp) is looking in some strange places for debug symbols, including appending a SRV*path*url style symbol path to procmon’s parent path, and looking there, without much success:

Some weirdness in symbol loading
Some weirdness in symbol loading

I leave that one for you to solve.

Backtrack to the stack (trace)

Back to the original trace.  We loaded up the saved trace, and found that we now got kernel mode symbols just fine, but no user mode symbols would load.  In fact, Procmon doesn’t even appear to be looking for symbols for these user mode frames — either on the local drive or on the network.  And this time Procmon isn’t able to give us any more detail.  However, when we debugged the call that Procmon made to SymFindFileInPath when viewing a call stack in this log vs another new log, we found that Procmon wasn’t even providing the necessary identifying information.

What information is this?  The identifying information that the symbol servers use is the TimeDateStamp and the SizeOfImage fields from the PE header of the executable file (slightly different for .pdb files).

I surmise that this identifying information is missing from our original trace because this trace was captured before we copied version 6.0 or later of dbghelp.dll onto the client’s computer — meaning that the version that Procmon used when capturing the trace did not record this identifying information.

Therefore, the second takeaway of the story is: always copy a recent version of dbghelp.dll and symsrv.dll into the folder with procmon.exe, before starting a trace.  Even if you intend to analyze the trace later, you’ll find that without these, you won’t get full stack traces.

(Dear Microsoft, please can you consider including these in the Procmon and Procexp downloads, given that you now own Sysinternals?  Saves a lot of hassle!)

Giro-Your-Strava now does Strava Routes as well

Update May 2014: As Strava’s ride pages have changed format significantly, Giro-Your-Strava no longer works. The good news is that Mesmeride does rides — but unfortunately not routes as yet.

After a comment from Mike on my previous Giro-Your-Strava post asking if the bookmarklet could support Strava’s new Routes feature, I took a few minutes over breakfast to spelunk and found it wouldn’t be too hard.

Firebug’s DOM explorer actually made the task much, much simpler. My approach was simply to look for Great Big Arrays Of Numbers. I soon found, amongst all the Google, jQuery, Modernizr, Optimizely and other objects, a pageView object, which contained a number of juicy functions, including pageView.chartContext().dataContext().elevationStream() and pageView.pageContext().routeSegments(), which gave me all the information I needed. The data structures for routes are somewhat different to those for rides, so I opted to massage the Great Big Arrays those functions returned into the same basic structure as the ride data already used by the bookmarklet, rather than touch the rendering code at all.

So… after that overly detailed introduction, here ’tis. I’ve updated the bookmarklet to draw Giro-style elevation profiles for Strava Routes as well as for Rides. And of course, Le Tour-style elevation profiles for segments still work, within individual ride pages.

Note: this still won’t work in IE10, as Github returns the wrong Content-Type for the Javascript and IE gets a little panicky about it, and, well, just use a different browser.

1. Install the bookmarklet.

Here’s the bookmarklet. Just drag it to your Bookmarks toolbar to install it:

Giro-Your-Strava

2. Load a favourite Strava Route and click the bookmark.

(It’s best to wait for the page to load completely before clicking the bookmark.)

Here’s one of my favourite Strava Routes — Hobart 10,000 Day 1:

H10K 2013 Day 1 - Strava Route 2013-09-05 08-26-54

After a few seconds the gradient graphic may refresh with the correct font — this takes a second or two to download.

That’s it! This new bookmarklet still works with rides (so delete the old one!)

The source is still all on GitHub.  Again, if you improve the code, or accidentally hurt yourself while reading it, please do share with a comment on this post.

And as DC Rainmaker says, thanks for reading!

I am happy to announce that the missing characters have been found!

Last year I published a blog post about characters missing from print jobs with Internet Explorer 9 and my adventures in tracing and reporting the issue to Microsoft.

We saw situations where a letter would be printed with random letters missing, as per the following example:

Here’s how it should have been printed (oh yes, that’s a completely fictional name we used for testing non-English Latin script letters in printing):

Today, I was notified that Microsoft have finally publicly published a hotfix!  We received the hotfix from them a few weeks ago, before it was publicly available, and it certainly solved the problem on our test machines and end user computers.

Download the hotfix from:

http://support.microsoft.com/kb/2853777 (Windows 7 SP1, Windows Server 2008 SP1) 

http://support.microsoft.com/kb/2855336 (Windows 8, Windows Server 2012)

(The first link notes that the hotfix is included in the rollup in the second link, though the second link doesn’t mention the hotfix!)

Sadly, they’ve only published a solution for Windows 7 Service Pack 1 onwards, and not for Vista, which could be an issue for some users. 

It was interesting to see that the issue was indeed a race condition (two threads ending up with the same random seed, because, and I quote:

This issue occurs because a conflict causes the text that uses the font to become corrupted when the two threads try to install the font at the same time. The name of the font is generated by the RAND function together with a SEEDvalue whose time value is set to srand(time(NULL)). When the two documents are printed at the same time, the SEEDvalue for the font is the same in both documents. Therefore, the conflict occurs.

So not related to the threading model, which was a little bit of misdirection on my part 😉  That’s par for the course though when debugging complex issues without source…

Fantastic to finally have the fix 🙂

Giro-Your-Strava updated to give Le Tour treatment to the climbs!

Update May 2014: As Strava’s ride pages have changed format significantly, Giro-Your-Strava no longer works. The good news is that Mesmeride does the same and more!

After the Strava API debacle, my little Tour Segment Gradient tool no longer works, which is sad.  I’d put together a number of other Strava API-based widgets, but this was the only one which was really at all popular.  Yesterday, DC Rainmaker himself mentioned (thank you Ray!) the Giro-Your-Strava elevation graph tool (which does still work) on his blog, so what better time to update the Tour Segment Gradient tool?

In short, what I have done is to dump both the Giro and Le Tour gradient mashups into the same bookmarklet.  One click and you get beautiful isometric graphs for your ride (in Giro style) and your efforts (in Le Tour style).  Yes, I get the inconsistency, but what would life be without idiosyncracies?

1. Install the bookmarklet.

Here’s the bookmarklet.  Just drag it to your Bookmarks toolbar to install it:

Giro-Your-Strava

2. Load a favourite Strava ride and click the bookmark.

(It’s best to wait for the page to load completely before clicking the bookmark.)

Presto, you’ll get two spiffy new buttons, one for your ride:

And one for the segment view:

So go ahead and click the Giro button, and you’ll see:

Click the Le Tour button for the new elevation profile for a segment:

Have fun!

One danger with bookmarklets that fiddle with an existing site in this way is that they will tend to break when the site updates.  There are no stability guarantees that APIs (ususally!) provide, so YMMV.  However, if the Strava site layout changes, it’s probably only a simple tweak to the code to get it working again.

There’s nothing beautiful about the code on the backend.  It really needs rewriting and modularisation etc etc etc but hey, it works 😉  Do what you want with the code, just share it with us all if you improve things!

Giro-style elevation graphs for Strava

Update May 2014: As Strava’s ride pages have changed format significantly, Giro-Your-Strava no longer works. The good news is that Mesmeride does the same and more!
Update 13 July 2013: An updated version of Giro-Your-Strava is now available here.

Just in time for the last few days of the Giro, I’ve finished a little after-hours Strava mashup project that builds on the segment graphs that I originally created for the Hobart 10,000.

I’m sure you’ve seen some of the Giro elevation graphs. Here’s one, from Stage 11:

Now, here’s a bookmarklet.  Drag it to your bookmarks toolbar, load up a favourite hilly ride on Strava, and click the bookmarklet.

Giro-Your-Strava

A mysterious new button will appear in the graph menu.  Go, on click it!

And up pops an elevation graph that makes it look like you’ve been riding the Giro!

The algorithm picks the categorised climbs from your ride (and tries to figure out the most appropriate segment where multiple segments finish at the top of a hill). Non-categorised segments are currently ignored.  The whole project is published on GitHub, so you can tweak it and improve it to your heart’s content.  Your first step should be to tidy up the mess I’ve left you 🙂

RSA Key Exchange between CryptoAPI and CNG (BCrypt)

Microsoft, a few years ago, wrote a new cryptography API for Windows Vista called Cryptography Next Generation (CNG).  This replaces the existing cryptography API in Windows XP and earlier versions which was simply called CryptoAPI.

Now, compatibility between the two APIs is touted as a feature of the new CNG API, but a somewhat more sombre reality tends to set in pretty quickly once you start working with them.  A motley collection of loosely documented data structures, endianness differences, and “Not Supported” footnotes pile up as quickly as you can open MSDN documentation tabs in your browser.  In my research, I also found a dearth of concrete examples of how to interoperate between the two APIs online.  In fact, there was only one page which really helped, buried in Microsoft’s Output Protection Manager documentation.

There is something magical about all your bits lining up in a row when mixing crypto libraries of any ilk, and I wanted to share!  Hence this story.

My requirement was pretty simple: I wanted to generate an RSA key pair using CryptoAPI in my client application, hand off the public key to another application, which happens to use CNG, wave my hands a little, and magically transact a key exchange.  Here’s how to do it (with some error handling elided for your own reading sanity).  You should be able to load this into any recent version of Visual Studio, add bcrypt.lib to your library includes and build.  The code should be reasonably self-explanatory, so I’ll leave it there.

The usual disclaimers apply. Source on GitHub.

RIGHTeously tripping over T-SQL’s LEN function

We tripped over recently on our understanding of Microsoft SQL Server’s T-SQL LEN function.  The following conversation encapsulates in a nutshell what any sensible developer would assume about the function.

@marcdurdin I guess the answer is, removing the line would give the same result. Right? 🙂

— S Krupa Shankar (@tamil) April 23, 2013

Now, I wouldn’t be writing this blog post if that was the whole story.  Because, like so many of these things, it’s not quite that simple.

Originally, we were using RIGHT(string, LEN(string) – 2) to trim two characters off the front of our string.  Or something like that.  Perfectly legitimate and reasonable, one would think.  But we were getting strange results, trimming more characters than we expected.

It turns out that T-SQL’s LEN function does not return the length of the string.  It returns the length of the string, excluding trailing blanks.  That is, excluding U+0020, 0x20, 32, ‘ ‘, or whatever you want to call it.  But not tabs, new lines, zero width spaces, breaking or otherwise, or any other Unicode space category characters.  Just that one character.  This no doubt comes from the history of the CHAR(n) type, where fields were always padded out to n characters with spaces.  But today, this is not a helpful feature.

Of course, RIGHT(string) does not ignore trailing blanks

But here’s where it gets really fun.  Because a post about strings is never complete until you’ve done it in Unicode.  Some pundits suggest using DATALENGTH to get the actual length of the string.  This returns the length in bytes, not characters (remember that NCHAR is UTF-16, so 2 bytes per character… sorta!).  Starting with SQL Server 2012, UTF-16 supplementary pairs can be treated as a single character, with the _SC collations, so you can’t even use DATALENGTH*2 to get the real length of the string!

OK.  So how do you get the length of a string, blanks included, now that we’ve established that we can’t use DATALENGTH?  Here’s one simple way:

  SET @realLength = LEN(@str + ‘.’) – 1

Just to really do your head in, let’s see what happens when you use LEN with a bunch of garden variety SQL strings.  I should warn you that this is a display artefact involving the decidedly unrecommended use of NUL characters, and no more, but the weird side effects are too fun to ignore.

First, here’s a set of SQL queries for our default collation (Latin1_General_CI_AS):

Note the output.  Not exactly intuitive!  Now we run the Unicode version:

Not quite as many gotchas in that one?  Or are there?  Notice how the first line shows a wide space for the three NUL characters — but not quite as wide as the Ideographic space…

Now here’s where things go really weird.  Running in Microsoft’s SQL Server Management Studio, I switch back to the first window, and, I must emphasise, without running any queries, or making any changes to settings, take a look at how the Messages tab appears now!

That’s right, the first line in the messages has magically changed!  The only thing I can surmise is that switching one window into Unicode output has affected the whole program’s treatment of NUL characters.  Let that be a warning to you (and I haven’t even mentioned consuming said NULs in C programs).

The Fragile Abstract Factory Delphi Pattern

When refactoring a large monolithic executable written in Delphi into several executables, we ran into an unanticipated issue with what I am calling the fragile abstract factory (anti) pattern, although the name is possibly not a perfect fit.  To get started, have a look at the following small program that illustrates the issue.

program FragileFactory;

uses
  MyClass in 'MyClass.pas',
  GreenClass in 'GreenClass.pas',
  //RedClass in 'RedClass.pas',
  SomeProgram in 'SomeProgram.pas';

var
  C: TMyClass;
begin
  C := TMyClass.GetObject('TGreenClass');
  writeln(C.ToString);
  C.Free;

  C := TMyClass.GetObject('TRedClass');  // oh dear… that’s going to fail
  writeln(C.ToString);
  C.Free;
end.
unit MyClass;

interface

type
  TMyClass = class
  protected
    class procedure Register;
  public
    class function GetObject(ClassName: string): TMyClass;
  end;

implementation

uses
  System.Contnrs,
  System.SysUtils;

var
  FMyClasses: TClassList = nil;

{ TMyObjectBase }

class procedure TMyClass.Register;
begin
  if not Assigned(FMyClasses) then
    FMyClasses := TClassList.Create;
  FMyClasses.Add(Self);
end;

class function TMyClass.GetObject(ClassName: string): TMyClass;
var
  i: Integer;
begin
  for i := 0 to FMyClasses.Count – 1 do
    if FMyClasses[i].ClassNameIs(ClassName) then
    begin
      Result := FMyClasses[i].Create;
      Exit;
    end;

  Result := nil;
end;

initialization
finalization
  FreeAndNil(FMyClasses);
end.
unit GreenClass;

interface

uses
  MyClass;

type
  TGreenClass = class(TMyClass)
  public
    function ToString: string; override;
  end;

implementation

{ TGreenClass }

function TGreenClass.ToString: string;
begin
  Result := 'I am Green';
end;

initialization
  TGreenClass.Register;
end.

What happens when we run this?

C:\src\fragilefactory>Win32\Debug\FragileFactory.exe
I am Green
Exception EAccessViolation in module FragileFactory.exe at 000495A4.
Access violation at address 004495A4 in module ‘FragileFactory.exe’. Read of address 00000000.

Note the missing TRedClass in the source.  We don’t discover until runtime that this class is missing.  In a project of this scale, it is pretty obvious that we haven’t linked in the relevant unit, but once you get a large project (think hundreds or even thousands of units), it simply isn’t possible to manually validate that the classes you need are going to be there.

There are two problems with this fragile factory design pattern:

  1. Delphi use clauses are fragile (uh, hence the name of the pattern).  The development environment frequently updates them, typically they are not organised, sorted, or even formatted neatly.  This makes validating changes to them difficult and error-prone.  Merging changes in version control systems is a frequent cause of errors.
  2. When starting a new Delphi project that utilises your class libraries, ensuring you use all the required units is a hard problem to solve.

Typically this pattern will be used with in a couple of ways, somewhat less naively than the example above:

  1. There will be an external source for the identifiers.  In the project in question, these class names were retrieved from a database, or from linked resources.
  2. The registered classes will be iterated over and each called in turn to perform some function.

Of course, this is not a problem restricted to Delphi or even this pattern.  Within Delphi, any unit that does work in its initialization section is prone to this problem.  More broadly, any dynamically linking registry, such as COM, will have similar problems.  The big gotcha with Delphi is that the problem can only be resolved by rebuilding the project, which necessitates rollout of an updated executable — much harder than just re-registering a COM object on a client site for example.

How then do we solve this problem?  Well, I have not identified a simple, good clean fix.  If you have one, please tell me!  But here are a few things that can help.

  1. Where possible, use a reference to the class itself, such as by calling the class’s ClassName function, to enforce linking the identified class in.  For example:
    C := TMyClass.GetObject(TGreenClass.ClassName);
    C := TMyClass.GetObject(TRedClass.ClassName);
  2. When the identifiers are pulled from an external resource, such as a database, you have no static reference to the class.  In this case, consider building a registry of units, automatically generated during your build if possible.  For example, we automatically generate a registry during build that looks like this:
    unit MyClassRegistry;
    
    // Do not modify; automatically generated 
    
    initialization
    procedure AssertMyClassRegistry;
    
    implementation
    
    uses
      GreenClass,
      RedClass;
    
    procedure AssertMyClassRegistry;
    begin
      Assert(TGreenClass.InheritsFrom(TMyClass));
      Assert(TRedClass.InheritsFrom(TMyClass));
    end; 
    
    end.

    These are not assertions that are likely to fail but they do serve to ensure that the classes are linked in.  The AssertMyClassRegistry function is called in the constructor of our main form, which is safer than relying on a use clause to link it in.

  3. Units that can cause this problem can be identified by searching for units with initialization sections in your project (don’t forget units that also use the old style begin instead of initialization — a helpful grep regular expression for finding these units is (?s)begin(?:.(?!end;))+\bend\.).  This at least gives you a starting point for making sure you test every possible case.  Static analysis tools are very helpful here.
  4. Even though it goes against every encapsulation design principle I’ve ever learned, referencing the subclass units in the base class unit is perhaps a sensible solution with a minimal cost.  We’ve used this approach in a number of places.
  5. Format your use clauses, even sorting them if possible, with a single unit reference on each line.  This is a massive boon for source control.  We went as far as building a tool to clean our use clauses, and found that this helped greatly.

In summary, then, the fragile abstract factory (anti) pattern is just something to be aware of when working on large Delphi projects, mainly because it is hard to test for: the absence of a unit only comes to light when you actually call upon that unit, and due to the fragility of Delphi’s use clauses, unrelated changes are likely to trigger the issue.