Category Archives: Development

A more UTF-32 aware JavaScript String library

One of the hassles I regularly experience with JavaScript is that it does not have native support for supplementary characters.  Internally, JavaScript uses UCS-2 encoding (unlike most of the rest of the web, which uses UTF-8…)  While you can use surrogate pairs to represent Unicode characters between U+10000 and U+10FFFF, this makes string handling with these characters a pain.  In particular, functions such as indexOf and substr have to be very carefully used, both to account for the surrogate pairs in their index parameters, and to avoid cutting them in half when manipulating the string. Of course, when interfacing with third party services, you will need to be aware of how they handle text.  For instance, the Twitter 140 character limit counts Unicode code points, not UCS-2 code units.  But many other products, (for example, Microsoft SQL Server), use UTF-16 or UCS-2 internally and treat supplementary plane characters as 2 code units for the purposes of calculating field size.  Developer beware! Anyway, I have put together a small set of functions that treat surrogate pairs as a single code point, abstracting away surrogate pairs at the basic String level.  Adding support for surrogate pairs is not a complete solution — I haven’t done any work on regular expressions, for example, and this code also does not begin to address more complex requirements around grapheme clusters or normalisation, but this is just one less complexity to worry about. The functions do not replace any existing String functions. This code is not complete: I am missing some boundary conditions and edge cases, and I haven’t yet tested with isolated surrogate code units — but for what it’s worth, here it is.  The kmw prefix refers to KeymanWeb, which will shortly be using the functions (replacing the mishmash of code we currently use…) Some simple examples:

var str="Brave "+String.kmwFromCharCode(0x13027, 0x1314C, 0x1309C)+" world";

alert(str.indexOf("w"));    // Displays 13
alert(str.kmwIndexOf("w")); // Displays 10

alert(str.length);       // Displays 18
alert(str.kmwLength());  // Displays 15

alert(str.kmwSubstr(4,3));  // Displays e U+13027
alert(str.substr(4,3));  // Displays e U+D80C (half a supplementary pair!)

The license on this code is Mozilla Public License 1.1.  A couple of functions were lifted from the proposal Supplementary Characters for ECMAScript by Norbert Lindenberg and tweaked (back) to more closely mimic the functions they replace, warts and all.  These two functions are probably better tested than my ones! Version 1.0 of this library plus a rudimentary test script can be downloaded from http://durdin.net/blog-files/kmwString-0.1.zip. Comments, bug fixes, flames, suggestions much appreciated!

/**
  @preserve (C) 2012 Tavultesoft Pty Ltd
  
  Adds functions to treat supplementary plane characters in the same 
  way as basic multilingual plane characters in JavaScript.
  
  Version 0.1
  
  License
  
  The contents of this file are subject to the Mozilla Public License
  Version 1.1 (the "License"); you may not use this file except in
  compliance with the License. You may obtain a copy of the License at
  http://www.mozilla.org/MPL/

  Software distributed under the License is distributed on an "AS IS"
  basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
  License for the specific language governing rights and limitations
  under the License.

  The Original Code is (C) 2012 Tavultesoft Pty Ltd.

  The Initial Developer of the Original Code is Tavultesoft.
*/

/**
 * Constructs a string from one or more Unicode character codepoint values 
 * passed as integer parameters.
 * 
 * @param  {integer} cp0,...   1 or more Unicode codepoints, e.g. 0x0065, 0x10000
 * @return {String}            The new String object.
 */
String.kmwFromCharCode = function() {
  var chars = [], i;
  for (i = 0; i < arguments.length; i++) {
    var c = Number(arguments[i]);
	if (!isFinite(c) || c < 0 || c > 0x10FFFF || Math.floor(c) !== c) {
	  throw new RangeError("Invalid code point " + c);
	}
	if (c < 0x10000) {
	  chars.push(c);
	} else {
	  c -= 0x10000;
	  chars.push((c >> 10) + 0xD800);
	  chars.push((c % 0x400) + 0xDC00);
	}
  }
  return String.fromCharCode.apply(undefined, chars);
}

/**
 * Returns a number indicating the Unicode value of the character at the given 
 * code point index, with support for supplementary plane characters.
 * 
 * @param  {integer} codePointIndex  The code point index into the string (not 
                                     the code unit index) to return
 * @return {integer}                 The Unicode character value
 */
String.prototype.kmwCharCodeAt = function(codePointIndex) {
  var str = String(this);
  var codeUnitIndex = 0;
  
  if (codePointIndex < 0 || codePointIndex  >= str.length) {
    return NaN;
  }

  for(var i = 0; i < codePointIndex; i++) {
    codeUnitIndex = str.kmwNextChar(codeUnitIndex);
	if(codeUnitIndex == undefined) return NaN;
  }
  
  var first = str.charCodeAt(codeUnitIndex);
  if (first >= 0xD800 && first <= 0xDBFF && str.length > codeUnitIndex + 1) {
    var second = str.charCodeAt(codeUnitIndex + 1);
	if (second >= 0xDC00 && second <= 0xDFFF) {
	  return ((first - 0xD800) << 10) + (second - 0xDC00) + 0x10000;
	}
  }
  return first;  
}

/**
 * Returns the code point index within the calling String object of the first occurrence
 * of the specified value, or -1 if not found.
 * 
 * @param  {string}  searchValue    The value to search for
 * @param  {integer} fromIndex      Optional code point index to start searching from
 * @return {integer}                The code point index of the specified search value
 */
String.prototype.kmwIndexOf = function(searchValue, fromIndex) {
  var str = String(this);
  var codeUnitIndex = str.indexOf(searchValue, fromIndex);
  
  if(codeUnitIndex < 0) {
    return codeUnitIndex;
  }
  
  var codePointIndex = 0;
  for(var i = 0; i < codeUnitIndex; i = str.kmwNextChar(i), codePointIndex++);
  return codePointIndex;
}

/**
 * Returns the code point index within the calling String object of the last occurrence 
 * of the specified value, or -1 if not found.
 * 
 * @param  {string}  searchValue    The value to search for
 * @param  {integer} fromIndex      Optional code point index to start searching from
 * @return {integer}                The code point index of the specified search value
 */
String.prototype.kmwLastIndexOf = function(searchValue, fromIndex)
{
  var str = String(this);
  var codeUnitIndex = str.lastIndexOf(searchValue, fromIndex);
  
  if(codeUnitIndex < 0) {
    return codeUnitIndex;
  }
  
  var codePointIndex = 0;
  for(var i = 0; i < codeUnitIndex; i = str.kmwNextChar(i), codePointIndex++);
  return codePointIndex;
}

/**
 * Returns the length of the string in code points, as opposed to code units.
 * 
 * @return {integer}                The length of the string in code points
 */
String.prototype.kmwLength = function() {
  var str = String(this);
  
  if(str.length == 0) {
    return 0;
  }
  
  for(var i = 0, codeUnitIndex = 0; codeUnitIndex != undefined; i++, 
    codeUnitIndex = str.kmwNextChar(codeUnitIndex));
  return i;
}

/**
 * Extracts a section of a string and returns a new string.
 * 
 * @param  {integer} beginSlice    The start code point index in the string to 
 *                                 extract from
 * @param  {integer} endSlice      Optional end code point index in the string
 *                                 to extract to
 * @return {string}                The substring as selected by beginSlice and
 *                                 endSlice
 */
String.prototype.kmwSlice = function(beginSlice, endSlice) {
  var str = String(this);
  var beginSliceCodeUnit = str.kmwCodePointToCodeUnit(beginSlice);
  var endSliceCodeUnit = str.kmwCodePointToCodeUnit(endSlice);
  return str.slice(beginSliceCodeUnit, endSliceCodeUnit);
}

/**
 * Returns the characters in a string beginning at the specified location through
 * the specified number of characters.
 * 
 * @param  {integer} start         The start code point index in the string to 
 *                                 extract from
 * @param  {integer} length        Optional length to extract
 * @return {string}                The substring as selected by start and length
 */
String.prototype.kmwSubstr = function(start, length)
{
  var str = String(this);
  if(start < 0)
  {
    start = str.kmwLength() + start;
	if(start < 0) {
	  start = 0;
	}
  }
  var startCodeUnit = str.kmwCodePointToCodeUnit(start);
  var endCodeUnit = startCodeUnit;
  
  if(length == undefined) {
    endCodeUnit = str.length;
  } else {
    for(var i = 0; i < length; i++, endCodeUnit = str.kmwNextChar(endCodeUnit));
  }

  return str.substring(startCodeUnit, endCodeUnit);
}

/**
 * Returns the characters in a string between two indexes into the string.
 * 
 * @param  {integer} indexA        The start code point index in the string to 
 *                                 extract from
 * @param  {integer} indexB        The end code point index in the string to 
 *                                 extract to
 * @return {string}                The substring as selected by indexA and indexB
 */
String.prototype.kmwSubstring = function(indexA, indexB)
{
  var str = String(this);
  
  if(indexA > indexB) { var c = indexA; indexA = indexB; indexB = c; }
  
  var indexACodeUnit = str.kmwCodePointToCodeUnit(indexA);
  var indexBCodeUnit = str.kmwCodePointToCodeUnit(indexB);
  if(isNaN(indexBCodeUnit)) indexBCodeUnit = str.length;

  return str.substring(indexACodeUnit, indexBCodeUnit);
}

/*
  Helper functions
*/

/**
 * Returns the code unit index for the next code point in the string, accounting for
 * supplementary pairs 
 *
 * @param  {integer} codeUnitIndex   The code unit position to increment
 * @return {integer}                 The index of the next code point in the string,
 *                                   in code units
*/
String.prototype.kmwNextChar = function(codeUnitIndex) {
  var str = String(this);
  
  if(codeUnitIndex < 0 || codeUnitIndex >= str.length - 1) {
    return undefined;
  }
  
  var first = str.charCodeAt(codeUnitIndex);
  if (first >= 0xD800 && first <= 0xDBFF && str.length > codeUnitIndex + 1) {
    var second = str.charCodeAt(codeUnitIndex + 1);
	if (second >= 0xDC00 && second <= 0xDFFF) {
	  if(codeUnitIndex == str.length - 2) {
	    return undefined;
	  }
	  return codeUnitIndex + 2;
	}
  }
  return codeUnitIndex + 1;
}

/**
 * Returns the code unit index for the previous code point in the string, accounting
 * for supplementary pairs 
 *
 * @param  {integer} codeUnitIndex   The code unit position to decrement
 * @return {integer}                 The index of the previous code point in the
 *                                   string, in code units
*/
String.prototype.kmwPrevChar = function(codeUnitIndex) {
  var str = String(this);

  if(codeUnitIndex <= 0 || codeUnitIndex > str.length) {
    return undefined;
  }
  
  var second = str.charCodeAt(codeUnitIndex - 1);
  if (second >= 0xDC00 && first <= 0xDFFF && codeUnitIndex > 1) {
    var first = str.charCodeAt(codeUnitIndex - 2);
	if (first >= 0xD800 && second <= 0xDBFF) {
	  return codeUnitIndex - 2;
	}
  }
  return codeUnitIndex - 1;
}

/**
 * Returns the corresponding code unit index to the code point index passed
 *
 * @param  {integer} codePointIndex  A code point index in the string
 * @return {integer}                 The corresponding code unit index
*/
String.prototype.kmwCodePointToCodeUnit = function(codePointIndex) {
  var str = String(this);
  
  var codeUnitIndex = 0;

  if(codePointIndex < 0) {
    codeUnitIndex = str.length;
    for(var i = 0; i > codePointIndex; i--, 
	  codeUnitIndex = str.kmwPrevChar(codeUnitIndex));	
    return codeUnitIndex;
  }

  for(var i = 0; i < codePointIndex; i++,
    codeUnitIndex = str.kmwNextChar(codeUnitIndex));
  return codeUnitIndex;
}

Updated 11 May 2012: Removed script formatting code as site hosting it was unreliable. Back to good old <pre> for now.
Updated 29 May 2014: Fixed broken script text, damaged when inserted previously.

Revisiting the VistaAltFixUnit.pas code

When Windows Vista was introduced, many moons ago now, many Delphi programs ran into a problem where buttons, checkboxes and the like would disappear when the user pressed the Alt key.  This was due to a change in how and when the underscores that displayed shortcuts on these buttons was displayed.

The problem was duly noted on Borland/CodeGear/Embarcadero’s QualityCentral forum, and several workarounds were documented.  The one that became most popular online seemed to be the one known as VistaAltFixUnit.pas.  Due to its popularity, we ended integrated that unit into our code, without spending a lot of time looking at it.  Surely many eyes had already reviewed it?

Now I don’t really want to be harsh but unfortunately there are a number of issues with the unit, ranging from performance to stability issues.  Here’s a rundown.

  1. The most egregious problem is the replacement of each TForm‘s WindowProc procedure.
    constructor TFormObj.Create(aForm: TForm; aRepaintAll: Boolean);
    begin
      inherited Create;
      Form := aForm;
      RepaintAll := aRepaintAll;
      Used := True;
      OrgProc := Form.WindowProc;
      Form.WindowProc := WndProc;
    end;
    
    procedure TFormObj.WndProc(var Message: TMessage);
    begin
      OrgProc(Message);
      if (Message.Msg = WM_UPDATEUISTATE) then
        NeedRepaint := True;
    end;

    Like any naively chained procedure hook, if another component or other code also attempts to replace the WindowProc, the two will end up interfering with each other at some point, as neither component will be aware of the other, and destruction of either component, or attempts to unhook, are likely to end in access violations and tears shortly thereafter.

    Now, we also make use of the TNT Unicode components (as we have not yet moved to a fully Unicode version of Delphi — that’s a big job!), and it turns out that these components also use this approach for some of their own jiggery-pokery.  The TNT Unicode components are more important to us than this one…  guess which component suite got to keep their hook?

  2. Note that, possibly due to instabilities that became evident, the TFormObj class does not attempt to restore the original WindowProc when it is destroyed.
  3. Performance-wise, this component does a lot of work checking form status at application idle time.  As other users noted, the TApplicationEvents approach only works if no other code in the project has already assigned TApplication.OnIdle.

    In the end, we rewrote the component from scratch, and used a Windows CallWndProc hook as this is clean, simple and robust.  We also opted to live with a very minor flicker when the user first presses Alt, as this reduced the complexity of the code significantly, relying on Windows’ existing repaint infrastructure instead of reimplementing it ourselves.

    Any thoughts on the changes?  What have we got wrong?  Feel free to use this code without restrictions, for what it is worth — at your own risk of course!

    unit VistaAltFixUnit2;
    
    interface
    
    uses
      Windows, Classes;
    
    type
      TVistaAltFix2 = class(TComponent)
      private
        FInstalled: Boolean;
        function VistaWithTheme: Boolean;
      public
        constructor Create(AOwner: TComponent); override;
        destructor Destroy; override;
      end;
    
    procedure Register;
    
    implementation
    
    uses
      Messages, Themes;
    
    procedure Register;
    begin
      RegisterComponents('MEP', [TVistaAltFix2]);
    end;
    
    var
      FInstallCount: Integer = 0;
      FCallWndProcHook: Cardinal = 0;
    
    { TVistaAltFix2 }
    
    function CallWndProcFunc(
      nCode: Integer;
      wParam: WPARAM;
      lParam: LPARAM): LRESULT; stdcall;
    var
      p: PCWPSTRUCT;
    begin
      if nCode = HC_ACTION then
      begin
        p := PCWPSTRUCT(lParam);
        if p.message = WM_UPDATEUISTATE then
        begin
          InvalidateRect(p.hwnd, nil, False);
        end;
      end;
      Result := CallNextHookEx(FCallWndProcHook, nCode, wParam, lParam);
    end;
    
    constructor TVistaAltFix2.Create(AOwner: TComponent);
    begin
      inherited;
      if VistaWithTheme and not (csDesigning in ComponentState) then
      begin
        Inc(FInstallCount); // Allow more than 1 instance, assume single threaded as VCL is not thread safe anyway
        if FInstallCount = 1 then
          FCallWndProcHook := SetWindowsHookEx(WH_CALLWNDPROC, CallWndProcFunc, 0, GetCurrentThreadID);
        FInstalled := True;
      end;
    end;
    
    destructor TVistaAltFix2.Destroy;
    begin
      if FInstalled then
      begin
        Dec(FInstallCount);
        if FInstallCount = 0 then
        begin
          UnhookWindowsHookEx(FCallWndProcHook);
          FCallWndProcHook := 0;
        end;
      end;
      inherited Destroy;
    end;
    
    function TVistaAltFix2.VistaWithTheme: Boolean;
    var
      OSVersionInfo: TOSVersionInfo;
    begin
      OSVersionInfo.dwOSVersionInfoSize := SizeOf(OSVersionInfo);
      if GetVersionEx(OSVersionInfo) and
         (OSVersionInfo.dwMajorVersion >= 6) and
         ThemeServices.ThemesEnabled then
        Result := True
      else
        Result := False;
    end;
    
    end.

How not to do exception classes in a Delphi library

The pngimage unit in Delphi includes a bunch of exception classes to handle error conditions when handling PNG files.  That’s great, except for one problem: all the classes inherit from Exception instead of an abstract EPNGException.  Of course it would be even better to inherit from a hierarchy of abstract exception classes.

Frankly I don’t care if the image is corrupt because it has an unknown critical chunk, or if it is corrupt because it has no image data.  All I care about is that my application cannot load an image and I need that condition handled gently.

Basically, not using a hierarchy of classes means that error handling code in applications loading PNG files must either be too broad and generic (on E:Exception), or too specific (on E:EPNGUnknownCriticalChunk).  Future changes to the PNG class library for additional exception types when loading a PNG file would require every developer to add additional exception checks to every location where PNG files are loaded in order to handle that error, when in nearly every case the developer really just wants to handle EInvalidGraphic!

Even worse, this means that generic code for loading images must be overloaded with knowledge about PNG exception types, or else risk masking more serious errors such as access violations with an overarching on E:Exception clause!

So here’s the current complete list that the developer must be aware of:

EPNGOutMemory = class(Exception);
EPngError = class(Exception);
EPngUnexpectedEnd = class(Exception);
EPngInvalidCRC = class(Exception);
EPngInvalidIHDR = class(Exception);
EPNGMissingMultipleIDAT = class(Exception);
EPNGZLIBError = class(Exception);
EPNGInvalidPalette = class(Exception);
EPNGInvalidFileHeader = class(Exception);
EPNGIHDRNotFirst = class(Exception);
EPNGNotExists = class(Exception);
EPNGSizeExceeds = class(Exception);
EPNGMissingPalette = class(Exception);
EPNGUnknownCriticalChunk = class(Exception);
EPNGUnknownCompression = class(Exception);
EPNGUnknownInterlace = class(Exception);
EPNGNoImageData = class(Exception);
EPNGCouldNotLoadResource = class(Exception);
EPNGCannotChangeTransparent = class(Exception);
EPNGHeaderNotPresent = class(Exception);
EPNGInvalidNewSize = class(Exception);
EPNGInvalidSpec = class(Exception);

I’m disappointed that when this code was acquired and integrated into the Delphi source code, that this level of basic tidy up was not done.

Using out parameters in Delphi SOAP transactions

So here’s today’s little gotcha: TRemotable out parameters in Delphi SOAP interface function calls must always be initialised before the call.  If you don’t do this, at some point your client application will crash with an access violation in TValue.Make (System.Rtti.pas).

It turns out that when the TRIO component parses the Run Time Type Information (RTTI), it conflates the var and out modifiers on the parameters, and treats out parameters as var parameters (see function TRttiMethod.GetInvokeInfo).  This means that when the function is called, the TValue.Make function which collects RTTI for the passed parameters will assume that a non-nil object reference is always pointing to a valid and initialised object, even for an out parameter, and will dereference it to get additional RTTI.

The workaround is simple: just initialise all out parameters to nil before the function call.  This is good practice anyway.

I think the proper fix in the Delphi library code would be to treat out and var parameters differently in the SOAP calls, and always clear out parameters to nil within the SOAP framework.

Here’s some example code to reproduce the problem:

unit TestOutParams;

interface

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
  Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls,
  Soap.SoapHTTPClient, Soap.InvokeRegistry;

type
  TForm1 = class(TForm)
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
  end;

type
  ITestOutParam = interface(IInvokable)
    ['{343E9171-4300-4523-A926-4904EDD652E1}']
    function TestOutParam(out p5: TSOAPAttachment): Boolean; stdcall;
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

function GetITestOutParam: ITestOutParam;
var
  RIO: THTTPRIO;
begin
  Result := nil;
  RIO := THTTPRIO.Create(nil);
  try
    Result := (RIO as ITestOutParam);
  finally
    if (Result = nil) then
      RIO.Free;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  FSOAP: ITestOutParam;
  FOutputParameter: TSOAPAttachment;
begin
  FOutputParameter := Pointer(8); // simulate uninitialized local variable pointing to protected non-nil address
  FSOAP := GetITestOutParam;
  Assert(Assigned(FSOAP));
  FSOAP.TestOutParam(FOutputParameter); // this will crash
end;

initialization
  InvRegistry.RegisterInterface(TypeInfo(ITestOutParam));
end.

WinDBG and Delphi exceptions

When debugging a Delphi XE2 app in WinDBG, NTSD or a related debugger, it is very helpful to be able to display the actual class name and error message from an exception in the debugger.  The following script will do that for you automatically:

sxe -c "da poi(poi(poi(ebp+1c))-38)+1 L16;du /c 100 poi(poi(ebp+1c)+4)" 0EEDFADE

The command can be broken down as follows:

  • sxe adds a filter for an exception event, enabled by default.  sxd would add the filter, disabled.
  • -c tells the debugger to run the quoted commands when the event occurs.
  • da displays an “ANSI” (non-Unicode) string from memory in the debugger
  • ebp+1c is a pointer to the exception record
  • poi(poi(ebp+1c))-38 points to the non-Unicode string returned by TObject.ClassName, internally known as Self.vmtClassName.
  • du displays a Unicode string from memory in the debugger.
  • poi(ebp+1c)+4 is a pointer to the Unicode string for Exception.Message.  As with all Delphi strings, you could go back 4 bytes to get the string length, but for now we limit the length to 100 characters with the /c parameter. 
  • 0EEDFADE is the exception code for a Delphi exception (another example is C0000005 for Access Violation).

These offsets are correct for Delphi XE2.  For Delphi 7, the relevant command is:

sxe -c "da poi(poi(poi(ebp+1c))-2c)+1 L16;da /c 100 poi(poi(ebp+1c)+4)" 0EEDFADE

Other versions of Delphi will have similar offsets, a little spelunking should yield the appropriate offsets pretty quickly!

Another partial malware diagnosis

A customer reported a problem with starting our application today.  The error reported by our application was strange and was not one we’d encountered before:

Exception 'Exception' in module _________.exe at 004904CB
Unable to hook API functions for print preview [-1,-1,-1,0,0,0,0]

In effect the error told us that 4 out of 7 API hooks failed.  I was called upon to try and diagnose the issue.

Initially I looked for a 3rd party application that could be hooking the calls in question (RegisterClassA, RegisterClassW, RegisterClassExA, RegisterClassExW).  But there were no unusual applications running according to Process Explorer, and no unexpected DLLs in memory in the process.  After disabling the antivirus in case that was causing the problem, and running both RootkitRevealer and Procmon with no clear outcomes, I decided I’d need to go deeper.

Time to break out windbg.  I started our process and looked at the disassembly for one of the functions that failed to hook.  Here’s what I saw:

0:000> u user32!registerclassexw
user32!RegisterClassExW:
7e41af7f 8bff            mov     edi,edi
7e41af81 55              push    ebp
7e41af82 8bec            mov     ebp,esp
7e41af84 8b4508          mov     eax,dword ptr [ebp+8]
7e41af87 833830          cmp     dword ptr [eax],30h
7e41af8a 0f850be70200    jne     user32!RegisterClassExW+0xd (7e44969b)
7e41af90 6800010000      push    100h
7e41af95 6a00            push    0

That’s pretty normal, the usual mov edi,edi that most Windows API calls start with, and what we were expecting.  So I continued execution until the error occurred, and took another look at that point.

0:000> u user32!registerclassexw
user32!RegisterClassExW:
7e41af7f e9cc93d281      jmp     00144350
7e41af84 8b4508          mov     eax,dword ptr [ebp+8]
7e41af87 833830          cmp     dword ptr [eax],30h
7e41af8a 0f850be70200    jne     user32!RegisterClassExW+0xd (7e44969b)
7e41af90 6800010000      push    100h
7e41af95 6a00            push    0
7e41af97 6a00            push    0
7e41af99 50              push    eax

Huh, that’s kinda different.  Now we were jumping off into a very unexpected part of memory.  A quick check of that address revealed that it was not mapped into the normal address space of any modules.  I had a look at the code in question.

0:000> u 144350 L...
00144350 55              push    ebp
00144351 8bec            mov     ebp,esp
00144353 83ec30          sub     esp,30h
00144356 f605e029150004  test    byte ptr ds:[1529E0h],4   ; 001529e0
0014435d 56              push    esi
0014435e 8b7508          mov     esi,dword ptr [ebp+8]
00144361 7433            je      00144396
00144363 e8595bffff      call    00139ec1
00144368 84c0            test    al,al
0014436a 742a            je      00144396
0014436c 85f6            test    esi,esi
0014436e 7426            je      00144396
00144370 833e30          cmp     dword ptr [esi],30h
00144373 7521            jne     00144396
00144375 8b4608          mov     eax,dword ptr [esi+8]
00144378 e8fdfeffff      call    0014427a
0014437d 85c0            test    eax,eax
0014437f 7415            je      00144396
00144381 6a30            push    30h
00144383 56              push    esi
00144384 8d4dd0          lea     ecx,[ebp-30h]
00144387 51              push    ecx
00144388 e818010000      call    001444a5
0014438d 8945d8          mov     dword ptr [ebp-28h],eax
00144390 8d45d0          lea     eax,[ebp-30h]
00144393 50              push    eax
00144394 eb01            jmp     00144397
00144396 56              push    esi
00144397 ff1568141300    call    dword ptr ds:[131468h]   ; -> 02200126
0014439d 5e              pop     esi
0014439e c9              leave
0014439f c20400          ret     4

A bit hard to know what it was doing but there was a call at the bottom there that was worth a quick look.

02200126 8bff            mov     edi,edi
02200128 55              push    ebp
02200129 8bec            mov     ebp,esp
0220012b e954ae217c      jmp     user32!RegisterClassExW+0x5 (7e41af84)

Yep, as expected it was a jump back to the original API function, 5 bytes in. That looked like a hook library was being used because the callback to the original function was in a separate memory block.  But no real info.  But again, looking at the address space revealed it belonged to no known module.

0:000> !address 2200126
    02200000 : 02200000 - 00001000
                    Type     00020000 MEM_PRIVATE
                    Protect  00000040 PAGE_EXECUTE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageIsVAD

At this stage, it was clear we were looking at malware, so I decided to look for some strings in the data area referenced earlier (in blue, above).  Initially I found only strings pointing to Application Data and other uninteresting sources.

0:000> dd 1529e0
001529e0  00000000 02181ea0 0000001c 83f6f0a1
001529f0  00000000 00130000 7c800000 7c900000
00152a00  02200000 00000000 7c90d7fe 00000000
00152a10  0220000a 7c916a02 0000000c 00152a24
00152a20  00000000 00040001 00000000 00000000
00152a30  00000000 00000000 ffffffff 02181ee0
00152a40  003a0043 0044005c 0063006f 006d0075
00152a50  006e0065 00730074 00610020 0064006e

But eventually I struck gold:

0:000> dd
00152f5c  00000000 000006b0 00000000 004f0053
00152f6c  00540046 00410057 00450052 004d005c
00152f7c  00630069 006f0072 006f0073 00740066
00152f8c  0041005c 006b0067 00610065 00000064
00152f9c  00000000 00000000 00000000 00000000
00152fac  00000000 00000000 00000000 00000000
00152fbc  00000000 00000000 00000000 00000000
00152fcc  0019c110 ffffffff 00000000 00000000

This proved to be a suspicious registry key:

0:000> du 152f68
00152f68  "SOFTWARE\Microsoft\Agkead"

A quick glance at that registry key showed the following suspicious registry entries:

I picked up a few other interesting strings as well:

0:000> du 152fe8
00152fe8  "Global\{451EEC04-7C31-7A30-8C56-"
00153028  "BCE6C174342E}"
0:000> du 1527e0
001527e0  "Enfok"

The following string was also interesting:

0:000> du 1523d4
001523d4  "\Documents and Settings\Receptio"
00152414  "n_2.PGE\Application Data\Ewacg\o"
00152454  "xmo.hio"

While the folder existed, I was unable to see the file oxmo.hio.  This, as well as the fact that I could not see any user mode activity doing the hooking of the functions in question, really suggested a rootkit which was doing some cloaking, rather than simple user-mode malware.

A reference to the string Agkead was on ThreatExpert.

But by now I was really only continuing out of interest, so I handed the machine in question back to the client, with the advice that they rebuild it — difficult to be sure that the machine is clean any other way.  While it would have been fun to analyse the malware further, it’s not really my job 🙁

Workaround for the "AllowDocumentFunction constraint violated" error with Delphi XE2 apps

I have been reading discussions online about an EOleException error we were getting when calling TransformNode from a Delphi XE2 application: "Operation Aborted: AllowDocumentFunction constraint violated". The problem arises because Delphi XE2 now uses MSXML 6.0, which has made some changes to default settings for security reasons.

This, along with the ProhibitDTD property, has caused some grief. The recommended fix is to make changes to the VCL unit xml.win.msxmldom.pas. I found an alternative which appears to work without side-effects and requires no changes to the VCL source code: set the MSXMLDOMDocumentCreate to your own custom implementation which sets the AllowDocumentFunction (and ProhibitDTD or AllowXsltScript if you wanted) property after creating the object.

unit msxml_transformnode_fix;

interface

implementation

uses
  Winapi.ActiveX, Winapi.Windows, System.Variants, System.Win.ComObj, Winapi.msxml, Xml.xmldom, System.Classes,
  Xml.Win.msxmldom, Xml.XMLConst;

function TryObjectCreate(const GuidList: array of TGuid): IUnknown;
var
  I: Integer;
  Status: HResult;
begin
  Status := S_OK;
  for I := Low(GuidList) to High(GuidList) do
  begin
    Status := CoCreateInstance(GuidList[I], nil, CLSCTX_INPROC_SERVER or
      CLSCTX_LOCAL_SERVER, IDispatch, Result);
    if Status = S_OK then Exit;
  end;
  OleCheck(Status);
end;

function CreateDOMDocument: IXMLDOMDocument;
begin
  Result := TryObjectCreate([CLASS_DOMDocument60, CLASS_DOMDocument40, CLASS_DOMDocument30,
    CLASS_DOMDocument26, Winapi.msxml.CLASS_DOMDocument]) as IXMLDOMDocument;
  if not Assigned(Result) then
    raise DOMException.Create(SMSDOMNotInstalled);

  try
    (Result as IXMLDOMDocument2).SetProperty('AllowDocumentFunction', True);
  except on E: EOleError do
    ;
  end;
end;

initialization
  MSXMLDOMDocumentCreate := msxml_transformnode_fix.CreateDOMDocument;
end.

Delphi’s ongoing problem with "with"

Delphi has long included a scoping construct called with which can be used to increase the readability and efficiency of code.  From Delphi’s documentation:

A with statement is a shorthand for referencing the fields of a record or the fields, properties, and methods of an object. The syntax of a with statement is:
  with obj do statement
or:
  with obj1, …, objn do statement
where obj is an expression yielding a reference to a record, object instance, class instance, interface or class type (metaclass) instance, and statement is any simple or structured statement. Within the statement, you can refer to fields, properties, and methods of obj using their identifiers alone, that is, without qualifiers.

However, with has a really big gotcha: scope ambiguity.  Scope ambiguity came back to bite us yet again today: I am currently upgrading a major project (over 1 million SLoC) from an older version of Delphi to Delphi XE2. One component of this project is a legacy third party component package which is no longer supported.  We have full source and the cost of replacing it would be too high, so we are patching the source where needed.

While tracing a reported window sizing bug, I found the following line of code (on line 12880, yes it’s a big source file):

with vprGetTranspBorderSize(BorderStyle) do
  R := Rect(Left,Top,Width-Right,Height-Bottom);

vprGetTranspBorderSize returns a TRect.  In the earlier version of Delphi, the Width and Height properties were not members of TRect, so were found in the parent scope, being the component itself in this case.  All good.

But now in Delphi XE2, records can now have functions and properties just like a class.  So TRect has a bunch of additional properties, including Width and Height.  Handy to have, until you throw in some code that was written before these new properties existed.  One fix here is simply to qualify the scope:

with vprGetTranspBorderSize(BorderStyle) do
  R := Rect(Left,Top,Self.Width-Right,Self.Height-Bottom);

Or, you could declare a second variable, and eliminate the with statement entirely:

R2 := vprGetTranspBorderSize(BorderStyle);
R := Rect(R2.Left,R2.Top,Width-R2.Right,Height-R2.Bottom);

The problem with the second approach is it makes the code less readable and the introduction of the second variable R2 widens its scope to the whole function, rather than just the block where its needed.  This tends to lead to reuse of variables, which is a frequent source of bugs.

Many developers (including myself) have argued that adding support for aliases to with would avoid these scope problems.  Some others argue that one simply shouldn’t use with, and instead declare variables.  But with does make code much cleaner and easier to read, especially when used with try/finally blocks.

with R2 := vprGetTranspBorderSize(BorderStyle) do
  R := Rect(R2.Left,R2.Top,Width-R2.Right,Height-R2.Bottom);

Given that this issue was reported to Borland as far back as 2002, and since that time many, many complex language constructs have been added to Delphi, I think it’s a real crying shame that Borland/Inprise/CodeGear/Embarcadero have chosen to ignore this problem for so long.

Strava segment statistics site updates

I made some small tweaks to my Strava segment stats website recently. This includes:

  • Fixes for rendering issues in the lists
  • Ability to hide bogus segments, similar to the Strava flagging function
  • Dynamically sortable tables
  • Underpinnings for anyone to be able to have their stats updated automatically (but UI for this not yet complete)

Currently I also have an issue with the Strava API, where segments with more than 50 efforts cannot have all efforts populated. Once I have a resolution for this, I’ll publish another update with the ability for anyone to view their stats.

Loading a Unicode string from a file with Delphi functions

In my previous post, I described differences in saving text with TStringStream and TStringList.  TStringList helpfully adds a preamble.  TStringStream doesn’t.  Now when loading text from a stream, you’ll typically want to strip off the preamble.  But if you want the text to be otherwise unmodified, then TStringList is not safe, and TStringStream doesn’t strip off the preamble.

Here’s a helper function that does strip the preamble.  If you don’t pass an encoding, it will guess on the basis of the preamble (but won’t otherwise sniff the stream content to guess the encoding heuristically). If the content does not have a preamble, it assumes the current code page (TEncoding.Default).  If you do pass an encoding, the preamble will be stripped if it is there but no encoding detection will take place.

function LoadStringFromFile(const filename: string; encoding: TEncoding = nil): string;
var
  FPreambleLength: Integer;
begin
  with TBytesStream.Create do
  try
    LoadFromFile(filename);
    FPreambleLength := TEncoding.GetBufferEncoding(Bytes, encoding);
    Result := encoding.GetString(Bytes, FPreambleLength, Size - FPreambleLength);
  finally
    Free;
  end;
end;

Obviously not fantastic for very large files (you can solve that one yourself) but for your plain old bite sized files, quick and easy solution.