Some Delphi types do not have RTTI. This is no fun. This happens when, and I quote:
whereas enumerated constants with a specific value, such as the following, do not have RTTI: type SomeEnum = (e1 = 1, e2 = 2, e3 = 3);
In normal use, this will go unnoticed, and not cause you any grief, until you throw these enumerated types into a generic construct (or have any other need to use RTTI). As soon as you do that, you’ll start getting the unhelpful and misleading “Invalid Class Typecast” exception. (No it’s not a Class!)
To avoid this problem, you must wander into the dark world of pointer casting, because once you are pointing at some data, Delphi no longer cares what its actual type is.
Here’s an example of how to convert a Variant value into a generic type, including support for RTTI-free enums, in a reasonably type-safe way. This is part of a TNullable record type, which mimics, in some ways, the .NET Nullable type. The workings of this type are not all that important for the example, however. This example works with RTTI types, and with one-byte non-RTTI enumerated types; you’d need to extend it to support larger enumerated types. While I could reduce the number of steps in the edge case by spelunking directly into the Variant TVarData, that would not serve to clarify the murk.
// Requires System.Variants (VarIsEmpty, VarIsNull, VarType) and System.Rtti (TValue).
constructor TNullable<T>.Create(AValue: Variant);
type
  PT = ^T;
var
  v: Byte;
begin
  if VarIsEmpty(AValue) or VarIsNull(AValue) then
    Clear
  else if (TypeInfo(T) = nil) and
    (SizeOf(T) = 1) and
    (VarType(AValue) = varByte) then
  begin
    { Assuming an enum type without typeinfo, have to
      do some cruel pointer magics here to avoid type
      cast errors, so am very careful to validate
      first! }
    v := AValue;
    FValue := PT(@v)^;
  end
  else
    Create(TValue.FromVariant(AValue).AsType<T>);
end;
So what is going on here? Well, first if we are passed Null or “Empty” variant values, then we just clear our TNullable value.
Otherwise we test if (a) we have no RTTI for our generic, and (b) it’s one byte in size, and (c) our variant is also a Byte value. If all these prerequisites are met, we perform the casting, in which we hark back to the ancient incantations with a pointer typecast, taking the address of the value and dereferencing it, fooling the compiler along the way. (Ha ha!)
Finally, we find a modern TValue incantation suffices to wreak the type change for civilised types such as Integer or String.
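The pointer cast at the heart of the edge case can be tried in isolation. The following is a minimal sketch rather than part of the TNullable code: TEnumCast and FromByte are hypothetical names made up for illustration, and it assumes a Delphi version with generic methods (2009 or later) and the default one-byte enum size.

```pascal
program EnumCastDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Explicit ordinal values mean the compiler emits no RTTI for this enum.
  TSomeEnum = (e1 = 1, e2 = 2, e3 = 3);

  // Hypothetical helper, for illustration only.
  TEnumCast = class
  public
    class function FromByte<T>(AValue: Byte): T; static;
  end;

class function TEnumCast.FromByte<T>(AValue: Byte): T;
type
  PT = ^T;
begin
  // Validate before reinterpreting memory: only one-byte types are safe here.
  if SizeOf(T) <> SizeOf(Byte) then
    raise EInvalidCast.Create('FromByte only supports one-byte types');
  // Take the address of the byte and read the same memory back as a T.
  Result := PT(@AValue)^;
end;

var
  E: TSomeEnum;
begin
  E := TEnumCast.FromByte<TSomeEnum>(2);
  Writeln(E = e2); // TRUE
end.
```

Note that the cast does not check that the byte is one of the declared enum values; that validation is left to the caller.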
Sometimes it can be handy to test for design-time in a component unit when the component package is first loaded, e.g. within an initialization section, rather than when a component is created or registered. We use this to validate that runtime units that interoperate with a component are linked into a project, and raise an error as early as possible if they are not.
With Delphi’s RTTI, this is fairly straightforward, I believe:
// Requires System.Rtti in the uses clause.
function IsDesignTime: Boolean;
begin
  Result := TRttiContext.Create.FindType('ToolsAPI.IBorlandIDEServices') <> nil;
end;
Using WinDbg to debug Delphi processes can be both frustrating and rewarding. Frustrating, because even with the tools available to convert Delphi’s native .TDS symbol file format into .DBG or .PDB, we currently only get partial symbol information. But rewarding when you persist, because even though it may seem obscure and borderline irrational, once you get a handle on the way objects and Run Time Type Information (RTTI) are implemented with Delphi, you can accomplish a lot, quite easily.
For the post today, I’ve created a simple Delphi application which we will investigate in a couple of ways. If you want to follow along, you’ll need to build the application and convert the debug symbols generated by Delphi to .DBG format with map2dbg or tds2dbg. I’ll leave the finer details of that to you — it’s not very complicated. Actually, to save effort, I’ve uploaded both the source, and the debug symbols + dump + executable (24MB zip).
I’ve made reference to a few Delphi internal constants in this post. These are defined in System.pas, and I’m using the constants as defined for Delphi XE2. The values may be different in other versions of Delphi.
In the simple Delphi application, SpelunkSample, I will be debugging a simulated crash. You can choose to either attach WinDbg to the process while it is running, or to create a crash dump file using a tool such as procdump.exe and then work with the dump file. If you do choose to create a dump file, you should capture the full process memory dump, not just stack and thread information (use the -ma flag with procdump.exe).
I’ll use procdump.exe. First, I use tds2dbg.exe to convert the symbols into a format that WinDbg groks:
Then I just fire up the SpelunkSample process and click the “Do Something” button.
Next, I use procdump to capture a dump of the process as it stands. This generates a rather large file, given that this is not much more than a “Hello World” application, but don’t stress, we are not going to be reading the whole dump file in hex (only parts of it).
Time to load the dump file up in WinDbg.
I want to understand what is going wrong with the process (actually, nothing is going wrong, but bear with me). I figure it’s important to know which forms are currently instantiated. This is conceptually easy enough to do: Delphi provides the TScreen class, which is instantiated as a global singleton accessible via the Screen variable in Vcl.Forms.pas. If we load this up, we can see a member variable FForms: TList, which contains references to all the forms “on the screen”.
But how to find this object in a 60 megabyte dump file? In fact, there are two good methods: use Delphi’s RTTI and track back; and use the global screen variable and track forward. I’ll examine them both, because they both come in handy in different situations.
Finding objects using Delphi’s RTTI
Using Delphi’s Run Time Type Information (RTTI), we can find the name of the class in memory and then track back from that. This information is in the process image, which is mapped into memory at a specific address (by default, 00400000 for Delphi apps, although you can change this in Linker options). So let’s find out where this is mapped:
Now we can search this memory for a specific ASCII string, the class name TScreen. When searching through memory, it’s important to be aware that this is just raw memory. So false positives are not uncommon. If you are unlucky, then the data you are searching for could be repeated many times through the dump, making this task virtually impossible. In practice, however, I’ve found that this rarely happens.
With that in mind, let’s do the search using the s -a command:
Whoa, that’s a lot of data. Looking at the results though, there are two distinct ranges of memory: 004F#### and 00A#####. Those in the 00A##### range are actually Delphi’s native debug symbols, mapped into memory. So I can ignore those. To keep myself sane, and make the debug console easier to review, I’ll rerun the search for a smaller range:
0:000> s -a 0400000 00a80000 "TScreen"
004f8f81 54 53 63 72 65 65 6e 36-00 90 5b 50 00 06 43 72 TScreen6..[P..Cr
004f9302 54 53 63 72 65 65 6e e4-8b 4f 00 f8 06 44 00 02 TScreen..O...D..
Now, these two references are close together, and I will tell you that the first one is the one we want. Generally speaking, the first one is in the class metadata, and the second one is not important today. Now that we have that "TScreen" string found in memory, we need to go back 1 byte. Why? Because "TScreen" is a Delphi ShortString, which is a string up to 255 bytes long, implemented as a length:byte followed by data (ANSI chars). And then we search for a pointer to that memory location with the s -d command:
Only one reference, nearby in memory, which is expected: the class metadata is generally stored near the class implementation. Now this is where it gets a little brain-bending. This pointer is stored in Delphi’s class metadata, as I said. But most of this metadata is actually stored in memory before the class itself. Looking at System.pas, in Delphi XE2 we have the following metadata for x86:
Ignore that deprecated noise — it’s the constants that we want to know about. So the vmtClassName is at offset -56 (-38 hex). In other words, to find the class itself, we need to look 56 bytes ahead of the address of that pointer that we just found. That is, 004f8bac + 38h = 004f8be4. Now, if I use the dds (display words and symbols) command, we can see pointers to the implementation of each of the class’s member functions:
Huh. That’s interesting, but it’s a sidetrack; we can see TScreen.Create which suggests we are looking at the right thing. There’s a whole lot more buried in there but it’s not for this post. Let’s go back to where we were.
How do we take that class address and find instances of the class? I’m sure you can see where we are going. But here’s where things change slightly: we are looking in allocated memory now, not just the process image. So our search has to broaden. Rather than go into the complexities of memory allocation, I’m going to go brute force and look across a much larger range of memory, using the L? search parameter (which allows us to search more than 256MB of data at once):
Only two references. Why two and not one, given that we know that TScreen is a singleton? Well, because Delphi helpfully defines a vmtSelfPtr metadata member, at offset -88 (and if we do the math, we see that 004f8be4 - 004f8b8c = 58h = 88d). So let’s look at the second one. That’s our TScreen instance in memory.
In this case, there was only one instance. But you can sometimes pick up objects that have been freed but where the memory has not been reused. There’s no hard and fast way (that I am aware of) of identifying these cases, but using the second method of finding a Delphi object, described below, can help to differentiate.
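The layout being navigated by hand here can also be observed from inside a running Delphi program. The following is a minimal sketch, assuming the vmtClassName and vmtSelfPtr constants from System.pas (they are marked deprecated, hence the warning directive); ClassNamePtr and SelfPtr are hypothetical helper names. It uses TObject rather than TScreen so that it runs as a console program.

```pascal
program VmtPeek;
{$APPTYPE CONSOLE}
{$WARN SYMBOL_DEPRECATED OFF} // the vmt* constants are marked deprecated

uses
  System.SysUtils;

// A TClass value is the address of the VMT; metadata sits at negative offsets.
function ClassNamePtr(AClass: TClass): PShortString;
begin
  Result := PPointer(PByte(AClass) + vmtClassName)^;
end;

function SelfPtr(AClass: TClass): Pointer;
begin
  Result := PPointer(PByte(AClass) + vmtSelfPtr)^;
end;

var
  Cls: TClass;
  Name: PShortString;
  Obj: TObject;
begin
  Cls := TObject;
  Name := ClassNamePtr(Cls);
  // A ShortString is a length byte followed by the characters.
  Writeln(Ord(Name^[0]));              // 7
  Writeln(UTF8ToString(Name^));        // TObject
  // The self pointer in the metadata points back at the class.
  Writeln(SelfPtr(Cls) = Pointer(Cls)); // TRUE
  Obj := TObject.Create;
  try
    // The first pointer-sized field of every instance is its class pointer,
    // which is why searching memory for the class address finds instances.
    Writeln(PPointer(Obj)^ = Pointer(Obj.ClassType)); // TRUE
  finally
    Obj.Free;
  end;
end.
```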
I’ll come back to how we use this object memory shortly. But first, here’s another way of getting to the same address.
Finding a Delphi object by variable or reference
As we don’t have full debug symbol information at this time, it can be difficult to find variables in memory. For global variables, however, we know that the location is fixed at compile time, and so we can use the disassembler in WinDbg to locate the address relatively simply. First, look in the source for a reference to the Screen global variable. I’ve found it in the FindGlobalComponent function (ironically, that function is doing programmatically what we are doing via the long and laborious manual method):
function FindGlobalComponent(const Name: string): TComponent;
var
  I: Integer;
begin
  for I := 0 to Screen.FormCount - 1 do
  begin
    ...
So, disassemble the first few lines of the function. Depending on the conversion tool you used, the symbol format may vary (x spelunksample!*substring* can help in finding symbols).
The highlighted address there corresponds to the Screen variable. The initialization+0xb1ac rubbish suggests missing symbol information, because (a) it doesn’t make much sense to be pointing to the “initialization” code, and (b) the offset is so large. And in fact, that is the case, we don’t have symbols for global variables at this time (one day).
But because we know this, we also know that 00524300 is the address of the Screen variable. The variable, which is a pointer, not the object itself! But because it’s a pointer, it’s easy to get to what it’s pointing to!
0:000> dd 00524300 L1
00524300 0247b370
Look familiar? Yep, it’s the same address as we found the RTTI way, and somewhat more quickly too. But now on to finding the list of forms!
Examining object members
Let’s dump that TScreen instance out and annotate its members. The symbols below I’ve manually added to the data, by looking at the implementation of TComponent and TScreen. I’ve also deleted some misleading annotations that WinDbg added.
How did I map that? It’s not that hard — just look at the class definitions in the Delphi source. You do need to watch out for two things: packing, and padding. x86 processors expect variables to be aligned on a boundary of their size, so a 4 byte DWORD will be aligned on a 4 byte boundary. Conversely, a boolean only takes a byte of memory, and multiple booleans can be packed into a single DWORD. Delphi does not do any ‘intelligent’ reordering of object members (which makes life a lot simpler), so this means we can just map pretty much one-to-one. The TComponent object has the following member variables (TPersistent and TObject don’t have any member variables):
Let’s look at 02489da8, the FForms TList object. The first member variable of TList is FList: TPointerList. Knowing what we do about the object structure, we can:
It can be helpful to do a sanity check here and make sure that we haven’t gone down the wrong rabbit hole. Let’s check that this is actually a TList (poi dereferences a pointer, but you should be able to figure the rest out given the discussion above):
0:000> da poi(004369e8-38)+1
00436b19 "TList'"
And yes, it is a TList, so we haven’t dereferenced the wrong pointer. All too easy to do in the dark cave that is assembly-language debugging. Back to the lead. We can see from the definition of TList:
Yes, it’s our form! But what is with that poi poi poi? Well, I could have dug down each layer one step at a time, but this is a shortcut, in one swell foop dereferencing the variable, first to the object, then dereferencing to the class, then back 38h bytes and dereferencing to the class name, and plus one byte for that ShortString hiccup. Saves time, and once familiar you can turn it into a WinDbg macro. But it’s helpful to be familiar with the structure first!
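For readers more at home in Pascal than in WinDbg expressions, the same chain of dereferences can be written out step by step. This is a sketch under assumptions: ClassNameOfVariable is a hypothetical helper, the Forms variable is only a stand-in for a global such as Screen, and the code performs no nil checks, so it must be given the address of a variable that holds a live object.

```pascal
program PoiChain;
{$APPTYPE CONSOLE}
{$WARN SYMBOL_DEPRECATED OFF} // vmtClassName is marked deprecated

uses
  System.SysUtils, System.Classes;

function ClassNameOfVariable(VarAddr: Pointer): string;
var
  Instance, ClassPtr: PByte;
  Name: PShortString;
begin
  Instance := PPointer(VarAddr)^;             // first poi: variable -> object
  ClassPtr := PPointer(Instance)^;            // second poi: object -> class
  Name := PPointer(ClassPtr + vmtClassName)^; // third poi: class - 38h -> name
  // Reading it as a ShortString takes care of the leading length byte,
  // which is what the "+1" does in the debugger expression.
  Result := UTF8ToString(Name^);
end;

var
  Forms: TList; // stand-in global variable holding an object reference
begin
  Forms := TList.Create;
  try
    Writeln(ClassNameOfVariable(@Forms)); // TList
  finally
    Forms.Free;
  end;
end.
```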
Your challenge
Your challenge now is to list each of the TMyObject instances currently allocated. I’ve added a little spice: one of them has been freed but some of the data may still be in the dump. So you may find it is not enough to just use RTTI to find the data — recall that the search may find false positives and freed instances. You should find that searching for RTTI and also disassembling functions that refer to member variables in the form are useful. Good luck!
Hint: If you are struggling to find member variable offsets to find the list, the following three lines of code from FormCreate may help (edx ends up pointing to the form instance):
As per several QC reports, Data.DBXJSON.TJSONString.ToString is still very broken. Which means, for all intents and purposes, TJSONAnything.ToString is also broken. Fortunately, you can just use TJSONAnything.ToBytes for a happy JSON outcome.
The following function will take any Delphi JSON object and convert it to a string:
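// Requires System.SysUtils (TBytes, TEncoding) and Data.DBXJSON.
function JSONToString(obj: TJSONAncestor): string;
var
  bytes: TBytes;
  len: Integer;
begin
  // Size the buffer from the estimate; the value ToBytes returns is used as the byte count.
  SetLength(bytes, obj.EstimatedByteSize);
  len := obj.ToBytes(bytes, 0);
  Result := TEncoding.ANSI.GetString(bytes, 0, len);
end;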
Because TJSONString.ToBytes escapes all characters outside U+0020-U+007F, we can assume that the end result is 7-bit clean, so we can use TEncoding.ANSI. You could instead stream the TBytes to a file or do other groovy things with it.
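To see the function at work, including the escaping of a non-ASCII character, a small console program along these lines should do. It is a sketch that assumes the XE2-era Data.DBXJSON unit (later Delphi versions moved these classes to System.JSON) and the AddPair overload that takes two strings; the pair name and value are made up for the demonstration.

```pascal
program JsonToStringDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, Data.DBXJSON; // assumption: XE2-era unit name

function JSONToString(obj: TJSONAncestor): string;
var
  bytes: TBytes;
  len: Integer;
begin
  SetLength(bytes, obj.EstimatedByteSize);
  len := obj.ToBytes(bytes, 0);
  Result := TEncoding.ANSI.GetString(bytes, 0, len);
end;

var
  Obj: TJSONObject;
begin
  Obj := TJSONObject.Create;
  try
    // #$00EB is e-diaeresis, written as a character code so the source
    // file encoding does not matter.
    Obj.AddPair('name', 'Zo'#$00EB);
    // The non-ASCII character should appear as a \u escape in the output.
    Writeln(JSONToString(Obj));
  finally
    Obj.Free;
  end;
end.
```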
Today I’ve got a process on my machine that is supposed to be exiting, but it has hung. Let’s load it up in WinDbg and find what’s up. The program in question was built in Delphi XE2, and symbols were generated by our internal tds2dbg tool (but there are other tools online which create similar .dbg files). As usual, I am writing this up for my own benefit as much as anyone else’s, but if I put it on my blog, it forces me to put in enough detail that even I can understand it when I come back to it!
Looking at the main thread, we can see unit finalizations are currently being called, but the specific unit finalization section and functions which are being called are not immediately visible in the call stack, between InterlockedCompareExchange and FinalizeUnits:
So, the simplest way to find out where we were was to step out of the InterlockedCompareExchange call. I found myself in System.SysUtils.DoneMonitorSupport (specifically, the CleanEventList subprocedure):
0:000> p
eax=01a8ee70 ebx=01a8ee70 ecx=01a8ee70 edx=00000001 esi=00000020 edi=01a26e80
eip=0042dcb1 esp=0018ff20 ebp=0018ff3c iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200202
audit4_patient!CleanEventList+0xd:
0042dcb1 33c9 xor ecx,ecx
After a little more spelunking, and a review of the Delphi source around this function, I found that this was a part of the System.TMonitor support. Specifically, there was a locked TMonitor somewhere that had not been destroyed. I stepped through a loop that was spinning, waiting for the object to be unlocked so its handle could be destroyed, and found a reference to the data in question here:
0:000> p
eax=00000001 ebx=01a8ee70 ecx=01a8ee70 edx=00000001 esi=00000020 edi=01a26e80
eip=0042dcaf esp=0018ff20 ebp=0018ff3c iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200202
audit4_patient!CleanEventList+0xb:
0042dcaf 8bc3 mov eax,ebx
Looking at the record pointed to by ebx, we had a reference to an event handle handy:
Although Event is a Pointer, internally it’s just cast from an event handle. So I guess that we can probably find another reference to that handle somewhere in memory, corresponding to a TMonitor record:
Now one of these should correspond to a TMonitor record. The first entry (01a8ee74) is just part of our TSyncEventItem record, and the next three don’t make sense given that the FSpinCount (the next value in the memory dump) would be invalid. So let’s look at the last one. Counting quickly on all my fingers and toes, I establish that that makes 08a47538 the start of the TMonitor record. And… so we search for a pointer to that.
Just one! But here it gets a little tricky, because the PMonitor pointer is in a ‘hidden’ field at the end of the object. So we need to locate the start of the object.
I’m just stabbing in the dark here, but that 004015c8 that’s just four bytes back smells suspiciously like an object class pointer. Let’s see:
0:000> da poi(4015c8-38)+1
004016d7 "TObject&"
Ta da! That all fits. A TObject has no data members, so the next 4 bytes should be the TMonitor (search for hfMonitorOffset in the Delphi source to learn more). So we have a TObject being used as a TMonitor lock reference. (Learn about that poi(address-38)+1 magic). But what other naughty object is hanging about, using this TObject as its lock?
TThreadList = class
private
FList: TList;
FLock: TObject;
FDuplicates: TDuplicates;
Yes, that definitely looks hopeful! That FLock is pointing to our lock TObject… I believe that’s called a Quality Match.
This is still a little bit too generic for me, though. TThreadList is a standard Delphi class used by the bucketload. Let’s try and identify who is using this list and leaving it lying about. First, we’ll quickly have a look at that TThreadList.FList to see if it has anything of interest — that’s the first data member in the object == object+4.
Yep, it’s a TList. Just making sure. It’s empty, what a shame (TList.FCount is the second data member in the object == 00000000, as is the list pointer itself).
So how else can we find the usage of that TThreadList? Is the TThreadList referenced anywhere, then? Break out the search tool again!
0:000> !handle 91c
Could not duplicate handle 91c, error 6
That suggests that the object has already been destroyed, but that the TThreadList hasn’t.
And sure enough, when I looked at the destructor for TAnatomyDiagramTileLoadThread, we clear the TThreadList, but we never free it!
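The shape of the fix is simple enough to sketch. The real TAnatomyDiagramTileLoadThread is not shown in this post, so the class below is a hypothetical stand-in with an invented field name; the only point it makes is that clearing a TThreadList does not release it, and the destructor has to free it as well.

```pascal
program ThreadListCleanup;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical stand-in for the thread class discussed above.
  TTileLoadThread = class(TThread)
  private
    FPending: TThreadList;
  protected
    procedure Execute; override;
  public
    constructor Create;
    destructor Destroy; override;
  end;

constructor TTileLoadThread.Create;
begin
  FPending := TThreadList.Create;
  inherited Create(False);
end;

destructor TTileLoadThread.Destroy;
begin
  inherited Destroy; // TThread.Destroy waits for Execute to finish
  FPending.Clear;    // emptying the list is not enough on its own...
  FPending.Free;     // ...the list (and its lock object) must be freed too
end;

procedure TTileLoadThread.Execute;
begin
  FPending.Add(Self); // placeholder work item
end;

var
  T: TTileLoadThread;
begin
  // Built-in leak reporting; it stays silent when everything was released.
  ReportMemoryLeaksOnShutdown := True;
  T := TTileLoadThread.Create;
  try
    T.WaitFor;
  finally
    T.Free;
  end;
  Writeln('Thread and its list released');
end.
```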
Now, another way we could have caught this was to turn on leak detection. But leak detection is not always perfect, especially when you get some libraries that *cough* have a lot of false positives. And of course while we could have switched on heap leak detection, that involves rebuilding and restarting the process and losing the context along the way, with no guarantee we’ll be able to reproduce it again!
While this approach does feel a little tedious, and we did have some luck in this instance with freed objects not being overwritten, and the values we were searching for being relatively unique, it does nevertheless feel pretty deterministic, which must be better than the old “try-it-and-hope” debugging technique.
We saw situations where a letter would be printed with random letters missing, as per the following example:
Here’s how it should have been printed (oh yes, that’s a completely fictional name we used for testing non-English Latin script letters in printing):
Today, I was notified that Microsoft have finally publicly published a hotfix! We received the hotfix from them a few weeks ago, before it was publicly available, and it certainly solved the problem on our test machines and end user computers.
(The first link notes that the hotfix is included in the rollup in the second link, though the second link doesn’t mention the hotfix!)
Sadly, they’ve only published a solution for Windows 7 Service Pack 1 onwards, and not for Vista, which could be an issue for some users.
It was interesting to see that the issue was indeed a race condition, with two threads ending up with the same random seed. To quote:
This issue occurs because a conflict causes the text that uses the font to become corrupted when the two threads try to install the font at the same time. The name of the font is generated by the RAND function together with a SEED value whose time value is set to srand(time(NULL)). When the two documents are printed at the same time, the SEED value for the font is the same in both documents. Therefore, the conflict occurs.
So not related to the threading model, which was a little bit of misdirection on my part 😉 That’s par for the course though when debugging complex issues without source…
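The collision described in the quoted explanation is easy to reproduce in miniature. The following is only an illustration of the principle using Delphi's own RandSeed and Random, not the Windows code in question; TempFontName and the timestamp value are hypothetical.

```pascal
program SeedCollision;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical name generator: seeds the generator, then derives a name.
function TempFontName(Seed: Integer): string;
begin
  RandSeed := Seed;
  Result := 'FONT' + IntToHex(Random(MaxInt), 8);
end;

var
  TimeA, TimeB: Integer;
begin
  // Two print jobs read a one-second-resolution clock in the same second.
  TimeA := 1350000000;
  TimeB := 1350000000;
  // Same seed, same sequence, same "random" name.
  Writeln(TempFontName(TimeA) = TempFontName(TimeB)); // TRUE
end.
```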
When refactoring a large monolithic executable written in Delphi into several executables, we ran into an unanticipated issue with what I am calling the fragile abstract factory (anti) pattern, although the name is possibly not a perfect fit. To get started, have a look at the following small program that illustrates the issue.
program FragileFactory;
uses
  MyClass in 'MyClass.pas',
  GreenClass in 'GreenClass.pas',
  //RedClass in 'RedClass.pas',
  SomeProgram in 'SomeProgram.pas';
var
  C: TMyClass;
begin
  C := TMyClass.GetObject('TGreenClass');
  writeln(C.ToString);
  C.Free;
  C := TMyClass.GetObject('TRedClass'); // oh dear... that's going to fail
  writeln(C.ToString);
  C.Free;
end.
unit MyClass;
interface
type
  TMyClass = class
  protected
    class procedure Register;
  public
    class function GetObject(ClassName: string): TMyClass;
  end;
implementation
uses
  System.Contnrs,
  System.SysUtils;
var
  FMyClasses: TClassList = nil;
{ TMyClass }
class procedure TMyClass.Register;
begin
  if not Assigned(FMyClasses) then
    FMyClasses := TClassList.Create;
  FMyClasses.Add(Self);
end;
class function TMyClass.GetObject(ClassName: string): TMyClass;
var
  i: Integer;
begin
  for i := 0 to FMyClasses.Count - 1 do
    if FMyClasses[i].ClassNameIs(ClassName) then
    begin
      // TClassList items are plain TClass references, so cast the new instance.
      Result := TMyClass(FMyClasses[i].Create);
      Exit;
    end;
  Result := nil;
end;
initialization
finalization
  FreeAndNil(FMyClasses);
end.
unit GreenClass;
interface
uses
  MyClass;
type
  TGreenClass = class(TMyClass)
  public
    function ToString: string; override;
  end;
implementation
{ TGreenClass }
function TGreenClass.ToString: string;
begin
  Result := 'I am Green';
end;
initialization
  TGreenClass.Register;
end.
What happens when we run this?
C:\src\fragilefactory>Win32\Debug\FragileFactory.exe
I am Green
Exception EAccessViolation in module FragileFactory.exe at 000495A4.
Access violation at address 004495A4 in module 'FragileFactory.exe'. Read of address 00000000.
Note the missing TRedClass in the source. We don’t discover until runtime that this class is missing. In a project of this scale, it is pretty obvious that we haven’t linked in the relevant unit, but once you get a large project (think hundreds or even thousands of units), it simply isn’t possible to manually validate that the classes you need are going to be there.
There are two problems with this fragile factory design pattern:
Delphi use clauses are fragile (uh, hence the name of the pattern). The development environment frequently updates them, and typically they are not organised, sorted, or even formatted neatly. This makes validating changes to them difficult and error-prone. Merging changes in version control systems is a frequent cause of errors.
When starting a new Delphi project that utilises your class libraries, ensuring you use all the required units is a hard problem to solve.
Typically this pattern will be used in a couple of ways, somewhat less naively than the example above:
There will be an external source for the identifiers. In the project in question, these class names were retrieved from a database, or from linked resources.
The registered classes will be iterated over and each called in turn to perform some function.
Of course, this is not a problem restricted to Delphi or even this pattern. Within Delphi, any unit that does work in its initialization section is prone to this problem. More broadly, any dynamically linking registry, such as COM, will have similar problems. The big gotcha with Delphi is that the problem can only be resolved by rebuilding the project, which necessitates rollout of an updated executable — much harder than just re-registering a COM object on a client site for example.
How then do we solve this problem? Well, I have not identified a simple, good clean fix. If you have one, please tell me! But here are a few things that can help.
Where possible, use a reference to the class itself, such as by calling the class’s ClassName function, to enforce linking the identified class in. For example:
C := TMyClass.GetObject(TGreenClass.ClassName);
C := TMyClass.GetObject(TRedClass.ClassName);
When the identifiers are pulled from an external resource, such as a database, you have no static reference to the class. In this case, consider building a registry of units, automatically generated during your build if possible. For example, we automatically generate a registry during build that looks like this:
unit MyClassRegistry;
// Do not modify; automatically generated
interface
procedure AssertMyClassRegistry;
implementation
uses
  MyClass,
  GreenClass,
  RedClass;
procedure AssertMyClassRegistry;
begin
  Assert(TGreenClass.InheritsFrom(TMyClass));
  Assert(TRedClass.InheritsFrom(TMyClass));
end;
end.
These are not assertions that are likely to fail but they do serve to ensure that the classes are linked in. The AssertMyClassRegistry function is called in the constructor of our main form, which is safer than relying on a use clause to link it in.
Units that can cause this problem can be identified by searching for units with initialization sections in your project (don’t forget units that also use the old style begin instead of initialization — a helpful grep regular expression for finding these units is (?s)begin(?:.(?!end;))+\bend\.). This at least gives you a starting point for making sure you test every possible case. Static analysis tools are very helpful here.
Even though it goes against every encapsulation design principle I’ve ever learned, referencing the subclass units in the base class unit is perhaps a sensible solution with a minimal cost. We’ve used this approach in a number of places.
Format your use clauses, even sorting them if possible, with a single unit reference on each line. This is a massive boon for source control. We went as far as building a tool to clean our use clauses, and found that this helped greatly.
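A complementary sketch, which is not one of the approaches listed above: the lookup itself can fail loudly, naming the missing class, instead of returning nil and leaving the caller to trip over an access violation. It does not make the missing unit appear, but it turns the symptom into something a support log can explain. EClassNotRegistered and TMyClassRef are hypothetical names, and the classes are collapsed into one program file to keep the example self-contained.

```pascal
program FailFastFactory;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Contnrs;

type
  // Hypothetical exception type, introduced for this sketch only.
  EClassNotRegistered = class(Exception);

  TMyClass = class
  public
    class procedure Register;
    class function GetObject(const AClassName: string): TMyClass;
  end;
  TMyClassRef = class of TMyClass;

  TGreenClass = class(TMyClass)
  public
    function ToString: string; override;
  end;

var
  FMyClasses: TClassList = nil;

class procedure TMyClass.Register;
begin
  if not Assigned(FMyClasses) then
    FMyClasses := TClassList.Create;
  FMyClasses.Add(Self);
end;

class function TMyClass.GetObject(const AClassName: string): TMyClass;
var
  i: Integer;
begin
  if Assigned(FMyClasses) then
    for i := 0 to FMyClasses.Count - 1 do
      if FMyClasses[i].ClassNameIs(AClassName) then
        Exit(TMyClassRef(FMyClasses[i]).Create);
  // Name the missing class instead of handing back nil.
  raise EClassNotRegistered.CreateFmt(
    'Class %s is not registered; is its unit in the uses clause?',
    [AClassName]);
end;

function TGreenClass.ToString: string;
begin
  Result := 'I am Green';
end;

var
  C: TMyClass;
begin
  TGreenClass.Register;
  try
    try
      C := TMyClass.GetObject('TGreenClass');
      try
        Writeln(C.ToString);
      finally
        C.Free;
      end;
      C := TMyClass.GetObject('TRedClass'); // raises EClassNotRegistered
      C.Free;
    except
      on E: EClassNotRegistered do
        Writeln(E.Message);
    end;
  finally
    FreeAndNil(FMyClasses);
  end;
end.
```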
In summary, then, the fragile abstract factory (anti) pattern is just something to be aware of when working on large Delphi projects, mainly because it is hard to test for: the absence of a unit only comes to light when you actually call upon that unit, and due to the fragility of Delphi’s use clauses, unrelated changes are likely to trigger the issue.
I was debugging a weird cascade of exceptions in an application today. It started with an EOleException from a database connection issue, and rapidly degenerated into a series of EAccessViolation and EInvalidPointer exceptions: often a good sign of Use-After-Free or Double-Free scenarios. Problem was, I could not see any place where we could be using an object after freeing it, even in the error case.
Here’s a shortened version of the code:
procedure ConnectToDatabase;
begin
  try
    CauseADatabaseException; // yeah... bear with me here.
  except
    on E: EDatabaseError do
    begin
      E.Message := 'Failed to connect to database; the '+
        'error message was: ' + E.Message;
      raise(E);
    end;
  end;
end;
Can you spot the bug?
I must admit I read through the code quite a few times before I spotted it. It’s not an in-your-face-look-at-me type of bug! In fact, the line in question appears to be completely logical and plausible.
Okay, enough waffle. The bug is in the last line of the exception handler:
raise(E);
When we call raise(E) we are telling Delphi that here is a shiny brand new exception object that we want to raise. After Delphi raises the exception object, the original exception object is freed, and … wait a minute, that’s the exception object we were raising! One delicious serve of Use-After-Free or Double-Free coming right up!
We should be doing this instead:
raise; // don't reference E here!
Another thing to take away from this is: remember that you don’t control the lifetime of the exception object. Don’t store references to it and expect it to survive. If you want to maintain knowledge of an inner exception object, use Delphi’s own nested exception management, available since Delphi 2009.
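As a rough illustration of the nested exception support mentioned above, the sketch below wraps a low-level failure in a new exception and lets the RTL keep hold of the original. Exception.RaiseOuterException and the InnerException property come from System.SysUtils; the exception classes and procedure names are hypothetical.

```pascal
program NestedExceptionDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical exception types for the demonstration.
  ELowLevelFailure = class(Exception);
  EConnectFailed = class(Exception);

procedure CauseALowLevelFailure;
begin
  raise ELowLevelFailure.Create('socket closed');
end;

procedure ConnectToDatabase;
begin
  try
    CauseALowLevelFailure;
  except
    // Raise a new exception; the RTL attaches the one being handled as
    // its InnerException and manages the lifetime of both objects.
    Exception.RaiseOuterException(
      EConnectFailed.Create('Failed to connect to database'));
  end;
end;

begin
  try
    ConnectToDatabase;
  except
    on E: EConnectFailed do
    begin
      Writeln(E.Message);
      if E.InnerException <> nil then
        Writeln('  caused by: ', E.InnerException.Message);
    end;
  end;
end.
```

The handler never stores or re-raises the original exception object itself, which is the habit that caused the bug in the first place.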
Have you ever written any Delphi code like the following snippet? That is, have you ever raised a Delphi exception in a Win32 callback? Or even just failed to handle potential exceptions in a callback?
function MyWndEnumFunc(hwnd: HWND; lParam: LPARAM): BOOL; stdcall;
begin
  if hwnd = TForm4(lParam).Handle then
    raise Exception.Create('Raise an exception just to demonstrate the issue');
  Result := True;
end;
procedure TForm4.FormCreate(Sender: TObject);
begin
  try
    EnumWindows(@MyWndEnumFunc, NativeInt(Self));
  except
    on E:Exception do
      ShowMessage('Well, we unwound that exception. But does Win32 agree?');
  end;
end;
Raymond Chen has posted a great blog today about why that approach will eventually end in tears. You may get away with it for a while, or you may end up with horrific stack or heap corruption. The moral of the story is, any time you have a Win32 callback function, you need to make sure no exceptions leak. Like this:
function MyWndEnumFunc(hwnd: HWND; lParam: LPARAM): BOOL; stdcall;
begin
  try
    if hwnd = TForm4(lParam).Handle then
      raise Exception.Create('Raise an exception just to demonstrate the issue');
    Result := True;
  except
    // Handle the exception, perhaps pass it to Application.HandleException,
    // or log it, or abort your app. Just don't let it bubble through any
    // Win32 code. You need to write the HandleAllExceptions function!
    HandleAllExceptions;
    Result := False;
  end;
end;
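What HandleAllExceptions does is up to the application. One possible shape, offered only as a sketch, is shown below as a console program: the helper reports whatever exception is currently being handled (via ExceptObject from System.SysUtils) and must be called from inside an except block. SafeEnumProc is a hypothetical callback that raises on the first window it sees, purely to exercise the handler.

```pascal
program SafeCallbackDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Hypothetical implementation; call it only from inside an except block.
procedure HandleAllExceptions;
var
  E: TObject;
begin
  E := ExceptObject; // the exception currently being handled, or nil
  if E is Exception then
    Writeln('Callback failed: ', E.ClassName, ': ', Exception(E).Message)
  else
    Writeln('Callback failed with a non-Delphi exception');
end;

function SafeEnumProc(Wnd: HWND; Param: LPARAM): BOOL; stdcall;
begin
  try
    if Wnd <> 0 then
      raise Exception.Create('Raised inside the callback');
    Result := True;
  except
    HandleAllExceptions;
    Result := False; // stop the enumeration; nothing escapes into Win32
  end;
end;

begin
  if not EnumWindows(@SafeEnumProc, 0) then
    Writeln('Enumeration stopped early; no exception crossed the Win32 boundary');
end.
```

In a GUI application, logging or Application.HandleException would be more natural than Writeln.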
If you use Delphi’s AllocateHwnd procedure, remember that it also does not handle exceptions for you (I’ve just reported this in QualityCentral as QC108653 as this caveat should at least be documented). So you need to do it:
procedure TForm4.MyAllocatedWindowProc(var Message: TMessage);
begin
  try
    if Message.Msg = WM_USER then
      raise Exception.Create('Go wild!');
  except
    Application.HandleException(Self); // Or whatever, just don't let it leak
  end;
  with Message do Result := DefWindowProc(FMyHandle, Msg, wParam, lParam);
end;
procedure TForm4.FormCreate(Sender: TObject);
begin
  FMyHandle := AllocateHwnd(MyAllocatedWindowProc);
  SendMessage(FMyHandle, WM_USER, 0, 0);
end;
While I thought, yesterday, that this bug was no longer such a problem on Windows Vista and Windows 7, it turns out that if you assign a hotkey to a language (e.g. Alt+Shift+2), the KLF_SETFORPROCESS flag will be set when that hotkey is pressed. This means that this issue is still a problem today.
Moral of the story (again): thread-safe code is hard. Don’t believe that your code suddenly becomes thread-safe because you synchronize it to the main thread (no matter what the Delphi documentation tells you!)