IXMLDocument.SaveToStream does not always use UTF-16 encoding

Delphi’s documentation on IXMLDocument.SaveToStream has the following important caveat:

Regardless of the encoding system of the original XML document, SaveToStream always saves the stream in UTF-16.

It’s helpful to have notes like this.  Mind you, forcing UTF-16 output is definitely horrible; what if we need our document in UTF-8 or (God forbid) some non-Unicode encoding?

Now Kris and I were looking at a Unicode corruption issue with an XML document in a Delphi application and struggling to understand what was going wrong given this statement in the documentation.  Our results didn’t add up, so we wrote a little test app to test that statement:

procedure TForm1.OutputEncodingIsUTF8;
const
  UTF8XMLDoc: string =
    '<!--?xml version="1.0" encoding="utf-8"?-->'#13#10+
    '';
var
  XMLDocument: IXMLDocument;
  InStream: TStringStream;
  OutStream: TFileStream;
begin
  // stream format is UTF8, input string is converted to UTF8
  // and saved to the stream
  InStream := TStringStream.Create(UTF8XMLDoc, TEncoding.UTF8);</blockquote>
  // we'll write to this output file
  OutStream := TFileStream.Create('file_should_be_utf16_but_is_utf8.xml',
    fmCreate);
  try
    XMLDocument := TXMLDocument.Create(nil);
    XMLDocument.LoadFromStream(InStream);
    XMLDocument.SaveToStream(OutStream);
    // IXMLDocument.SaveToStream docs state will always be UTF-16
  finally
    FreeAndNil(InStream);
    FreeAndNil(OutStream);
  end;

  with TStringList.Create do
  try    // we want to load it as a UTF16 doc given the documentation
    LoadFromFile('file_should_be_utf16_but_is_utf8.xml', TEncoding.Unicode);
    ShowMessage('This should be displayed as an XML document '+
                'but instead is corrupted: '+#13#10+Text);
  finally
    Free;
  end;
end;

When I run this, I’m expecting the following dialog:

But instead I get the following dialog:

Note, this is run on Delphi 2010.  Haven’t run this test on Delphi XE2, but the documentation hasn’t changed.

The moral of the story is, the output encoding is the same as the input encoding, unless you change the output encoding with the Encoding property, for example, adding the highlighted line below fixes the code sample:

    XMLDocument := TXMLDocument.Create(nil);
    XMLDocument.LoadFromStream(InStream);
    XMLDocument.Encoding := 'UTF-16';
    XMLDocument.SaveToStream(OutStream);

The same documentation issue exists for TXMLDocument.SaveToStream.  I’ve reported the issue in QualityCentral.

Leave a Reply

Your email address will not be published. Required fields are marked *