Loading a Unicode string from a file with Delphi functions

In my previous post, I described differences in saving text with TStringStream and TStringList.  TStringList helpfully adds a preamble.  TStringStream doesn’t.  Now when loading text from a stream, you’ll typically want to strip off the preamble.  But if you want the text to be otherwise unmodified, then TStringList is not safe, and TStringStream doesn’t strip off the preamble.
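To see the problem for yourself, here is a minimal sketch. It assumes a Unicode-era Delphi where TStringStream takes an encoding and descends from TBytesStream. The file name is a hypothetical scratch file, not something from the previous post. It saves a line with TStringList as UTF-8, reads it back with TStringStream, and checks whether the preamble survived as a leading U+FEFF character.

```delphi
program PreambleDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

const
  DemoFile = 'preamble-demo.txt'; // hypothetical scratch file name

var
  Lines: TStringList;
  Stream: TStringStream;
begin
  // TStringList writes the encoding's preamble ahead of the text.
  Lines := TStringList.Create;
  try
    Lines.Text := 'hello';
    Lines.SaveToFile(DemoFile, TEncoding.UTF8);
  finally
    Lines.Free;
  end;

  // TStringStream decodes every byte it loaded, preamble included.
  Stream := TStringStream.Create('', TEncoding.UTF8, False);
  try
    Stream.LoadFromFile(DemoFile);
    // Copy is 1-based on every platform, so this looks at the first character.
    if Copy(Stream.DataString, 1, 1) = #$FEFF then
      Writeln('TStringStream kept the preamble as U+FEFF')
    else
      Writeln('No preamble character at the start of the string');
  finally
    Stream.Free;
  end;
end.
```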

Here’s a helper function that does strip the preamble.  If you don’t pass an encoding, it will guess on the basis of the preamble (but won’t otherwise sniff the stream content to guess the encoding heuristically). If the content does not have a preamble, it assumes the current code page (TEncoding.Default).  If you do pass an encoding, the preamble will be stripped if it is there but no encoding detection will take place.

function LoadStringFromFile(const filename: string; encoding: TEncoding = nil): string;
var FPreambleLength: Integer;
begin
  with TBytesStream.Create do
  try
    LoadFromFile(filename);
    FPreambleLength := TEncoding.GetBufferEncoding(Bytes, encoding); // sets encoding if it was nil
    Result := encoding.GetString(Bytes, FPreambleLength, Size - FPreambleLength);
  finally Free; end;
end;
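Here is a hedged usage sketch showing both ways to call the helper. The scratch file name and the WriteDemoFile procedure are hypothetical, added only for illustration. The program repeats the helper so that it compiles on its own. It writes a small UTF-8 file with a preamble, then loads it once with detection and once with an explicit encoding.

```delphi
program LoadStringDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

const
  DemoFile = 'loadstring-demo.txt'; // hypothetical scratch file name

function LoadStringFromFile(const filename: string; encoding: TEncoding = nil): string;
var
  PreambleLength: Integer;
begin
  with TBytesStream.Create do
  try
    LoadFromFile(filename);
    // Returns the preamble length; fills in encoding if it was nil.
    PreambleLength := TEncoding.GetBufferEncoding(Bytes, encoding);
    // Bytes can be longer than the data, so use Size rather than Length(Bytes).
    Result := encoding.GetString(Bytes, PreambleLength, Size - PreambleLength);
  finally
    Free;
  end;
end;

// Hypothetical helper: writes the UTF-8 preamble followed by a short string.
procedure WriteDemoFile;
var
  Stream: TFileStream;
  Data: TBytes;
begin
  Stream := TFileStream.Create(DemoFile, fmCreate);
  try
    Data := TEncoding.UTF8.GetPreamble;
    Stream.WriteBuffer(Data[0], Length(Data));
    Data := TEncoding.UTF8.GetBytes('caf'#$00E9);
    Stream.WriteBuffer(Data[0], Length(Data));
  finally
    Stream.Free;
  end;
end;

var
  Detected, Explicit: string;
begin
  WriteDemoFile;
  Detected := LoadStringFromFile(DemoFile);                // encoding guessed from the preamble
  Explicit := LoadStringFromFile(DemoFile, TEncoding.UTF8); // encoding fixed, preamble still stripped
  Writeln('Length with detection: ', Length(Detected));
  Writeln('Length with explicit encoding: ', Length(Explicit));
  Writeln('Same text both ways: ', Detected = Explicit);
end.
```

Passing the encoding explicitly is the safer choice when you already know how the file was written, since a file without a preamble otherwise falls back to TEncoding.Default.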

Obviously this is not fantastic for very large files, since it reads the whole file into memory (you can solve that one yourself), but for your plain old bite-sized files it's a quick and easy solution.

2 thoughts on “Loading a Unicode string from a file with Delphi functions”

  1. This helped me a lot. I recently revamped my unicode helper functions and I was not aware that I needed to strip the preamble chars. In my previous code I used some built-in wide string functions that (as it turns out) were doing all that for me. Thanks for the example code!

    1. You’re welcome 🙂 Glad the code is helpful; the differences between these classes are not super obvious and they’ve tripped me up more than once.
