Frequently in Delphi we come across the need to encode a string to stuff into a URL query string parameter (as per web forms). One would expect that Indy contains well-tested functions to handle this. Well, Indy contains some functions to help with this, but they may not work quite as you expect. In fact, they may not be much use at all.
Indy contains a class called TIdURI. It has, among other things, the class methods URLEncode, PathEncode, and ParamsEncode. At first glance, these seem to do what you would need. But in fact, they don’t.
URLEncode will take a full URL, split it into path, document and query components, encode each of those, and return the full string. PathEncode is intended to handle the nuances of the path and document components of the URL, and ParamsEncode handles query strings.
Sounds great, right? Well, it works until you have a query parameter that has an ampersand (&) in it. Say my beloved end user wants to search for big&little. It seems that you could pass in the following:
s := TIdURI.URLEncode('http://www.google.com/search?q='+SearchText);
But then we get no change in our result:
s = 'http://www.google.com/search?q=big&little';
And you can already see the problem: little is now a separate parameter in the query string. How can we work around this? Can we pre-encode the ampersand to %26 before we pass in the parameters?
s := TIdURI.URLEncode('http://www.google.com/search?q='+ReplaceStr(SearchText, '&', '%26'));
No, because the % of our %26 is then itself encoded as %25:
s = 'http://www.google.com/search?q=big%2526little';
And obviously we can’t do it ourselves afterwards, because we too won’t know which ampersands are which. You could work around the ampersand problem by encoding each parameter name and value separately with ParamsEncode, and then post-processing each one for ampersands and other characters before final assembly. But you’ll soon find that it’s not enough anyway: the characters =, / and ? are also left unencoded, although they should be encoded. Finally, URLEncode does not support internationalized domain names (IDN).
Given that these functions are not a complete solution, it’s probably best to avoid them altogether.
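The problem is analogous to the JavaScript encodeURI vs encodeURIComponent issue: encodeURI expects a whole URL and so leaves reserved characters such as & untouched, while encodeURIComponent expects a single component and encodes them.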
So to write your own… I haven’t found a good Delphi solution online (and I searched a bit), so here’s a function I’ve cobbled together (use at your own risk!) to encode parameter names and values. You do need to encode each component of the parameter string separately, of course.
function EncodeURIComponent(const ASrc: string): UTF8String;
const
  HexMap: UTF8String = '0123456789ABCDEF';

  function IsSafeChar(ch: Integer): Boolean;
  begin
    if (ch >= 48) and (ch <= 57) then Result := True          // 0-9
    else if (ch >= 65) and (ch <= 90) then Result := True     // A-Z
    else if (ch >= 97) and (ch <= 122) then Result := True    // a-z
    else if (ch = 33) then Result := True                     // !
    else if (ch >= 39) and (ch <= 42) then Result := True     // '()*
    else if (ch >= 45) and (ch <= 46) then Result := True     // -.
    else if (ch = 95) then Result := True                     // _
    else if (ch = 126) then Result := True                    // ~
    else Result := False;
  end;

var
  I, J: Integer;
  ASrcUTF8: UTF8String;
begin
  Result := ''; {Do not Localize}
  // UTF8Encode call not strictly necessary but
  // prevents implicit conversion warning
  ASrcUTF8 := UTF8Encode(ASrc);
  I := 1; J := 1;
  SetLength(Result, Length(ASrcUTF8) * 3); // space to %xx encode every byte
  while I <= Length(ASrcUTF8) do
  begin
    if IsSafeChar(Ord(ASrcUTF8[I])) then
    begin
      // Safe characters are copied through unchanged
      Result[J] := ASrcUTF8[I];
      Inc(J);
    end
    else
    begin
      // Everything else becomes %XX, one escape per UTF-8 byte
      Result[J] := '%';
      Result[J+1] := HexMap[(Ord(ASrcUTF8[I]) shr 4) + 1];
      Result[J+2] := HexMap[(Ord(ASrcUTF8[I]) and 15) + 1];
      Inc(J,3);
    end;
    Inc(I);
  end;
  SetLength(Result, J-1); // trim the unused tail of the buffer
end;
To use this, do something like the following:
function GetAURL(const param, value: string): UTF8String;
begin
  Result := 'http://www.example.com/search?'+
    EncodeURIComponent(param)+
    '='+
    EncodeURIComponent(value);
end;
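With more than one parameter the same rule applies: encode every name and every value on its own, and only then glue them together with = and &. Here is a minimal sketch of that as a small console program. BuildQuery is a hypothetical helper of my own (it is not part of Indy or the RTL), and the encoder is just a condensed stand-in for the function above, repeated so the program compiles by itself. I am assuming a Unicode Delphi (2009 or later) and that the Classes unit resolves without a scope prefix; use System.Classes if your project has no unit scope names configured.

```pascal
program QueryStringDemo;

{$APPTYPE CONSOLE}

uses
  Classes; // System.Classes if unit scope names are not configured

// Condensed stand-in for the EncodeURIComponent function above, repeated
// only so that this sketch compiles on its own.
function EncodeURIComponent(const ASrc: string): UTF8String;
const
  HexMap: UTF8String = '0123456789ABCDEF';
  SafeChars: set of AnsiChar = ['0'..'9', 'A'..'Z', 'a'..'z',
    '!', '''', '(', ')', '*', '-', '.', '_', '~'];
var
  Src: UTF8String;
  I, J: Integer;
begin
  Src := UTF8Encode(ASrc);
  SetLength(Result, Length(Src) * 3); // worst case: every byte becomes %XX
  J := 1;
  for I := 1 to Length(Src) do
    if Src[I] in SafeChars then
    begin
      Result[J] := Src[I];
      Inc(J);
    end
    else
    begin
      Result[J] := '%';
      Result[J + 1] := HexMap[(Ord(Src[I]) shr 4) + 1];
      Result[J + 2] := HexMap[(Ord(Src[I]) and 15) + 1];
      Inc(J, 3);
    end;
  SetLength(Result, J - 1);
end;

// Hypothetical helper: joins the Name=Value pairs held in a TStrings into
// a query string, encoding each name and each value separately.
// TStrings splits on the first '=', so a name containing '=' cannot be
// stored this way (a value containing '=' is fine).
function BuildQuery(Params: TStrings): UTF8String;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Params.Count - 1 do
  begin
    if I > 0 then
      Result := Result + '&'; // separator is added after encoding
    Result := Result + EncodeURIComponent(Params.Names[I]) + '=' +
      EncodeURIComponent(Params.ValueFromIndex[I]);
  end;
end;

var
  Params: TStringList;
begin
  Params := TStringList.Create;
  try
    Params.Add('q=big&little');
    Params.Add('lang=en');
    // Prints: http://www.example.com/search?q=big%26little&lang=en
    Writeln('http://www.example.com/search?' + string(BuildQuery(Params)));
  finally
    Params.Free;
  end;
end.
```

The ampersand inside the search text comes out as %26, while the ampersand between the two parameters stays literal because it is only added after each piece has been encoded.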
Hope this helps. Sorry, I haven’t got an IDN solution in this post!
Updated 15 Nov 2018: Fixed bug with handling of space (should output %20, not +).