delphi - Is there a way to get just the ANSI characters from a string? Utf8decode fails when string contains emojis
问题描述
First I get a TMemoryStream from an HTTP request, which contains the body of the response. Then I load it in a TStringList and save the text in a widestring (also tried with ansistring).
The problem is that I need to convert the string because the users language is spanish, so vowels with accent marks are very common and I need to store the info.
lServerResponse := TStringList.Create;
lServerResponse.LoadFromStream(lResponseMemoryStream);
lStringResponse := lServerResponse.Text;
lDecodedResponse := Utf8Decode(lStringResponse );
If the response (a part of it) is "Hólá Múndó", lStringResponse value will be "Hólá Múndó", and lDecodedResponse will be "Hólá Múndó".
But if the user adds any emoji (lStringResponse value will be "Hólá Múndó 😀" if the emoji is ) Utf8Decode fails and returns an empty string. Is there a way to get just the ANSI characters from a string (or MemoryStream)?, or removing whatever Utf8Decode can't convert?
Thanks for your time.
解决方案
TMemoryStream
is just raw bytes. There is no reason to loading that stream into a TStringList
just to extract a (Wide|Ansi)String
from it. You can assign the bytes directly to an AnsiString
/UTF8String
using SetString()
instead, eg:
var
lStringResponse: UTF8String;
lDecodedResponse: WideString;
begin
SetString(lStringResponse, PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
lDecodedResponse := UTF8Decode(lStringResponse);
end;
Just make sure the HTTP content really is encoded as UTF-8, or else this approach will not work.
That being said - UTF8Decode()
(and UTF8Encode()
) in Delphi 7 DO NOT support Unicode codepoints above U+FFFF, which means they DO NOT support Emojis at all. That was fixed in Delphi 2009.
To work around that issue in earlier versions, you can use the Win32 API MultiByteToWideChar()
function instead, eg:
uses
..., Windows;
function My_UTF8Decode(const S: UTF8String): WideString;
var
WLen: Integer;
begin
WLen := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), nil, 0);
if WLen > 0 then
begin
SetLength(Result, WLen);
MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(S), Length(S), PWideChar(Result), WLen);
end else
Result := '';
end;
var
lStringResponse: UTF8String;
lDecodedResponse: WideString;
begin
SetString(lStringResponse, PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
lDecodedResponse := My_UTF8Decode(lStringResponse);
end;
Alternatively:
uses
..., Windows;
function My_UTF8Decode(const S: PAnsiChar; const SLen: Integer): WideString;
var
WLen: Integer;
begin
WLen := MultiByteToWideChar(CP_UTF8, 0, S, SLen, nil, 0);
if WLen > 0 then
begin
SetLength(Result, WLen);
MultiByteToWideChar(CP_UTF8, 0, S, SLen, PWideChar(Result), WLen);
end else
Result := '';
end;
var
lDecodedResponse: WideString;
begin
lDecodedResponse := My_UTF8Decode(PAnsiChar(lResponseMemoryStream.Memory), lResponseMemoryStream.Size);
end;
Or, use a 3rd party Unicode conversion library, like ICU or libiconv, which handle this for you.
推荐阅读
- .net-core - 在项目参考中包含构建时功能
- c# - 在c#中向现有文件添加新的文本行
- regex - 包含参数的 url 的正则表达式
- verilog - FSM 输出永远不会被设置
- javascript - 带有 contenteditable 的 Javascript MaxLength
- c# - 可以在没有主键的情况下创建多对多对象吗?
- google-apps-script - google.script.run.function() 在共享文件中
- mule - 如何在 Mule Dataweave 中循环和合并
- java - 无限循环的集成测试
- oracle - 如何引用表单数据以在 Apex Oracle 中的表单本身中使用它