python - How to convert surrogate pairs read from a txt file back to emoji in Python 3?

Problem description

I have several txt files to read that contain strings such as:

"Yes! Sardines in a can distancing! \uD83E\uDD23"

The problem is this. When I run

"Yes! Sardines in a can distancing! \uD83E\uDD23".encode('utf-16', 'surrogatepass').decode('utf-16')

the Unicode code points are converted to an emoji, because Python treats \uD83E and \uDD23 each as a single character.

Output:

Yes! Sardines in a can distancing! 🤣

Also, when I check the length of the string above with len(), the output is 37.

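However, when I read the same string from the text file, Python reads \uD83E and \uDD23 as individual characters, 12 characters in total (each escape is a backslash, a u, and four hex digits). That is not what I want, because my encode().decode() call then does not give the expected result: the escape sequences are not converted to an emoji. I used the code below: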

# tweet_dict holds the tweets read from the txt files (loading code not shown)
for index, tweet in enumerate(tweet_dict):
    if index == 75:
        a = tweet['text']
        print('Length of the string is: ', len(str(a)))
        print(a.encode('utf-16', 'surrogatepass').decode('utf-16'))

The output is:

Length of the string is:  47
Yes! Sardines in a can distancing! \uD83E\uDD23
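
The length difference (37 versus 47) suggests that the file holds the literal text of the escapes rather than the surrogate characters themselves. Below is a minimal sketch of one possible way to handle that, assuming the text really contains a backslash, a u, and four hex digits for each code unit. The names raw, fixed, and decode_escapes are hypothetical and only serve the illustration; raw stands in for a string read from the file.

```python
import re

# Raw string literal: the backslashes stay in the text, as they would
# in a string read from the file.
raw = r"Yes! Sardines in a can distancing! \uD83E\uDD23"


def decode_escapes(text):
    """Convert literal backslash-u escapes to characters and merge surrogate pairs."""
    # Step 1: replace each literal escape with the UTF-16 code unit it names.
    with_surrogates = re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )
    # Step 2: round-trip through UTF-16 so each high/low surrogate pair
    # becomes one code point. An unpaired surrogate would raise here.
    return with_surrogates.encode("utf-16", "surrogatepass").decode("utf-16")


fixed = decode_escapes(raw)
print(len(raw), len(fixed))
```

Under these assumptions, len(raw) is 47 and len(fixed) is 36, because the twelve escape characters collapse into a single code point, and print(fixed) should show the emoji if the terminal can render it. The regular expression only touches four-digit backslash-u sequences, so other backslashes in the text are left alone.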

Tags: python, python-3.x, emoji, python-unicode, surrogate-pairs
