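python - UTF-8 encoding problem with Python on Windows
Problem description
I am using Python to process files on the Windows operating system. I am getting errors such as "Unicode error: surrogates not allowed".
Sample text from the document: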
Ten states led by Texas Attorney General Ken Paxton (R) filed an antitrust lawsuit against
Google on Wednesday, alleging the tech giant illegally sought to suppress competition and
reap massive profits from targeted advertisements placed across the Web.
The lawsuit — filed in a Texas federal court and backed exclusively by Republicans — strikes
at the heart of Google’s lucrative business in connecting those who seek to buy online ads
with the websites that sell them. Paxton and his GOP allies contend that Google relied on a
mix of improper tactics to force its ad tools on publishers and solidify its pole position as a
“middleman” in the invisible transactions that power much of the Web.
Online advertising is expected to generate $42 billion in revenue this year for Google,
which captures a third of all digital ad spending, according to an October projection from
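eMarketer. Google's enormous reach led the attorneys general of Texas and other states to
conclude in their lawsuit that the tech giant has essentially built the “largest electronic
trading market in existence,” operating an advertising system not unlike trading on a stock
exchange.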
Code 1:
return_doc.to_csv(path, index=False)
Error 1: UnicodeEncodeError: 'utf-8' codec can't encode character '\udc9d' in position 168: surrogates not allowed
Code 2:
return_doc.to_csv(path, index=False, encoding='cp1252')
Error 2: UnicodeEncodeError: 'charmap' codec can't encode character '\udc9d' in position 168: character maps to <undefined>
Code 3:
return_doc.to_csv(path, index=False, encoding='ISO 8859-15')
Error 3: UnicodeEncodeError: 'charmap' codec can't encode character '\u201d' in position 14: character maps to <undefined>
I also tried Code 4:
return_doc.to_csv(path, index=False, encoding='cp1252', errors='replace')
The text changed from
“The actions harm every person in America,” Paxton said in a video statement preceding the
case, which asked a judge to consider “structural” remedies that could theoretically include
forcing a breakup of the company.
to
“The actions harm every person in America,�? Paxton said in a video statement preceding
the case, which asked a judge to consider “structural�? remedies that could
theoretically include forcing a breakup of the company.
This is not what I want.
Please suggest a solution that raises no errors and leaves the text unchanged.
Solution
When stdio is a console, Python uses UTF-8 by default. But when stdio is redirected (for example, to a file or a pipe), Python uses the ANSI code page encoding.
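To see which encodings are in effect on a given machine, a small diagnostic like the one below may help. It is only a sketch: the helper name `describe_text_encodings` is made up for illustration, and the values it reports depend on the Python version, the console, and whether output is redirected.

```python
import locale
import sys


def describe_text_encodings():
    """Return the encodings Python falls back to when none is given explicitly.

    Hypothetical helper for illustration; it only reads interpreter state.
    """
    return {
        # True when Python was started in UTF-8 mode.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding open() uses when no encoding= argument is passed.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of standard output (may differ when redirected).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_text_encodings().items():
        print(f"{name}: {value}")
```

Running the script once in a console and once with output redirected to a file makes the difference described above visible.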
You can enable UTF-8 mode to make UTF-8 the default text encoding. See https://docs.python.org/3/using/windows.html#utf-8-mode for reference.
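UTF-8 mode can be switched on with the `-X utf8` command-line option or by setting the `PYTHONUTF8=1` environment variable. Independently of that, passing `encoding="utf-8"` explicitly when reading and writing avoids relying on the default. The sketch below illustrates both an explicit UTF-8 round trip and one possible repair for text that already contains `'\udc9d'`. It assumes, without confirmation from the question, that the surrogate came from UTF-8 bytes being decoded as cp1252 with the `surrogateescape` error handler. The function names and the sample string are hypothetical.

```python
import os
import tempfile

# Sample containing the curly quotes from the question's text.
SAMPLE = "“structural” remedies"


def roundtrip_utf8(text):
    """Write text to a temporary file and read it back, both as UTF-8."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8", newline="") as f:
            return f.read()
    finally:
        os.remove(path)


def repair_cp1252_surrogates(text):
    """Undo a cp1252 + surrogateescape mis-decode of UTF-8 bytes.

    Assumption: the damaged text was produced exactly that way. If it was
    not, this call may raise UnicodeError instead of repairing anything.
    """
    raw = text.encode("cp1252", errors="surrogateescape")
    return raw.decode("utf-8")


if __name__ == "__main__":
    # Explicit UTF-8 on both sides leaves the text unchanged.
    assert roundtrip_utf8(SAMPLE) == SAMPLE

    # Simulate the suspected damage: UTF-8 bytes decoded as cp1252.
    damaged = SAMPLE.encode("utf-8").decode("cp1252", errors="surrogateescape")
    assert "\udc9d" in damaged

    # The repair restores the original curly quotes.
    assert repair_cp1252_surrogates(damaged) == SAMPLE
```

If the assumption holds, the cleaner fix is to read the source file with `encoding="utf-8"` in the first place, so that no surrogate ever reaches `to_csv`.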