python-3.x - Python3 text encoding issue: extra first character when reading from text file using for loop
Problem description
I'm trying to read a number of ticker symbols from a text file, but seem to have a text-encoding issue.
These are the contents of the test file 'tickers.txt':
SPG
WBA
This is my test code:
f = open("tickers.txt", 'r')
for ticker in f:
    t = ticker.strip()
    if t:
        try:
            print(">"+t+"<" + ' length = '+ str(len(t)))
            i = 0
            while i < len(t):
                print(t[i])
                i += 1
            print('End')
        except ValueError:
            print('ValueError ticker')
And this is the resulting output:
>SPG< length = 4
S
P
G
End
>WBA< length = 3
W
B
A
End
For some reason the first ticker symbol has an extra character that does not show up when printed. Having read through several Q&As here on Stack Overflow, I now assume it is a text-encoding issue, but I don't yet understand how to solve it. Do I need to add an 'encoding' argument to the open() call? If so, which one? And how can I detect which one is needed?
Solution
Changing print(t[i]) to print(i, t[i], '{:04x}'.format(ord(t[i]))) gives the following output, which shows that the extra first character is a byte order mark (BOM), code point U+FEFF:
>SPG< length = 4
0 feff
1 S 0053
2 P 0050
3 G 0047
End
>WBA< length = 3
0 W 0057
1 B 0042
2 A 0041
End
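The same diagnostic can be sketched in a self-contained way. This is a minimal sketch, assuming the file really starts with a UTF-8 BOM (a sample file is generated on the fly) and is read with an explicit UTF-8 encoding; the helper name describe is hypothetical. repr() escapes the non-printable character, and unicodedata.name() identifies it as ZERO WIDTH NO-BREAK SPACE, which is what a BOM decodes to.

```python
import os
import tempfile
import unicodedata


def describe(s):
    """Hypothetical helper: (index, code point, Unicode name) for each character in s."""
    return [(i, f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for i, c in enumerate(s)]


with tempfile.TemporaryDirectory() as tmp:
    # Sample file starting with the UTF-8 BOM bytes EF BB BF, like the one in the question.
    path = os.path.join(tmp, "tickers.txt")
    with open(path, "wb") as raw:
        raw.write(b"\xef\xbb\xbfSPG\nWBA\n")

    # Plain UTF-8 decoding keeps the BOM as the character U+FEFF.
    with open(path, encoding="utf-8") as f:
        first = f.readline().strip()  # strip() does not remove U+FEFF

print(repr(first))  # repr() shows the hidden character as an escape sequence
for row in describe(first):
    print(row)
```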
Use the utf_8_sig codec, i.e. UTF-8 with BOM signature. On decoding, an optional UTF-8-encoded BOM at the start of the data is skipped, so open the file with:
f = open("tickers.txt", mode='r', encoding='utf_8_sig')
instead of:
f = open("tickers.txt", 'r')
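To see the difference between the two codecs without touching a file, you can decode the raw bytes directly. A minimal sketch, assuming the sample bytes below stand in for the contents of a BOM-prefixed tickers.txt:

```python
import codecs

# Bytes as they would be stored in a file that starts with a UTF-8 BOM.
data = codecs.BOM_UTF8 + "SPG\nWBA\n".encode("utf-8")

# Plain 'utf-8' keeps the BOM, which becomes the invisible character U+FEFF.
as_utf8 = data.decode("utf-8")
# 'utf_8_sig' skips one leading BOM if present.
as_utf8_sig = data.decode("utf_8_sig")
# Without a BOM, 'utf_8_sig' decodes exactly like 'utf-8'.
no_bom = "SPG\nWBA\n".encode("utf-8").decode("utf_8_sig")

print(repr(as_utf8))      # '\ufeffSPG\nWBA\n'
print(repr(as_utf8_sig))  # 'SPG\nWBA\n'
print(repr(no_bom))       # 'SPG\nWBA\n'
```

Because BOM-less input decodes the same way, utf_8_sig is a reasonable choice when a file may or may not have been saved with a BOM.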
By the way, don't forget to call f.close() when you are done with the file.
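Putting it together, a with-statement closes the file automatically, so there is no f.close() to forget. The sketch below also addresses the "how to detect?" part of the question for the UTF-8 case by comparing the first raw bytes with codecs.BOM_UTF8. The helpers has_utf8_bom and read_tickers are hypothetical names, and the sample file is generated on the fly:

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Hypothetical helper: True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as raw:
        return raw.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def read_tickers(path):
    """Hypothetical helper: non-empty stripped lines, with a leading BOM dropped by 'utf_8_sig'."""
    tickers = []
    # The with-statement closes the file automatically, even if an exception is raised.
    with open(path, encoding="utf_8_sig") as f:
        for line in f:
            t = line.strip()
            if t:
                tickers.append(t)
    return tickers


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tickers.txt")  # sample file with a BOM
    with open(path, "wb") as raw:
        raw.write(codecs.BOM_UTF8 + b"SPG\nWBA\n")
    bom_found = has_utf8_bom(path)
    tickers = read_tickers(path)

print(bom_found)  # True
print(tickers)    # ['SPG', 'WBA']
```

Note that has_utf8_bom only recognizes the UTF-8 signature; UTF-16 and UTF-32 files use different BOM bytes (see codecs.BOM_UTF16_LE and the related constants), and a file without any BOM gives no such hint about its encoding.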