Python3 text encoding issue: extra first character when reading from text file using for loop

Question

I'm trying to read a number of ticker symbols from a text file, but I seem to have a text encoding issue.

These are the contents of a test file 'tickers.txt':

SPG
WBA

This is my test code:

    f = open("tickers.txt", 'r')
    for ticker in f:
        t = ticker.strip()
        if t:
            try:
                print(">"+t+"<" + ' length = '+ str(len(t)))
                i = 0
                while i < len(t):
                    print(t[i])
                    i += 1
                print('End')
            except ValueError:
                print('ValueError ticker')

And this is the resulting output:

>SPG< length = 4

S
P
G
End
>WBA< length = 3
W
B
A
End

For some reason there is an extra character in the first ticker symbol, which does not show when printed. Having read through several Q&As here on StackOverflow, I now assume it is a text encoding issue, but I don't yet understand how to solve it. Do I need to add an 'encoding' argument to the file open command? If so, which one? And how can I detect which encoding is needed?

Tags: python-3.x, for-loop, encoding, text-files, string-length

Answer


Changing print(t[i]) to print(i, t[i], '{:04x}'.format(ord(t[i]))), I get the following output, which shows that the extra first character is a byte order mark (BOM, U+FEFF):

>SPG< length = 4
0  feff
1 S 0053
2 P 0050
3 G 0047
End
>WBA< length = 3
0 W 0057
1 B 0042
2 A 0041
End
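
As for the "how to detect" part of the question, one possible check is to read the first bytes of the file in binary mode and compare them with the UTF-8 BOM. This is only a minimal sketch: the helper name has_utf8_bom is hypothetical, and the demo writes a throwaway file that is assumed to mimic a BOM-prefixed tickers.txt.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF.

    Hypothetical helper for illustration, not part of the original answer.
    """
    with open(path, "rb") as f:  # binary mode, so nothing is decoded
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Demo on a temporary file assumed to look like the asker's tickers.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tickers.txt")
    with open(demo_path, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"SPG\nWBA\n")
    print(has_utf8_bom(demo_path))  # True
```

A file saved as plain UTF-8 without a signature would make the same check return False.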

Use utf_8_sig, the UTF-8 codec with BOM signature. On decoding, an optional UTF-8 encoded BOM at the start of the data will be skipped. That is, use

    f = open("tickers.txt", mode='r', encoding='utf_8_sig')

instead of

    f = open("tickers.txt", 'r')

By the way, don't forget f.close()...
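
One way to avoid forgetting the close call is to open the file in a with statement, which closes it automatically when the block ends. The following is a sketch under assumptions, not the answerer's code: read_tickers is a hypothetical helper, and the demo file is a temporary stand-in for tickers.txt written with a leading BOM.

```python
import codecs
import os
import tempfile


def read_tickers(path):
    """Read non-empty, stripped lines; utf_8_sig skips a leading BOM if present.

    Hypothetical helper for illustration only.
    """
    with open(path, mode="r", encoding="utf_8_sig") as f:
        # The file is closed automatically when this block exits.
        return [line.strip() for line in f if line.strip()]


# Demo on a temporary file assumed to mimic the BOM-prefixed tickers.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tickers.txt")
    with open(demo_path, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"SPG\nWBA\n")
    for t in read_tickers(demo_path):
        print(">" + t + "<", "length =", len(t))
```

Because the BOM is optional for this codec, the same call should also read a UTF-8 file that has no signature.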

