python - TypeError: a bytes-like object is required, not 'str' (Python 2.7 to Python 3.6)
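Question
I am learning Python and am new to Stack Overflow. The following code was written for Python 2.7, and when I try to run it with Python 3.6 I get the error below. I have read many earlier posts about this error, but I still cannot fix my code. Please tell me which line needs to be fixed and how.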
TypeError Traceback (most recent call last)
<ipython-input-52-db1423a8bf7b> in <module>
71
72 if __name__ == "__main__":
---> 73 main()
<ipython-input-52-db1423a8bf7b> in main()
54 csvWriter = csv.writer(csvOutput, quoting = csv.QUOTE_NONNUMERIC)
55
---> 56 csvWriter.writerow(["Ticker", "DocIndex","IndexLink", "Description", "FilingDate","NewFilingDate"])
57 csvOutput.close()
58
TypeError: a bytes-like object is required, not 'str'
import os,sys,csv,time # "time" helps to break for the url visiting
from bs4 import BeautifulSoup # Need to install this package manually using pip
# We only import part of the Beautifulsoup4
import urllib.request
from urllib.request import urlopen

os.chdir('E:\Python\python_exercise') # The location of your file "LongCompanyList.csv"
companyListFile = "CompanyList.csv" # a csv file with the list of company ticker symbols and names (the file has a line with headers)
IndexLinksFile = "IndexLinks.csv" # a csv file (output of the current script) with the list of index links for each firm (the file has a line with headers)

def getIndexLink(tickerCode,FormType):
    csvOutput = open(IndexLinksFile,"a+b") # "a+b" indicates that we are adding lines rather than replacing lines
    csvWriter = csv.writer(csvOutput, quoting = csv.QUOTE_NONNUMERIC)
    urlLink = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK="+tickerCode+"&type="+FormType+"&dateb=&owner=exclude&count=100"
    pageRequest = urllib.Request(urlLink)
    pageOpen = urllib.urlopen(pageRequest)
    pageRead = pageOpen.read()
    soup = BeautifulSoup(pageRead,"html.parser")
    #Check if there is a table to extract / code exists in edgar database
    try:
        table = soup.find("table", { "class" : "tableFile2" })
    except:
        print ("No tables found or no matching ticker symbol for ticker symbol for"+tickerCode)
        return -1
    docIndex = 1
    for row in table.findAll("tr"):
        cells = row.findAll("td")
        if len(cells)==5:
            if cells[0].text.strip() == FormType:
                link = cells[1].find("a",{"id": "documentsbutton"})
                docLink = "https://www.sec.gov"+link['href']
                description = cells[2].text.encode('utf8').strip() #strip take care of the space in the beginning and the end
                filingDate = cells[3].text.encode('utf8').strip()
                newfilingDate = filingDate.replace("-","_") ### <=== Change date format from 2012-1-1 to 2012_1_1 so it can be used as part of 10-K file names
                csvWriter.writerow([tickerCode, docIndex, docLink, description, filingDate,newfilingDate])
                docIndex = docIndex + 1
    csvOutput.close()

def main():
    FormType = "10-K" ### <=== Type your document type here
    nbDocPause = 10 ### <=== Type your number of documents to download in one batch
    nbSecPause = 0 ### <=== Type your pausing time in seconds between each batch
    csvFile = open(companyListFile,"r") #<===open and read from a csv file with the list of company ticker symbols (the file has a line with headers)
    csvReader = csv.reader(csvFile,delimiter=",")
    csvData = list(csvReader)
    csvOutput = open(IndexLinksFile,"a+b") #<===open and write to a csv file which will include the list of index links. New rows will be appended.
    csvWriter = csv.writer(csvOutput, quoting = csv.QUOTE_NONNUMERIC)

    csvWriter.writerow(["Ticker", "DocIndex","IndexLink", "Description", "FilingDate","NewFilingDate"])
    csvOutput.close()

    i = 1
    for rowData in csvData[1:]:
        ticker = rowData[0]
        getIndexLink(ticker,FormType)
        if i%nbDocPause == 0:
            print (i)
            print ("Pause for "+str(nbSecPause)+" second .... ")
            time.sleep(float(nbSecPause))
        i=i+1
    csvFile.close()
    print ("done!")

if __name__ == "__main__":
    main()
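Solution
In Python 3, use Unicode strings (str) wherever possible rather than binary (bytes) data. The csv module writes str, so a file opened in binary mode rejects it with this TypeError.
- Change the "a+b" file mode to "a+" so that you get a text-mode file you can write strings to. The strings are encoded with the platform's default encoding unless you pass the encoding parameter to open(), for example encoding="utf-8".
- Remove your .encode() calls. BeautifulSoup works natively with Unicode strings, and once the file is opened in text mode as described above, the encoding is done for you when the rows are written.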
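
The two changes above can be sketched as follows. This is a minimal illustration, not the asker's full script: the helper names append_rows and binary_mode_fails, the temporary file path, and the sample row are made up for the example, and no web page is fetched. Passing newline="" follows the csv module documentation's recommendation for files given to csv.writer, and encoding="utf-8" is an explicit choice assumed here rather than something stated in the question.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append rows to a CSV file opened in text mode (Python 3)."""
    # "a+" instead of "a+b": csv.writer emits str, so the file must be text-mode.
    # newline="" lets the csv module control line endings itself.
    with open(path, "a+", newline="", encoding="utf-8") as csv_output:
        writer = csv.writer(csv_output, quoting=csv.QUOTE_NONNUMERIC)
        writer.writerows(rows)


def binary_mode_fails(path):
    """Return True if writing a str row to a binary-mode file raises TypeError."""
    with open(path, "a+b") as csv_output:
        writer = csv.writer(csv_output)
        try:
            writer.writerow(["Ticker"])
        except TypeError:
            return True
    return False


# Hypothetical demo file in a temporary directory, not the real IndexLinks.csv.
demo_path = os.path.join(tempfile.mkdtemp(), "IndexLinks_demo.csv")

append_rows(demo_path, [["Ticker", "DocIndex", "FilingDate", "NewFilingDate"]])

filing_date = "2012-01-01"  # made-up sample value; kept as str, no .encode()
append_rows(demo_path, [["ABC", 1, filing_date, filing_date.replace("-", "_")]])

with open(demo_path, newline="", encoding="utf-8") as f:
    written = f.read()
print(written)
```

Keeping the cell text as str also matters for the line that calls filingDate.replace("-","_"): on a bytes value, replace() with str arguments raises the same kind of TypeError. Separately from the csv issue, the urllib.Request and urllib.urlopen calls in the question do not exist under those names in Python 3; the equivalents live in urllib.request.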