python - 使用 BeautifulSoup 从 python 导出到 .csv
问题描述
我对此很陌生,似乎无法正确导出。
# select document
with open('scrape1.html') as html_file:
soup = BeautifulSoup(html_file, 'lxml')
# create/name csv
with open('speechengine_report.csv', 'w') as csv_file:
writer = csv.writer(csv_file)
writer.writerow(['computer', 'usagedata'])
# tell bs4 to only look at x tags with a class of y
for licensedata in soup.find_all('div', class_='licensedata'):
# scrape pc id
computer = licensedata.p.b.text
print(computer)
# scrape usage stats for each id
for usagedata in licensedata.find_all('td'):
# minutes = usagedata.table.tbody
print(usagedata.text)
# blank line
print()
# writer.writerow([computer, usagedata])
csv_file.close()
解决方案
您要将数据写入 csv 文件的其余代码应位于with块内。此外,您不需要 csv_file.close() ,因为它会为您处理。试试下面的代码。在python中读取文件处理
with open('scrape1.html') as html_file:
soup = BeautifulSoup(html_file, 'lxml')
# create/name csv
with open('speechengine_report.csv', 'w') as csv_file:
writer = csv.writer(csv_file)
writer.writerow(['computer', 'usagedata'])
# tell bs4 to only look at x tags with a class of y
for licensedata in soup.find_all('div', class_='licensedata'):
# scrape pc id
computer = licensedata.p.b.text
print(computer)
# scrape usage stats for each id
for usagedata in licensedata.find_all('td'):
# minutes = usagedata.table.tbody
print(usagedata.text)
# blank line
print()
# writer.writerow([computer, usagedata])
推荐阅读
- c - fork() 导致内存泄漏
- .htaccess - 问题 HTACCESS:如何赋予优先级绝对路径?
- c# - EF Core 2.2 中的动态 Where 子句
- swift - Swift:无法使用只读属性“xxx”覆盖可变属性
- dart - Dart 比较运算符
- ubuntu - 在 Ubuntu 上构建 Emacs - 找不到库
- javascript - 显示输入数据的最佳做法是什么 - textarea vs p tag
- bash - bash脚本猫和回声
- ios - 在自定义 NSURLProtocol 中捕获 POST 参数
- android - 使用 API 级别 28 显示不同的对话框布局