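python - Trying to extract data from a table, but foreign characters prevent me from writing the CSV file
Problem description
I'm extracting data, but certain special characters cause an error: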
from unicodedata import normalize
import codecs
import csv
import urllib2
import requests
from BeautifulSoup import BeautifulSoup
url = 'https://www.ratebeer.com/top'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('tbody')
list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
outfile = open("./top50.csv", "wb")
writer = csv.writer(outfile)
writer.writerows(list_of_rows)
I'm trying to produce a CSV file to import into Excel with the top 50 beers: rank, name, style, brewery, and rating.
Solution
This works on Python 3.6: specify the parser with features="lxml" and open the output file with encoding='utf-8':
import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.ratebeer.com/top'
response = requests.get(url)
html = response.content
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, features="lxml")
table = soup.find('tbody')
list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
# Open in text mode with an explicit encoding; newline='' is what the csv module expects.
with open("./top50.csv", "w", encoding='utf-8', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(list_of_rows)
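Since the question mentions importing the file into Excel, one variation that may help is writing with the "utf-8-sig" codec instead of plain "utf-8". It prepends a byte order mark, which Excel generally uses to recognise the file as UTF-8. The sketch below is only an illustration: the helper name write_rows_for_excel, the sample rows, and the temporary output path are all made up here, and it skips the scraping step so it runs without network access.

```python
import csv
import os
import tempfile


def write_rows_for_excel(path, rows):
    """Write rows as CSV using UTF-8 with a byte order mark (hypothetical helper)."""
    # "utf-8-sig" writes a BOM first; newline="" lets the csv module manage line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as outfile:
        csv.writer(outfile).writerows(rows)


# Made-up rows with non-ASCII characters, standing in for the scraped table.
sample_rows = [
    ["1", "Példa Sör", "Imperial Stout", "Brasserie Éxemple", "4.5"],
    ["2", "Beispiel Weiße", "Weissbier", "Brauerei Müster", "4.4"],
]

out_path = os.path.join(tempfile.gettempdir(), "top50_sample.csv")
write_rows_for_excel(out_path, sample_rows)

# Read the file back with the same codec to confirm the text survived the round trip.
with open(out_path, encoding="utf-8-sig", newline="") as infile:
    read_back = list(csv.reader(infile))
```

In the solution above, the same effect would come from changing only the encoding argument of open(); everything else stays as it is.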