Extracting data from a table, but foreign characters prevent writing to a CSV file

Problem description

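I am scraping data from a table, but some special (non-ASCII) characters cause an error when I write the rows to a CSV file.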

from unicodedata import normalize


import codecs
import csv
import urllib2
import requests
from BeautifulSoup import BeautifulSoup

url = 'https://www.ratebeer.com/top'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html)
table = soup.find('tbody')

list_of_rows = []


for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

outfile = open("./top50.csv", "wb")
writer = csv.writer(outfile)
writer.writerows(list_of_rows)

I am trying to produce a CSV file of the top 50 beers (rank, name, style, brewery, rating) that I can import into Excel.

Tags: python, python-2.7, web-scraping, beautifulsoup, non-ascii-characters

Solution


The following works on Python 3.6: name the parser explicitly with features="lxml" and open the output file with encoding='utf-8'.

import csv

import requests
from bs4 import BeautifulSoup

url = 'https://www.ratebeer.com/top'
response = requests.get(url)
html = response.content

soup = BeautifulSoup(html, features="lxml")
table = soup.find('tbody')

list_of_rows = []

for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

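# newline='' lets the csv module control line endings; the with block closes the file
with open("./top50.csv", "w", encoding='utf-8', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(list_of_rows)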
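
The reason the encoding argument matters can be shown without touching the network. In Python 2 the csv module does not handle unicode text itself, so non-ASCII cell values end up being encoded implicitly as ASCII and fail. The sketch below (Python 3) reproduces that failure in isolation and then writes the same text successfully. The sample rows, the file name top50_sample.csv, and the helper names ascii_encodable, write_rows and read_rows are all invented for illustration; they are not taken from the site or from the answer above. The "utf-8-sig" encoding is an optional variation, assuming the file is meant for Excel, which typically relies on the byte order mark to recognise UTF-8.

```python
import csv
import os
import tempfile

# Invented sample rows standing in for the scraped table cells (not real rankings).
rows = [
    ["1", "Bière Exemple", "Imperial Stout", "Brasserie Éxemple"],
    ["2", "Müller Dunkel", "Dunkel", "Brauerei Müller"],
]


def ascii_encodable(text):
    """Return True if text can be encoded as ASCII, False otherwise."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def write_rows(path, rows, encoding="utf-8-sig"):
    """Write rows to a CSV file using an explicit encoding.

    utf-8-sig prepends a byte order mark; plain utf-8 omits it.
    """
    with open(path, "w", encoding=encoding, newline="") as outfile:
        csv.writer(outfile).writerows(rows)


def read_rows(path, encoding="utf-8-sig"):
    """Read the CSV file back into a list of rows."""
    with open(path, "r", encoding=encoding, newline="") as infile:
        return list(csv.reader(infile))


# Write to a temporary directory so the sketch does not clobber real files.
path = os.path.join(tempfile.mkdtemp(), "top50_sample.csv")
write_rows(path, rows)
round_tripped = read_rows(path)

print(ascii_encodable("Stout"), ascii_encodable("Bière"))
print(round_tripped == rows)
```

If the file is only consumed by other programs rather than Excel, plain encoding="utf-8" as in the answer is usually the simpler choice, since some tools treat the byte order mark as part of the first cell.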
