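python - Scraping a Wikipedia table with Python and exporting it to CSV
Problem description
I am following a tutorial to scrape a table and export the data to a csv file. When I try to run the file in PyCharm, I get this error message:
Traceback (most recent call last):
  File "I:/Scrape/MediumCode.py", line 1, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
I assume there are other errors in the code and its logic as well, but this is the first problem I have hit, and I cannot go any further without understanding why the library is not recognized.
pip install requests ran successfully.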
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://en.wikipedia.org/wiki/Public_holidays_in_Switzerland'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.findAll("table", {"class":"wikitable"})
filename = "holidays.csv"
f = open(filename, "w")
headers = "holiday, holiday_date"
f.write(headers)
for container in containers:
    holiday = container.table.tbody.tr.td.a["title"]
    name_container = container.findAll("a", {"class":"title"})
    holiday_name = name_container[0].text
    date_container = container.findAll("td")
    date = date_container[0].text.strip()
    print("holiday: " + brand)
    print("holiday_name: " + holiday_name)
    print("date: " + date)
    f.write(holiday + "," + holiday_name.replace(",", "|") + "," + date + "\n")
f.close()
Solution
Use the pandas library:
- .read_html() reads HTML tables into a list of DataFrame objects.
- .to_csv() writes an object to a comma-separated values (csv) file.
import requests
import pandas as pd
url = 'https://en.wikipedia.org/wiki/Public_holidays_in_Switzerland'
response = requests.get(url)
tables = pd.read_html(response.text)
# write holiday table data into `holiday_data` csv file
tables[0].to_csv("holiday_data.csv")
Install the pandas library:
pip3 install pandas
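A common cause of the ModuleNotFoundError in the question is that pip installed the package for a different Python interpreter than the one PyCharm uses to run the script. The following is a minimal sketch for checking this from inside the script, using only the standard library; the helper name describe_environment is made up for illustration.

```python
import importlib.util
import sys


def describe_environment(module_name="requests"):
    """Report which interpreter is running and whether it can find a module."""
    # find_spec returns None when a top-level module cannot be located
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module": module_name,
        "installed": spec is not None,
    }


if __name__ == "__main__":
    info = describe_environment("requests")
    print("Interpreter:", info["interpreter"])
    print("requests importable:", info["installed"])
```

If the script reports that the module is not importable, running pip through the interpreter path it prints (for example `<that path> -m pip install requests`) should install the package for that interpreter, rather than for whichever Python the plain pip command belongs to.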
If the requests library still raises an error on your system, try the following:
from urllib.request import urlopen as uReq
import pandas as pd
url = 'https://en.wikipedia.org/wiki/Public_holidays_in_Switzerland'
response = uReq(url)
tables = pd.read_html(response.read())
#select only holiday column
select_table_column = ["Holiday"]
'''
#or select multiple columns
select_table_column = ["Holiday","Date"]
'''
# filter table data by selected columns
holiday = tables[0][select_table_column]
# write holiday table data into `holiday_data` csv file and set csv header
holiday.to_csv("holiday_data.csv",header=True)
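If neither requests nor pandas can be installed, the same idea can be sketched with the standard library alone. This is not the method used in the answer above, only an illustration under stated assumptions: the class name TableParser, the helper rows_to_csv, and the SAMPLE_HTML snippet are all made up, and the parser assumes every row and cell has an explicit closing tag. It also shows that the csv module quotes fields containing commas, so the replace(",", "|") workaround from the question is not needed.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of each row in the first <table> of a document."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # row currently being built, or None
        self._cell = None    # text fragments of the current cell, or None
        self._depth = 0      # nesting depth of <table> tags
        self._done = False   # True once the first table has been closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag == "table":
            if self._depth > 0:
                self._depth -= 1
                self._done = self._depth == 0
        elif self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def rows_to_csv(rows):
    """Serialize rows to CSV text; csv.writer quotes fields containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for the downloaded page (illustration only)
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Date</th><th>Holiday</th></tr>
  <tr><td>1 January</td><td><a title="New Year's Day">New Year's Day</a></td></tr>
  <tr><td>1 August</td><td>National Day, Switzerland</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
csv_text = rows_to_csv(parser.rows)
print(csv_text)
```

For the real page, the string passed to parser.feed() would be the decoded result of uReq(url).read(), and csv_text could be written to a file opened with newline="".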