python - How do I get a specific child element from a website with BeautifulSoup?
Problem description
I want to scrape the table from this link: Soccer Players Market Values. I managed to do so with the following code:
import csv
import requests
from bs4 import BeautifulSoup

def few(urls, file):
    f = open(file, 'a', newline='', encoding="utf-8")
    writer = csv.writer(f)
    url = urls
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(page.content, 'lxml')
    tbody = soup('table', {"class": "items"})[0].find_all('tr')
    for row in tbody:
        cols = row.findChildren(recursive=False)[1:]
        exclude = [0, 3, 4, 6, 7, 8]
        # Filter by column position; comparing a Tag to an int never matches
        cols = [ele.text.strip() for i, ele in enumerate(cols) if i not in exclude]
        writer.writerow(cols)
My problem is that from the first column I want only the name (e.g. "Ram Strauss"), not all the data it contains.
Can you help? Thanks a lot!
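For context, the first cell of each row on pages like this is itself a nested table, so calling `.text` on it flattens the name, the position, and everything else into one blob. A minimal sketch against a made-up HTML fragment (the markup below is a simplified stand-in for the real page, not copied from it):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for one row: the first <td> wraps a nested table,
# as on the real squad page.
html = """
<table class="items"><tbody><tr>
  <td>
    <table class="inline-table">
      <tr><td><img src="x.jpg" title="Ram Strauss" class="bilderrahmen-fixed"></td>
          <td>Ram Strauss</td></tr>
      <tr><td></td><td>Goalkeeper</td></tr>
    </table>
  </td>
  <td>23</td>
</tr></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
first_cell = soup.select_one("table.items tbody tr td")
# .text concatenates every nested text node, which is why the whole
# blob comes back instead of just the name:
print(first_cell.text.split())
```

This is exactly the symptom described above: the cell's text is not just "Ram Strauss" but every string nested under it.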
Solution
import csv
import requests
from bs4 import BeautifulSoup

def SaveAsCsv(list_of_rows, file_name):
    try:
        print('\nSaving CSV Result')
        with open(file_name, 'a', newline='', encoding='utf-8') as outfile:
            writer = csv.writer(outfile)
            writer.writerow(list_of_rows)
        print("results saved successfully")
    except PermissionError:
        print(f"Please make sure {file_name} is closed\n")

def fetch_data(url, file_name='test.csv'):
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    if page.status_code == 200:
        soup = BeautifulSoup(page.content, 'lxml')
        # Column headers, skipping the leading row-number column
        header = [col_name.text.strip() for col_name in soup.select('table.items thead th')[1:]]
        SaveAsCsv(header, file_name)
        rows = soup.select('table.items tbody tr')
        for row in rows:
            # The player's name lives in the title attribute of his photo
            name_tag = row.select('img.bilderrahmen-fixed')
            if name_tag:
                name = name_tag[0].get('title')
                tds = row.select('td')[5:]
                cols = [ele.text.strip() for ele in tds]
                if cols:
                    cols.insert(0, name)
                    SaveAsCsv(cols, file_name)

fetch_data('https://www.transfermarkt.com/hapoel-acre/kader/verein/6025/saison_id/2017/plus/1')
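The key move in the solution is reading the name from the `title` attribute of the `img.bilderrahmen-fixed` tag instead of flattening the cell's text. Here is that step in isolation, run against a static fragment so it needs no network access (the markup is an illustrative stand-in; the real page uses `lxml` in the answer, `html.parser` is used here only to keep the sketch dependency-light):

```python
from bs4 import BeautifulSoup

# Illustrative stand-in for one table row from the squad page.
row_html = """
<tr>
  <td><img src="s.jpg" title="Ram Strauss" class="bilderrahmen-fixed"></td>
  <td>Goalkeeper</td>
</tr>
"""

row = BeautifulSoup(row_html, "html.parser")
# select_one returns the first match or None, so guard before indexing
name_tag = row.select_one("img.bilderrahmen-fixed")
name = name_tag.get("title") if name_tag else None
print(name)  # Ram Strauss
```

Because `get("title")` returns `None` when the attribute is missing (and the `if name_tag` guard covers rows without a photo), header or spacer rows simply yield no name instead of raising an exception.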