python - 一遍又一遍地重复,而不是在汤里重复一次
问题描述
我不断重复这段代码。我很想从这个网站上刮掉所有过去的结果数据,但我一直在一个一个地循环?
例如,race_number 打印为 1、1、2、1、2、3 等
最终目标是用数据填充所有列表并将其熊猫出来以查看结果和趋势。
import requests
import csv
import os
import numpy
import pandas
from bs4 import BeautifulSoup as bs
with requests.Session() as s:
webpage_response = s.get('http://www.harness.org.au/racing/fields/race-fields/?mc=SW010420')
soup = bs(webpage_response.content, "html.parser")
#soup1 = soup.select('.content')
results = soup.find_all('div', {'class':'forPrint'})
race_number = []
race_name = []
race_title = []
race_distance = []
place = []
horse_name = []
Prizemoney = []
Row = []
horse_number = []
Trainer = []
Driver = []
Margin = []
Starting_odds = []
Stewards_comments = []
Scratching = []
Track_Rating = []
Gross_Time = []
Mile_Rate = []
Lead_Time = []
First_Quarter = []
Second_Quarter = []
Third_Quarter = []
Fourth_Quarter = []
for race in results:
race_number1 = race.find(class_='raceNumber').get_text()
race_number.append(race_number1)
race_name1 = race.find(class_='raceTitle').get_text()
race_name.append(race_name1)
race_title1 = race.find(class_='raceInformation').get_text(strip=True)
race_title.append(race_title1)
race_distance1 = race.find(class_='distance').get_text()
race_distance.append(race_distance1)
需要帮助一遍又一遍地修复迭代,以及查看表数据而不是上面的标题的下一个最佳举措是什么?干杯
解决方案
这是您期望的输出:
import requests
import csv
import os
import numpy
import pandas as pd
import html
from bs4 import BeautifulSoup as bs
with requests.Session() as s:
webpage_response = s.get('http://www.harness.org.au/racing/fields/race-fields/?mc=SW010420')
soup = bs(webpage_response.content, "html.parser")
#soup1 = soup.select('.content')
data = {}
data["raceNumber"] = [ i['rowspan'] for i in soup.find_all("td", {"class": "raceNumber", "rowspan": True})]
data["raceTitle"] = [ i.get_text(strip=True) for i in soup.find_all("td", {"class": "raceTitle"})]
data["raceInformation"] = [ i.get_text(strip=True) for i in soup.find_all("td", {"class": "raceInformation"})]
data["distance"] = [ i.get_text(strip=True) for i in soup.find_all("td", {"class": "distance"})]
print(data)
data_frame = pd.DataFrame(data)
print(data_frame)
## Output
## raceNumber raceTitle raceInformation distance
##0 3 PREMIX KING PACE $4,500\n\t\t\t\t\t4YO and older.\n\t\t\t\t\tNR... 1785M
##1 3 GATEWAY SECURITY PACE $7,000\n\t\t\t\t\t4YO and older.\n\t\t\t\t\tNR... 2180M
##2 3 PERRY'S FOOTWEAR TROT $7,000\n\t\t\t\t\t\n\t\t\t\t\tNR 46 to 55.\n\t... 2180M
##3 3 DELAHUNTY PLUMBING 3YO TROT $7,000\n\t\t\t\t\t3YO.\n\t\t\t\t\tNR 46 to 52.... 2180M
##4 3 RAYNER'S FRUIT & VEGETABLES 3YO PACE $7,000\n\t\t\t\t\t3YO.\n\t\t\t\t\tNR 48 to 56.... 2180M
##5 3 KAYE MATTHEWS TRIBUTE $9,000\n\t\t\t\t\t4YO and older.\n\t\t\t\t\tNR... 2180M
##6 3 TALQUIST TREES PACE $7,000\n\t\t\t\t\t\n\t\t\t\t\tNR 62 to 73.\n\t... 2180M
##7 3 WEEKLY ADVERTISER 3WM PACE $7,000\n\t\t\t\t\t\n\t\t\t\t\tNR 56 to 61.\n\t... 1785M
推荐阅读
- mysql - nodejs 服务器 app.post 问题 XML 解析错误:语法错误位置:http://localhost:3000/get_messages 第 1 行第 1 列:
- c# - WPF中切换用户控件只显示ViewModel的名称
- excel - 组合框列出路径中的最新 2 个文件夹
- r - 是否可以以水平和垂直显示总和的方式来 table() 两列?
- php - 使用 FrontController 将元素添加到挂钩
- c++ - 这两个连接字符串的 C++ 语句有什么区别?
- algorithm - 基于与某个单元格的接近度循环遍历 NxN 个网格单元格
- marklogic - 恢复是否强制为目标数据库中的新自定义字典索引重新索引数据库?
- html - 更改相对于上一个类的字体大小
- ssl - libucrl 未能检查服务器证书