python - Scrape data starting from a certain date
Problem description
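I only want to scrape data from the table after a certain date. The code below gets the first date in the data (URL attached), but how would I create a for loop that pulls data only from 11 Oct 2020 and all the rows before it?
I want a for loop that extracts all the data before a certain date from the table with class 'table table-hover small horsePerformance'.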
http://www.harness.org.au/racing/horse-search/?horseId=813476
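import requests
from bs4 import BeautifulSoup as bs

horseurl = "http://www.harness.org.au/racing/horse-search/?horseId=813476"
headers = {"User-Agent": "Mozilla/5.0"}  # assumed; the question does not show its headers

with requests.Session() as s:
    try:
        webpage_response = s.get(horseurl, headers=headers)
    except requests.exceptions.ConnectionError:
        # no response object exists here, so stop instead of parsing nothing
        raise SystemExit("Connection refused")
soup = bs(webpage_response.content, "html.parser")
horseresult6 = soup.find('table', class_='table table-hover small horsePerformance')
# text of the first and second date cells in the table
daysbetween = horseresult6.find('td', class_='date').get_text().strip()
daysbetween24 = horseresult6.find('td', class_='date').find_next('td', class_='date').get_text().strip()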
But I think it should look something like this:
for tr in horseresult6.find_all('tr')[1:]:
    daysbetween = tr.find('td', class_='date').get_text().strip()
    if xdate > daysbetween:
        ...  # do something
    else:
        continue
When I try this, it doesn't seem to work.
Solution
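You can compare dates with the < and > operators, as long as both sides are parsed into time values first (here with time.strptime) rather than compared as raw strings.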
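Why the loop in the question misbehaves: xdate and daysbetween are both strings such as "11 Oct 2020", so > compares them character by character, not as calendar dates. A minimal standard-library sketch of the difference; the helper is_on_or_after and the FMT constant are illustrative names, and %b assumes English month abbreviations as in the default C locale:

```python
from datetime import datetime

FMT = "%d %b %Y"  # day, abbreviated English month name, four-digit year, e.g. "11 Oct 2020"


def is_on_or_after(date_text, target_text):
    """Compare two 'dd Mon YYYY' strings as calendar dates, not as text."""
    return datetime.strptime(date_text, FMT) >= datetime.strptime(target_text, FMT)


# As plain strings, "05 Apr 2021" sorts before "11 Oct 2020" because "0" < "1".
print("05 Apr 2021" >= "11 Oct 2020")                 # False (text comparison)
print(is_on_or_after("05 Apr 2021", "11 Oct 2020"))   # True (date comparison)
```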
Here's how:
import time

import requests
from bs4 import BeautifulSoup

horse_url = "http://www.harness.org.au/racing/horse-search/?horseId=813476"

with requests.Session() as s:
    try:
        webpage_response = s.get(horse_url)
    except requests.exceptions.ConnectionError:
        # no response object exists here, so stop instead of parsing nothing
        raise SystemExit("Connection refused")

table = BeautifulSoup(
    webpage_response.content,
    "html.parser",
).find('table', class_='table table-hover small horsePerformance')

target_date = "11 Oct 2020"
for row in table.find_all("tr")[1:]:  # skipping the header
    date = row.find("td", class_="date").find("a").getText()  # table date
    if time.strptime(date, "%d %b %Y") >= time.strptime(target_date, "%d %b %Y"):  # comparing the dates
        # do your parsing here, this is just an example
        print(f'{date} - {row.find("td", class_="stake").getText(strip=True)}')
Output:
05 Apr 2021 - $4,484
29 Mar 2021 - $595
23 Mar 2021 - $4,484
12 Mar 2021 - $220
08 Mar 2021 - $181
02 Mar 2021 - $263
19 Feb 2021 - $180
12 Feb 2021 - $1,200
26 Jan 2021 - $4,484
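The table appears to list races newest first (the output above is in descending date order), so the >= loop could stop at the first row older than the target instead of checking every remaining row. A minimal sketch on a plain list of date strings, assuming that ordering always holds; dates_on_or_after is a hypothetical helper and the sample dates are copied from this answer's outputs:

```python
from datetime import datetime

FMT = "%d %b %Y"  # e.g. "11 Oct 2020"


def dates_on_or_after(dates_newest_first, target_text):
    """Collect leading dates that are >= target, stopping at the first older one.

    Assumes the input is sorted newest first; if it is not, rows after the
    first older date are silently skipped.
    """
    target = datetime.strptime(target_text, FMT)
    kept = []
    for text in dates_newest_first:
        if datetime.strptime(text, FMT) < target:
            break  # every following row is older still, so stop scanning
        kept.append(text)
    return kept


sample = ["05 Apr 2021", "26 Jan 2021", "14 Sep 2020", "11 Sep 2020"]
print(dates_on_or_after(sample, "11 Oct 2020"))  # ['05 Apr 2021', '26 Jan 2021']
```

If the site ever returns rows in a different order, the full scan in the answer's loop is the safer choice.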
Going back in time (rows on or before a given date), reverse the comparison:
target_date = "26 Jan 2021"
for row in table.find_all("tr")[1:]:  # skipping the header
    date = row.find("td", class_="date").find("a").getText()  # table date
    if time.strptime(date, "%d %b %Y") <= time.strptime(target_date, "%d %b %Y"):  # comparing the dates
        # do your parsing here, this is just an example
        print(f'{date} - {row.find("td", class_="stake").getText(strip=True)}')
Output:
26 Jan 2021 - $4,484
14 Sep 2020 - $100
11 Sep 2020 - $616
04 Sep 2020 - $180
21 Aug 2020 - $180
17 Aug 2020 - $595
28 Jul 2020 - $4,291
21 Jul 2020 - $3,523
13 Jul 2020 - $300
30 Jun 2020 - $1,173
15 Jun 2020 - $100
30 May 2020 - $3,523
22 May 2020 - $500
12 May 2020 - $963
05 May 2020 - $3,523
02 May 2020 - $1,986
24 Apr 2020 - $144
09 Apr 2020 - $144
30 Mar 2020 - $1,225
10 Mar 2020 - $100
09 Dec 2019 - $595
02 Dec 2019 - $4,484
26 Nov 2019 - $4,484
19 Nov 2019 - $100
02 Nov 2019 - $4,484
27 Oct 2019 - $2,562
13 Oct 2019 - $700
31 May 2019 - $1,000
21 May 2019 - $4,484
07 May 2019 - $1,225
27 Apr 2019 - $595
21 Apr 2019 - $0
14 Apr 2019 - $0
07 Apr 2019 - $0
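The two loops in this answer differ only in the comparison operator. A sketch that parses the target date once and takes the direction as a parameter, using datetime instead of time.strptime; filter_by_date and on_or_after are hypothetical names, and the function works on (date, stake) text pairs so it can be tried without fetching the page:

```python
import operator
from datetime import datetime

FMT = "%d %b %Y"  # e.g. "26 Jan 2021"


def filter_by_date(rows, target_text, on_or_after=True):
    """Return (date_text, stake_text) pairs on/after (or on/before) the target.

    The target is parsed once, outside the loop; operator.ge and operator.le
    play the role of the >= and <= tests in the two loops above.
    """
    target = datetime.strptime(target_text, FMT).date()
    keep = operator.ge if on_or_after else operator.le
    return [
        (date_text, stake)
        for date_text, stake in rows
        if keep(datetime.strptime(date_text, FMT).date(), target)
    ]


rows = [("05 Apr 2021", "$4,484"), ("26 Jan 2021", "$4,484"), ("14 Sep 2020", "$100")]
print(filter_by_date(rows, "26 Jan 2021", on_or_after=False))
# [('26 Jan 2021', '$4,484'), ('14 Sep 2020', '$100')]
```

Inside the scraper, the pairs would come from the same row.find(...) calls used in the loops, one pair per row of table.find_all("tr")[1:].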