"NoneType" error caused by changing HTML: how do you handle a changing data format?

Question

I get an error on the HTML below because data is missing between the <td></td> tags. I only want to grab the data after each strong tag, so GOOD, 1:56:5 and 1:56:5.

<td><strong>Track Rating:</strong> GOOD</td>
<td></td>
<td><strong>Gross Time:</strong> 1:56:5</td>
<td><strong>Mile Rate:</strong> 1:56:5</td>

HTML that works (no missing data):

<td><strong>Track Rating:</strong> GOOD</td>
<td><strong>Gross Time:</strong> 2:29:6</td>
<td><strong>Mile Rate:</strong> 1:58:6</td>
<td><strong>Lead Time:</strong> 30.3</td>

The code that scrapes this data is:

from datetime import datetime, date, timedelta
import requests
import re
import csv
import os
import numpy
import pandas as pd
from bs4 import BeautifulSoup as bs

base_url = "http://www.harness.org.au/racing/results/?firstDate="
base1_url = "http://www.harness.org.au"

webpage_response = requests.get('http://www.harness.org.au/racing/results/?firstDate=')

soup = bs(webpage_response.content, "html.parser")

format = "%d-%m-%y"
delta = timedelta(days=1)
yesterday = datetime.today() - timedelta(days=1)


enddate = datetime(2019, 1, 1)



while enddate <= yesterday:
    enddate += timedelta(days=1)
    enddate1 = enddate.strftime("%d-%m-%y") 
    new_url = base_url + str(enddate1)
    soup12 = requests.get(new_url)
    soup1 = bs(soup12.content, "html.parser") 
    table1 = soup1.find('table', class_='meetingListFull')
    
    tr = table1.find_all('tr', {'class':['odd', 'even']})
    
    for tr1 in tr:
        tr2 = tr1.find('a').get_text()
        tr3 = tr1.find('a')['href']
        newurl = base1_url + tr3
        with requests.Session() as s:
            webpage_response = s.get(newurl)
            soup = bs(webpage_response.content, "html.parser")
            #soup1 = soup.select('.content')
            results = soup.find_all('div', {'class':'forPrint'})
....
for race in results:
    tableoftimes = race.find('table', class_='raceTimes')
    trackrating = tableoftimes.find(text="Track Rating:").findPrevious('td').contents[1]
    grosstime = tableoftimes.find(text="Track Rating:").find_next('td').contents[1]
    milerate = tableoftimes.find(text="Gross Time:").findNext('td').contents[1]
    leadtime = tableoftimes.find(text="Mile Rate:").findNext('td').contents[1]
    firstquarter = tableoftimes.find(text="Lead Time:").findNext('td').contents[1]
....

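I then simply append these values to a list so I can use the data.

My goal is to append everything to a list later so that I have all the data. Ideally I would like to scrape the data even when it is incomplete, but I am so stuck that I would even settle for a rule that ignores all the data when it is incomplete. I tried some next-sibling style approaches, but because the data on the site changes I often get an error saying 'NoneType' object has no attribute 'findNext'.

Update: I have changed the code to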

tableoftimes = race.find('table', class_='raceTimes')
for row in tableoftimes.find_all('tr'):
    string23 = [td.get_text() for td in row.find_all('td')]

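which prints

['Track Rating: GOOD ', 'Gross Time: 2:05:1 ', 'Mile Rate: 1:56:4 ', 'Lead Time: 8.1 ']
['First Quarter: 29.4 ', 'Second Quarter :32 ', 'Third Quarter: 28.4 ', 'Fourth Quarter: 27.2 ']
['Margins: HFHD x HFNK']

I want the data in italics (the values), but only when it matches the heading. Most of the if statements I tried give me an error such as 'list' object has no attribute 'string', because I am trying to access text inside a nested list. Any ideas from here?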

Tags: python, html, python-3.x, beautifulsoup

Solution


You could add some None-safety with a few nested ifs, but if you have to add an if for every find that may return None, it gets very messy. Try this approach instead: https://stackoverflow.com/a/11791040/9392216

for row in table.find_all("tr")[1:]:
    datarow = [td.get_text() for td in row.find_all("td")]
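
Once each row is a plain list of strings, the labels and values can be separated without any further find calls, so a missing or empty cell can no longer raise a 'NoneType' error. The following is only a minimal sketch under some assumptions: cells_to_dict is a hypothetical helper name, not part of BeautifulSoup, and the cell texts are assumed to look like the 'Label: value' strings printed in the question.

```python
def cells_to_dict(cells):
    """Turn cell texts like 'Gross Time: 1:56:5 ' into {'Gross Time': '1:56:5'}."""
    parsed = {}
    for text in cells:
        # Split on the first colon only, because the values themselves contain colons.
        label, sep, value = text.partition(":")
        label, value = label.strip(), value.strip()
        if not sep or not label:
            continue  # empty <td></td> or a cell without a label
        parsed[label] = value
    return parsed


# Cell texts resembling the incomplete HTML in the question (one empty cell, no Lead Time).
datarow = ["Track Rating: GOOD ", "", "Gross Time: 1:56:5 ", "Mile Rate: 1:56:5 "]
times = cells_to_dict(datarow)

print(times.get("Gross Time"))  # 1:56:5
print(times.get("Lead Time"))   # None, because this row does not list it
```

Looking values up by label with dict.get means each value is tied to its heading rather than to its position in the row, and a heading that is absent on a given page simply yields None instead of an exception.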
