BeautifulSoup: TypeError / AttributeError: 'NoneType'

Problem description

import requests
from bs4 import BeautifulSoup

url = 'https://joboutlook.gov.au/A-Z'

r = requests.get(url)
c = r.content
soup = BeautifulSoup(c, 'html.parser')

urls = []
h4s = soup.find_all('h4')
for h4 in h4s:
    a = h4.find('a')
    print(a)
    href = a['href']
    print(href)
    new_url = f'https://joboutlook.gov.au/{href}'
    print(new_url)
    urls.append(new_url)
urls

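All of the print calls work: print(a) shows every "a" tag, print(href) shows every href, and print(new_url) shows every new URL.

However, I keep getting TypeError: 'NoneType' object is not subscriptable, and nothing is added to the urls list.

If I change it to a.get('href'), it says: AttributeError: 'NoneType' object has no attribute 'get'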

(It isn't actually Google, FYI.)

This is probably something simple, but I can't figure it out.

Thanks!

Tags: python, web-scraping, beautifulsoup

Solution


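Some of the h4 elements on the page contain no anchor tag, so h4.find('a') returns None for them, and both a['href'] and a.get('href') fail on None. Add an if condition: only when the anchor tag is present, read its href and append it.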

import requests
from bs4 import BeautifulSoup

# Download the page and parse it.
soup = BeautifulSoup(requests.get("https://joboutlook.gov.au/A-Z").text, 'html.parser')

urls = []
h4s = soup.find_all('h4')
for h4 in h4s:
    a = h4.find('a')  # None when this <h4> contains no <a> tag
    if a:
        href = a['href']
        #print(href)
        new_url = 'https://joboutlook.gov.au/{}'.format(href)
        #print(new_url)
        urls.append(new_url)

print(urls)
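
To see the failure and the guard without hitting the network, here is a minimal sketch that runs against a small inline HTML string. The markup in SAMPLE_HTML and the helper names collect_links and collect_links_css are made up for illustration and are not taken from the real page. The sketch also swaps the string formatting for urllib.parse.urljoin, which is an assumption about what is wanted: it avoids a doubled slash when an href already starts with "/".

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup: the second <h4> has no <a>, like the headings
# that make h4.find('a') return None on the real page.
SAMPLE_HTML = """
<h4><a href="occupation/accountants">Accountants</a></h4>
<h4>A</h4>
<h4><a href="/occupation/actors">Actors</a></h4>
"""

BASE_URL = "https://joboutlook.gov.au/"


def collect_links(html, base_url=BASE_URL):
    """Return an absolute URL for the first <a href> inside each <h4>."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for h4 in soup.find_all("h4"):
        a = h4.find("a")      # None when the heading has no link
        if a is None:
            continue
        href = a.get("href")  # None when the <a> has no href attribute
        if href:
            links.append(urljoin(base_url, href))
    return links


def collect_links_css(html, base_url=BASE_URL):
    """Similar approach with a CSS selector that only matches <a> tags having an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.select("h4 a[href]")]


urls = collect_links(SAMPLE_HTML)
print(urls)
```

The selector version never sees a None because select only returns elements that matched. Note that it collects every matching link inside an h4, not just the first one, so the two helpers can differ on pages where a heading holds several links.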
