Python scraping with bs4: TypeError: 'NoneType' object is not subscriptable

Problem description

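I hope you're well. Could you tell me why my scraping script doesn't work properly on this site? :) It works on other websites. I'm a beginner, so I may have made a basic mistake.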

import requests
from bs4 import BeautifulSoup
import time
import csv

links = []
for i in range(1):
    url = '*******/recettes/?page={}'.format(i)
    # One request, sent with the User-Agent header
    response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    print(response)
    if response.ok:
        print('Page: ' + str(i))
        soup = BeautifulSoup(response.text, "html.parser")
        divs = soup.find_all('div', class_='field-item even')
        for div in divs:
            a = div.find('a')
            link = a['href']
            links.append('*******' + link)
        time.sleep(3)
print(len(links))

with open('urls3.txt', 'w') as file:
    for link in links:
        file.write(link + '\n')

"""

with open('urls3.txt', 'r') as inf:
    with open('recipes3.csv', 'w', newline='', encoding='utf-8') as outf:
        # csv.writer quotes fields that contain commas (e.g. in recipe titles)
        writer = csv.writer(outf)
        writer.writerow(['titre', 'image', 'url'])
        for row in inf:
            url = row.strip()
            response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
            if response.ok:
                soup = BeautifulSoup(response.text, "html.parser")
                titre = soup.find('h1')
                image = soup.find('img', {"id": "recipe-media-viewer-thumbnail-1"})['src']
                print(titre.text, image, url)
                writer.writerow([titre.text, image, url])
            time.sleep(1)
"""

Can you tell me why I get this error:

<Response [200]>
Page: 0
Traceback (most recent call last):
  File "ex3.py", line 18, in <module>
    link = a['href']
TypeError: 'NoneType' object is not subscriptable
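The traceback points at link = a['href']: div.find('a') returns None for any matching div that contains no <a> tag, and subscripting None raises this TypeError. Below is a minimal reproduction, assuming hypothetical markup (the real page is not shown), followed by a guarded version that skips such divs. The CSS-selector variant uses soup.select, which relies on soupsieve, installed as a bs4 dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div matches the class but has no <a>
html = """
<div class="field-item even"><a href="/recettes/tarte">Tarte</a></div>
<div class="field-item even"><p>Pas de lien ici</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

for div in soup.find_all('div', class_='field-item even'):
    a = div.find('a')
    print(a)  # the second iteration prints None
    # a['href'] at this point would raise the TypeError for the second div

# Guarded version: href=True only matches <a> tags that have an href
links = []
for div in soup.find_all('div', class_='field-item even'):
    a = div.find('a', href=True)
    if a is not None:
        links.append(a['href'])

# Equivalent CSS selector: only <a href> inside the matching divs
hrefs = [a['href'] for a in soup.select('div.field-item.even a[href]')]
print(links, hrefs)
```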

Tags: python, web-scraping

Solution


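I found the answer, so I'm posting it here for anyone interested :) soup.find() returns None when nothing on the page matches, and subscripting None (for example with ['src'] or ['href']) raises this TypeError. Wrapping the lookup in try/except lets the script carry on: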

try:
    image = soup.find('img', {"id": "recipe-media-viewer-thumbnail-1"})['src']
except (TypeError, KeyError):
    # TypeError: find() returned None (no such <img>); KeyError: the <img> has no src
    image = None
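An alternative to try/except is to check the result of find() explicitly and read the attribute with Tag.get(), which returns None when the attribute is missing. The same pattern applies to the a['href'] line from the traceback. A minimal sketch, assuming hypothetical markup; image_src is an illustrative helper, not part of the original answer:

```python
from bs4 import BeautifulSoup

def image_src(soup, img_id="recipe-media-viewer-thumbnail-1"):
    """Return the src of the <img> with the given id, or None if absent."""
    img = soup.find('img', id=img_id)
    # Tag.get() returns None for a missing attribute instead of raising KeyError
    return img.get('src') if img is not None else None

# Hypothetical pages: one with the thumbnail, one without
page_with = BeautifulSoup(
    '<img id="recipe-media-viewer-thumbnail-1" src="/img/tarte.jpg">', "html.parser")
page_without = BeautifulSoup('<h1>Tarte</h1>', "html.parser")
print(image_src(page_with))     # /img/tarte.jpg
print(image_src(page_without))  # None
```

This avoids catching exceptions for an expected case (a missing element), so genuine errors elsewhere in the loop still surface.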
