Trying to extract data and save it to Excel, but getting an error with Python BeautifulSoup

Problem description

I am trying to extract several fields, but the last field raises an error, and I want to save all of the fields to Excel.

I tried extracting it with BeautifulSoup but could not capture it; the error is below:

Traceback (most recent call last):
  File "C:/Users/acer/AppData/Local/Programs/Python/Python37/agri.py", line 30, in <module>
    specimens = soup2.find('h3',class_='trigger expanded').find_next_sibling('div',class_='collapsefaq-content').text
AttributeError: 'NoneType' object has no attribute 'find_next_sibling'

from bs4 import BeautifulSoup
import requests

page1 = requests.get('http://www.agriculture.gov.au/pests-diseases-weeds/plant#identify-pests-diseases')

soup1 = BeautifulSoup(page1.text,'lxml')

for lis in soup1.find_all('li',class_='flex-item'):
    diseases = lis.find('img').next_sibling
    print("Diseases: " + diseases)
    image_link = lis.find('img')['src']
    print("Image_Link:http://www.agriculture.gov.au" + image_link)
    links = lis.find('a')['href']
    if links.startswith("http://"):
        link = links
    else:
        link = "http://www.agriculture.gov.au" + links
    page2 = requests.get(link)
    soup2 = BeautifulSoup(page2.text,'lxml')

    try:
        origin = soup2.find('strong',string='Origin: ').next_sibling
        print("Origin: " + origin)
    except:
        pass
    try:
        imported = soup2.find('strong',string='Pathways: ').next_sibling
        print("Imported: " + imported)
    except:
        pass 
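    # note: find() returns None on pages that have no <h3 class="trigger expanded">,
    # so chaining .find_next_sibling() on the next line raises the AttributeError above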
    specimens = soup2.find('h3',class_='trigger expanded').find_next_sibling('div',class_='collapsefaq-content').text
    print("Specimens: " + specimens)

I want to extract the last field and save all fields to an Excel sheet using Python. Please help me.

Tags: html, python-3.x, web-scraping, beautifulsoup

Solution


Small typo:

   data2,append("Image_Link:http://www.agriculture.gov.au" + image_link)

should be:

   data2.append("Image_Link:http://www.agriculture.gov.au" + image_link) #period instead of a comma
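Beyond the typo, the AttributeError itself happens because soup2.find('h3', class_='trigger expanded') returns None on pages that lack that heading, and calling .find_next_sibling() on None fails. A minimal sketch of one way to handle this and write every field to an Excel file is shown below; it is only an illustration, not the asker's original script. The column headings, the output filename diseases.xlsx, and the use of openpyxl (pip install openpyxl) are assumptions.

from bs4 import BeautifulSoup
import requests
from openpyxl import Workbook

# one workbook, one header row; each disease page becomes one row
wb = Workbook()
ws = wb.active
ws.append(["Disease", "Image link", "Origin", "Imported", "Specimens"])

page1 = requests.get('http://www.agriculture.gov.au/pests-diseases-weeds/plant#identify-pests-diseases')
soup1 = BeautifulSoup(page1.text, 'lxml')

for lis in soup1.find_all('li', class_='flex-item'):
    disease = lis.find('img').next_sibling
    image_link = "http://www.agriculture.gov.au" + lis.find('img')['src']
    links = lis.find('a')['href']
    link = links if links.startswith("http://") else "http://www.agriculture.gov.au" + links

    soup2 = BeautifulSoup(requests.get(link).text, 'lxml')

    # check each lookup for None instead of chaining blindly
    origin_tag = soup2.find('strong', string='Origin: ')
    origin = origin_tag.next_sibling if origin_tag else ''

    imported_tag = soup2.find('strong', string='Pathways: ')
    imported = imported_tag.next_sibling if imported_tag else ''

    specimens = ''
    heading = soup2.find('h3', class_='trigger expanded')
    if heading:
        content = heading.find_next_sibling('div', class_='collapsefaq-content')
        if content:
            specimens = content.text.strip()

    ws.append([str(disease), image_link, str(origin), str(imported), specimens])

wb.save('diseases.xlsx')

The same guard could instead be written with try/except AttributeError, as the question already does for the Origin and Pathways fields; the explicit None checks above just make it clearer which lookup failed.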
