Python scraping: extracting href links from a table

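Problem description

I am trying to extract href links from a table. When I run the code, I only get the links from the first row of the table. Why is only the first row processed, and what am I doing wrong? The code is below: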

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select
import requests
import select
import pandas as pd
from bs4 import BeautifulSoup
browser = webdriver.Firefox()
browser.maximize_window()
browser.get('http://licytacje.komornik.pl/Notice/Search')
browser.find_element_by_xpath("//select[@name='Type']/option[text()='Nieruchomość']").click()
browser.find_element_by_class_name('button_next_active').click()
soup = BeautifulSoup(browser.page_source, "lxml")
for table in soup.findAll('table', {'class': 'wMax'}):
    for tr in table.findAll('tr'):
        for a in tr.findAll('a'):
            print(a['href'])

The output is:

/Notice/Search?sortOrder=DataLicytacji
/Notice/Search?sortOrder=Kategoria
/Notice/Search?sortOrder=Nazwa
/Notice/Search?sortOrder=Miasto
/Notice/Search?sortOrder=Wojew%C3%B3dztwo
/Notice/Search?sortOrder=Cena

Those are only the links from the table's header row. What about the rest?

Tags: python, selenium, beautifulsoup, extract

Solution
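The nested loop in the question does visit every `tr` in the table, so getting only the sort links suggests that the data rows either contained no `a` elements or were not yet present in `browser.page_source` when it was read. That is an assumption; it has not been checked against the live site. The sketch below shows the row-by-row collection on a static, hypothetical HTML snippet (the `/Notice/Details/...` paths and the names `RowLinkCollector` and `data_row_links` are invented for illustration). It uses only the standard library's `html.parser` rather than BeautifulSoup, so it runs without extra packages.

```python
from html.parser import HTMLParser


class RowLinkCollector(HTMLParser):
    """Collect href values per table row and flag rows that contain <th> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per <tr>: {"header": bool, "hrefs": [...]}
        self._current = None  # row being parsed, or None when outside a <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = {"header": False, "hrefs": []}
        elif self._current is not None:
            if tag == "th":
                # A <th> cell marks the row as a header row
                self._current["header"] = True
            elif tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self._current["hrefs"].append(href)

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


# Hypothetical markup standing in for browser.page_source (not the real page)
SAMPLE_HTML = """
<table class="wMax">
  <tr><th><a href="/Notice/Search?sortOrder=Cena">Cena</a></th></tr>
  <tr><td><a href="/Notice/Details/1">first</a></td></tr>
  <tr><td><a href="/Notice/Details/2">second</a></td></tr>
</table>
"""


def data_row_links(html):
    """Return hrefs found in non-header rows, in document order."""
    parser = RowLinkCollector()
    parser.feed(html)
    parser.close()
    return [h for row in parser.rows if not row["header"] for h in row["hrefs"]]


links = data_row_links(SAMPLE_HTML)
print(links)
```

If a check like this returns an empty list for the real `browser.page_source`, the result rows were probably not in the HTML at the moment it was captured. In that case one option is to wait for the result rows with Selenium's `WebDriverWait` before reading `page_source`; the right element to wait for depends on the page's actual markup.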

