BeautifulSoup can't find a table on the web page

Problem description

I'm trying to get the data from the first table on a website. I've looked at similar questions here and tried many of the suggested solutions, but I can't seem to locate the table, let alone the data in it.

Here is what I tried:

from bs4 import BeautifulSoup  
from selenium import webdriver  
driver = webdriver.Chrome('C:\\folder\\chromedriver.exe')  
url = 'https://docs.microsoft.com/en-us/windows/release-information/'  
driver.get(url)  

tbla = driver.find_element_by_name('table') #attempt using by element name  
tblb = driver.find_element_by_class_name('cells-centered') #attempt using by class name  
tblc = driver.find_element_by_xpath('//*[@id="winrelinfo_container"]/table[1]') #attempt by using xpath  

And I also tried using BeautifulSoup:

html = driver.page_source
soup = BeautifulSoup(html,'html.parser')
table = soup.find("table", {"class": "cells-centered"})
print(len(table))

Any help is much appreciated.

Tags: python, selenium, iframe, beautifulsoup, webdriverwait

Solution


The table is inside an iframe, so you need to switch to the iframe first before you can access the table.
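The failure mode can be reproduced with a static snippet (the markup below is a simplified, hypothetical stand-in for the real page source): the top-level document only contains the `<iframe>` tag itself, so BeautifulSoup parsing `driver.page_source` before switching frames never sees the framed table.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the top-level page source:
# the framed document (and its table) is NOT part of this HTML.
parent_html = """
<div>
  <iframe id="winrelinfo_iframe" src="winrelinfo.html"></iframe>
</div>
"""

soup = BeautifulSoup(parent_html, "html.parser")
table = soup.find("table", {"class": "cells-centered"})
print(table)  # None -- the table lives inside the framed document
```

This is why `print(len(table))` in the question raises a `TypeError`: `find()` returned `None`.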

Induce WebDriverWait() for frame_to_be_available_and_switch_to_it() with the following locator.

Then induce WebDriverWait() for visibility_of_element_located() with the following locator:

driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"table.cells-centered")))

You need to import the following libraries:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Alternatively, you can use the following code with an xpath locator:

driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]')))
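Once you have switched into the iframe, the BeautifulSoup approach from the question also works on the table's `outerHTML`. A minimal sketch of extracting the rows, using a hypothetical sample of what `table.get_attribute('outerHTML')` would return:

```python
from bs4 import BeautifulSoup

# Hypothetical sample of the table markup returned by
# table.get_attribute('outerHTML') after switching into the iframe.
table_html = """
<table class="cells-centered">
  <tr><th>Version</th><th>Build</th></tr>
  <tr><td>1903</td><td>18362.418</td></tr>
  <tr><td>1809</td><td>17763.805</td></tr>
</table>
"""

soup = BeautifulSoup(table_html, "html.parser")
# Collect the text of every header/data cell, row by row.
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in soup.find_all("tr")]
print(rows)
```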

You can further load the table data into a pandas DataFrame and then export it to a csv file. You need to import pandas.

driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]'))).get_attribute('outerHTML')
df=pd.read_html(table)[0]  # table is already an HTML string
print(df)
df.to_csv("path/to/csv")

Install pandas: pip install pandas

Then add the import below:

import pandas as pd
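To see how `pd.read_html` consumes the `outerHTML` string without a browser, here is a self-contained sketch. The markup is a hypothetical sample; wrapping it in `io.StringIO` avoids the deprecation warning newer pandas versions emit for literal HTML strings:

```python
import io
import pandas as pd

# Hypothetical sample of the table's outerHTML. pd.read_html parses
# every <table> it finds and returns a list of DataFrames; the <th>
# row is inferred as the header.
table_html = """
<table class="cells-centered">
  <tr><th>Version</th><th>Build</th></tr>
  <tr><td>1903</td><td>18362.418</td></tr>
</table>
"""

df = pd.read_html(io.StringIO(table_html))[0]
print(df.shape)  # (1, 2): one data row, two columns
```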
