Web scraping with Python to print div class names

Problem description

I want to print the class names of all the divs inside a top-level div on a given site. This is the part of the site's HTML I'm interested in:

<div class="game">
  <div class="history-feed__collection">
     <div class="history-feed__card h-card h-card_sm h-card_spades" style="width: 41px; margin-right: 18px; opacity: 1;">
         <div class="h-card__sign">9</div></div>
     <div class="history-feed__card h-card h-card_sm h-card_hearts" style="width: 41px; margin-right: 18px; opacity: 1;">
         <div class="h-card__sign">K</div></div>
     <div class="history-feed__card h-card h-card_sm h-card_diamonds" style="width: 41px; margin-right: 18px; opacity: 1;">
         <div class="h-card__sign">Q</div></div>
     <div class="history-feed__card h-card h-card_sm h-card_clubs" style="width: 41px; margin-right: 18px; opacity: 1;">
         <div class="h-card__sign">2</div>
     </div>
  </div>
</div>

I would like the program to print them like this: "history-feed__card h-card h-card_sm h-card_spades, history-feed__card h-card h-card_sm h-card_hearts, ..."

I started with the code below, but I still have a problem: it only prints what is contained in the div, not the names of its classes.

from selenium import webdriver

driver = webdriver.Chrome(executable_path='C:\chromedriver')

driver.get('https://card.com')

id = driver.find_elements_by_xpath('//*[@class]')

for ii in id:
    print(ii.get_attribute('class="hilo-history-feed__collection"'))
    
driver.close()

Tags: python, html, web-scraping

Solution


I managed to get it working with this code:


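import requests
from bs4 import BeautifulSoup

URL = 'http://www.card.com'
response = requests.get(URL)
# The 'html5lib' parser is a separate package and must be installed.
soup = BeautifulSoup(response.content, 'html5lib')

# Print every div on the page in full, including its class attribute.
for i in soup.find_all('div'):
    print(i)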
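
The loop above prints each div in full, markup and all. If only the class names are wanted, in the comma-separated form asked for in the question, something along these lines might work. This is a minimal sketch, not the original poster's code: the helper name card_classes is made up for illustration, the HTML is the sample from the question pasted in as a string (with the closing tags completed), and the built-in 'html.parser' is used in place of 'html5lib' so nothing extra needs installing.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question (assumption: the real page
# has the same structure). In practice this would be response.content.
HTML = """
<div class="game">
  <div class="history-feed__collection">
    <div class="history-feed__card h-card h-card_sm h-card_spades">
      <div class="h-card__sign">9</div></div>
    <div class="history-feed__card h-card h-card_sm h-card_hearts">
      <div class="h-card__sign">K</div></div>
    <div class="history-feed__card h-card h-card_sm h-card_diamonds">
      <div class="h-card__sign">Q</div></div>
    <div class="history-feed__card h-card h-card_sm h-card_clubs">
      <div class="h-card__sign">2</div></div>
  </div>
</div>
"""


def card_classes(html):
    """Return the class string of each div directly inside the collection div.

    Hypothetical helper, named here only for illustration.
    """
    soup = BeautifulSoup(html, "html.parser")
    collection = soup.find("div", class_="history-feed__collection")
    if collection is None:
        return []
    # recursive=False keeps only the card divs and skips the nested
    # h-card__sign divs inside each card.
    cards = collection.find_all("div", recursive=False)
    # BeautifulSoup exposes "class" as a list of names; join it back
    # into the space-separated form seen in the markup.
    return [" ".join(card.get("class", [])) for card in cards]


print(", ".join(card_classes(HTML)))
```

One caveat: if the site builds these cards with JavaScript, the HTML returned by requests.get may not contain them at all. In that case the Selenium approach from the question could still be used, either by passing driver.page_source to the helper or by calling get_attribute('class') on each element, since get_attribute expects the attribute name only.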

Thanks to everyone who helped.

