How to scrape when parent elements are similar but not identical

Problem description

How would you scrape the titles and links from this website when the parent elements' class names are not the same?


For example, as the screenshot shows, the first title and link are inside div class="slot type-post type-order-1". The second title and link are inside div class="slot type-post type-order-2", and so on.

The website is https://thechive.com/

Unless there is a better way, I would end up with long, repetitive code like this, which doesn't seem sensible:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://thechive.com/')
soup = BeautifulSoup(r.text, 'html.parser')

# One find_all call (and one loop) per type-order-N variant
content1 = soup.find_all('div', class_='slot type-post type-order-1')
content2 = soup.find_all('div', class_='slot type-post type-order-2')

for contents in content1:
    title1 = contents.find('h3', class_='post-title entry-title card-title').text
    link1 = contents.h3.a['href']
    print(title1)
    print(link1)

for content in content2:
    title2 = content.find('h3', class_='post-title entry-title card-title').text
    link2 = content.h3.a['href']
    print(title2)
    print(link2)
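The repetition comes from how BeautifulSoup treats a class_ string that contains spaces: it is compared with the whole class attribute value, so each call matches only one type-order-N variant. A single class name, by contrast, matches any element that has that class among its classes. The sketch below uses small hypothetical markup (the example.com links are placeholders) shaped like the structure described above, not the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in the question;
# the real thechive.com page is much larger and may have changed.
HTML = """
<div class="slot type-post type-order-1">
  <h3 class="post-title entry-title card-title"><a href="https://example.com/first">First post</a></h3>
</div>
<div class="slot type-post type-order-2">
  <h3 class="post-title entry-title card-title"><a href="https://example.com/second">Second post</a></h3>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A string with spaces must equal the full class attribute value,
# so each call finds only one of the two divs.
order1 = soup.find_all("div", class_="slot type-post type-order-1")
order2 = soup.find_all("div", class_="slot type-post type-order-2")

# A single class name matches any div carrying that class, whatever else it has.
shared = soup.find_all("div", class_="type-post")

print(len(order1), len(order2), len(shared))  # 1 1 2
```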

Tags: web-scraping

Solution


You can use BeautifulSoup's select method with a CSS selector:

soup.select('div[class*="slot type-post type-order-"]')

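*= means "contains": the selector matches every div whose class attribute contains the substring slot type-post type-order-, whatever number follows it.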
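A small offline check of that substring behavior, on hypothetical markup with placeholder links. Soup Sieve, the selector engine behind select, compares [class*=...] against the class values joined by single spaces, so the classes have to appear in exactly that order; the third div below lists the same classes in a different order and is not matched.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the links are placeholders.
# The third div has the same classes, but in a different order.
HTML = """
<div class="slot type-post type-order-1"><h3><a href="https://example.com/first">First post</a></h3></div>
<div class="slot type-post type-order-2"><h3><a href="https://example.com/second">Second post</a></h3></div>
<div class="type-post slot type-order-3"><h3><a href="https://example.com/third">Third post</a></h3></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# [class*="..."] is a plain substring test on the class attribute string,
# so one selector covers type-order-1, type-order-2, ...
posts = soup.select('div[class*="slot type-post type-order-"]')
titles = [div.h3.a.get_text(strip=True) for div in posts]
print(titles)  # ['First post', 'Second post'] (the reordered third div is skipped)
```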


Code:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://thechive.com/')
soup = BeautifulSoup(r.text, 'html.parser')

# One selector covers type-order-1, type-order-2, ... because *= is a substring match
for content in soup.select('div[class*="slot type-post type-order-"]'):
    title = content.find('h3', class_='post-title entry-title card-title').text
    link = content.h3.a['href']  # the post URL is the href of the <a> inside the <h3>
    print(title)
    print(link)
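If the order of the classes in the HTML might vary, two order-independent alternatives are a compound class selector (div.slot.type-post) or a regular expression passed to class_, which BeautifulSoup tests against each individual class. Both are sketched here on hypothetical markup with placeholder links; whether div.slot.type-post also picks up unrelated blocks on the real page is an assumption you would need to verify.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the third div lists its classes in a different order.
HTML = """
<div class="slot type-post type-order-1"><h3><a href="https://example.com/first">First post</a></h3></div>
<div class="slot type-post type-order-2"><h3><a href="https://example.com/second">Second post</a></h3></div>
<div class="type-post slot type-order-3"><h3><a href="https://example.com/third">Third post</a></h3></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Option 1: require both classes, in any order, ignoring extra classes.
by_css = soup.select("div.slot.type-post")

# Option 2: the regex is tested against each class separately,
# so it matches type-order-1, type-order-2, type-order-3, ...
by_regex = soup.find_all("div", class_=re.compile(r"^type-order-\d+$"))

for div in by_css:
    link = div.h3.a
    print(link.get_text(strip=True), link["href"])
```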

Output:

GAPs can help keep you warm through this winter freeze (45 Photos)
https://thechive.com/2021/02/15/gaps-can-help-keep-you-warm-through-this-winter-freeze/
Texans REALLY do not know how to handle a little snow (20 Photos)
https://thechive.com/2021/02/15/texans-really-do-not-know-how-to-handle-a-little-snow-20-photos/
...
