python - Removing duplicate URLs in Python
Question
from selenium import webdriver
import time
from bs4 import BeautifulSoup as Bs
driver = webdriver.Chrome(executable_path=r'C:\Users\91901\PycharmProjects\kk\drivers\chromedriver.exe')
# page_no = input('Enter Page Number : ')
page_no = '2'
blog_page = driver.get('https://xmonks.com/blog/page/' + page_no + '/')
time.sleep(1)
driver.execute_script("window.scrollTo(0, 500)")
time.sleep(1)
driver.execute_script("window.scrollTo(0, 1000)")
driver.execute_script("window.scrollTo(0, 1500)")
time.sleep(1)
driver.execute_script("window.scrollTo(0, 2000)")
time.sleep(2)
soup = Bs(driver.page_source, 'html.parser')
time.sleep(3)
link = soup.find('div', {'class': 'exp-grid-wrap'})
lnk = link.find_all('a')
for links in lnk:
    ll = links.get('href')
    print(ll)
I am scraping blog URLs from this website, but some of the URLs come back duplicated. How can I remove the duplicates? Thanks in advance.
Answer
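You just need to select the right element to get what you want. I used the <a> tag
that contains the thumbnail, and it worked.
This will give you the following output: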
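# Reuses driver, Bs and the time import from the question's code
soup = Bs(driver.page_source, 'html.parser')
time.sleep(3)
# Select only the thumbnail links
a_tags = soup.find_all('a', {'class': 'exp-post-thumb-inner'})
links = [i['href'] for i in a_tags]
for i in links:
    print(i)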
# output
https://xmonks.com/mindfulness-meditation-coaching/
https://xmonks.com/the-healing-aspects-of-mindfulness-coaching/
https://xmonks.com/techniques-to-be-mindful-a-word-by-and-for-the-coaches/
https://xmonks.com/understanding-mindfulness-coaching/
https://xmonks.com/neuroscience-of-emotions-and-values/
https://xmonks.com/neuroscience-of-beliefs/
https://xmonks.com/coaching-skills-for-leaders/
https://xmonks.com/neuroscience-of-goals/
https://xmonks.com/simplifying-coaching-coaching-matters/
https://xmonks.com/solution-focused-coaching/
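If you would rather keep the original loop over every link inside the wrapper div, another option is to drop the repeats after collecting the hrefs. The following is only a minimal sketch, not the method used in the answer above: the helper name dedupe_keep_order and the sample hrefs list are made up for illustration, and the idea that the same post is linked more than once on the page is an assumption.

```python
def dedupe_keep_order(urls):
    """Return the URLs with duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(urls))


# Hypothetical sample of what the original loop might collect:
# the same post linked more than once, plus an <a> tag with no href.
hrefs = [
    "https://xmonks.com/neuroscience-of-beliefs/",
    "https://xmonks.com/neuroscience-of-beliefs/",
    "https://xmonks.com/neuroscience-of-goals/",
    None,  # links.get('href') returns None when the attribute is missing
    "https://xmonks.com/neuroscience-of-goals/",
]

# Skip missing hrefs, then remove the repeats
unique_links = dedupe_keep_order(h for h in hrefs if h)
for url in unique_links:
    print(url)
```

In the question's code, you would build the list with something like [a.get('href') for a in lnk] and pass it through the same helper. A plain set() also removes duplicates but does not keep the order in which the posts appear on the page.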