python - Extracting href links
Question
I wrote some Python code that takes a URL and extracts the href values that are not https links.
from BeautifulSoup import BeautifulSoup
import urllib2
import re

html_page = urllib2.urlopen("http://kteq.in/index")
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    # Skip <a> tags that have no href attribute
    if link.get('href') is None:
        continue
    # Remove any http/https URL from the href value
    result = re.sub(r"http\S+", "", link.get('href'))
    print result
When I run the code above, I get every href link except the https ones. This is the output:
index
index
#
solutions#internet-of-things
solutions#online-billing-and-payment-solutions
solutions#customer-relationship-management
solutions#enterprise-mobility
solutions#enterprise-content-management
solutions#artificial-intelligence
solutions#b2b-and-b2c-web-portals
solutions#robotics
solutions#augement-reality-virtual-reality
solutions#azure
solutions#omnichannel-commerce
solutions#document-management
solutions#enterprise-extranets-and-intranets
solutions#business-intelligence
solutions#enterprise-resource-planning
services
clients
contact
#
#
#
#myCarousel
#myCarousel
#
#
#
#
#
#
#
#
#
#
#step1
#step2
AndroidAppDevelopment
contact
solutions
contact
index
services
#
contact
#
iOSDevelopmentServices
AndroidAppDevelopment
WindowsAppDevelopment
HybridSoftwareSolutions
CloudServices
HTML5Development
iPadAppDevelopment
services
services
services
services
services
services
contact
contact
contact
contact
contact
#
#
#
#
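Now I need to extract the href links found on each of the pages listed in the output above. For example, I need to extract the links inside "index" from that output. Please suggest how I can get this.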
Solution
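One possible approach, as a minimal sketch rather than a definitive answer: the values printed above are relative links, so each one has to be resolved against the page URL before it can be fetched, and the same extraction can then be repeated on the fetched page. The sketch below assumes Python 3 and uses only the standard library (html.parser instead of BeautifulSoup, urllib.request instead of urllib2). The names LinkCollector, relative_links, fetch and links_one_level_down are hypothetical helpers introduced for illustration. Fragments such as "#internet-of-things" are dropped so that each page is fetched only once.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags with a missing or empty href
                self.hrefs.append(href)


def relative_links(html, base_url):
    """Return the page's relative hrefs as absolute URLs, without duplicates.

    Links that carry a scheme (http, https, mailto, ...) and pure
    fragments such as "#" or "#step1" are skipped.
    """
    parser = LinkCollector()
    parser.feed(html)
    resolved = []
    for href in parser.hrefs:
        if urlparse(href).scheme or href.startswith("#"):
            continue
        # Resolve against the page URL, then drop any "#fragment" part.
        url = urldefrag(urljoin(base_url, href)).url
        if url not in resolved:
            resolved.append(url)
    return resolved


def fetch(url):
    """Download a page and decode it as text (assumes UTF-8 is acceptable)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def links_one_level_down(start_url, fetch_page=fetch):
    """Map each relative link on start_url to the relative links on that page."""
    result = {}
    for page_url in relative_links(fetch_page(start_url), start_url):
        result[page_url] = relative_links(fetch_page(page_url), page_url)
    return result
```

Calling links_one_level_down("http://kteq.in/index") would request the start page and then every page it links to, so it is worth adding error handling and a delay between requests before running it against a real site. The fetch_page parameter exists so the download step can be swapped out, for example with a function that reads saved HTML files.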