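python - Python: extracting matching elements from the DOM and building a list
Problem description
Is there a way to write a Python script that checks HTML code for items starting with "abc=" and lists them? For example, I have code containing a list of elements in a grid: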
<div data-componentid="fa-gamepad" class="x-component x-button x-has-icon mainTopMenuItem
widerIcon x-icon-align-left x-arrow-align-right x-layout-box-item x-layout-hbox-item"
data-xid="237" data-exttouchaction="11" id="fa-gamepad" senchatest="mainMenu_game">
<div class="x-inner-el" id="ext-element-877">
<div class="x-body-el" id="ext-element-876" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-gamepad" id="ext-element-878">
</div>
<div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-875" data-componentid="fa-gamepad">
</button>
</div>
<div data-componentid="plus" class="x-component x-button x-has-icon mainTopMenuItem x-icon-align-left x-arrow-align-right x-has-menu x-layout-box-item x-layout-hbox-item" data-xid="238" data-exttouchaction="11" id="plus" senchatest="mainMenu_plus">
<div class="x-inner-el" id="ext-element-881">
<div class="x-body-el" id="ext-element-880" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-plus" id="ext-element-882"></div><div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-879" data-componentid="plus">
</button>
</div>
The code above contains two elements with an attribute starting with "senchatest=". I would like Python to find these elements and list them, like this:
senchatest="mainMenu_game"
senchatest="mainMenu_plus"
My HTML code contains more than 300 such elements, and I need to list them for testing.
Solution
We can use Beautiful Soup, a Python library for extracting data from HTML and XML files.
# Import the BeautifulSoup class from the bs4 module
from bs4 import BeautifulSoup

# Open and read the HTML file (test.html contains the snippet from the question)
with open("test.html", "r") as html_file:
    index = html_file.read()

# Create a BeautifulSoup object and specify the parser (the lxml package must be installed)
S = BeautifulSoup(index, 'lxml')

# List to hold the values
l = []

# Find all 'div' tags
tag_name = S.find_all('div')
for tag in tag_name:
    # Keep only divs that carry the 'senchatest' attribute themselves;
    # testing str(tag) would also match any ancestor div wrapping such an element
    if tag.has_attr('senchatest'):
        # Read the attribute value and append it to the list
        l.append(tag['senchatest'])

# Print the values in the format shown in the expected output
for i in l:
    print('senchatest="' + i + '"')
The output is:
senchatest="mainMenu_game"
senchatest="mainMenu_plus"
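The answer relies on Beautiful Soup and the lxml parser, both third-party packages. If installing them is not an option, roughly the same list can be built with the standard library's html.parser module. The following is only a minimal sketch: the names AttributeCollector and collect_attribute are hypothetical helpers introduced for illustration, and the sample string is a shortened stand-in for the markup in the question.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the value of one attribute from every start tag that carries it."""

    def __init__(self, attr_name):
        super().__init__()
        # html.parser lowercases attribute names, so normalise the lookup key too
        self.attr_name = attr_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name == self.attr_name and value is not None:
                self.values.append(value)


def collect_attribute(html, attr_name="senchatest"):
    """Return the attribute values in document order (hypothetical helper)."""
    parser = AttributeCollector(attr_name)
    parser.feed(html)
    parser.close()
    return parser.values


# Shortened stand-in for the markup in the question
sample = '''
<div id="fa-gamepad" senchatest="mainMenu_game"><div class="x-inner-el"></div></div>
<div id="plus" senchatest="mainMenu_plus"><button type="button"></button></div>
'''

for value in collect_attribute(sample):
    print(f'senchatest="{value}"')
```

Unlike the Beautiful Soup version, this sketch looks at every tag, not only div elements, so it also picks up the attribute on buttons or other tags if they carry it.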