Python: extracting matching elements from the DOM and building a list

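Problem description

Is there a way to write a Python script that checks HTML code for items starting with "abc=" and lists them? For example, I have code that contains a list of elements in a grid: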

<div data-componentid="fa-gamepad" class="x-component x-button x-has-icon mainTopMenuItem
widerIcon x-icon-align-left x-arrow-align-right x-layout-box-item x-layout-hbox-item" 
data-xid="237" data-exttouchaction="11" id="fa-gamepad" senchatest="mainMenu_game">
<div class="x-inner-el" id="ext-element-877">
<div class="x-body-el" id="ext-element-876" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-gamepad" id="ext-element-878">
</div>
<div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-875" data-componentid="fa-gamepad">
</button>
</div> 
<div data-componentid="plus" class="x-component x-button x-has-icon mainTopMenuItem x-icon-align-left x-arrow-align-right x-has-menu x-layout-box-item x-layout-hbox-item" data-xid="238" data-exttouchaction="11" id="plus" senchatest="mainMenu_plus">
<div class="x-inner-el" id="ext-element-881">
<div class="x-body-el" id="ext-element-880" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-plus" id="ext-element-882"></div><div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-879" data-componentid="plus">
</button>
</div>

In the code above, two elements carry an attribute starting with "senchatest=". I want Python to find these elements and list them like this:

senchatest="mainMenu_game"
senchatest="mainMenu_plus"

My HTML code contains more than 300 such elements, and I need to list them for testing.

Tags: python, automation

Solution

We can use Beautiful Soup, a Python library for extracting data from HTML and XML files.

# Import the BeautifulSoup class from the bs4 module
from bs4 import BeautifulSoup

# Read the HTML file (test.html contains the code snippet shared in the question);
# the with-block closes the file automatically
with open("test.html", "r") as html_file:
    html = html_file.read()

# Create a BeautifulSoup object and specify the parser
# ('lxml' requires the lxml package; 'html.parser' ships with Python)
soup = BeautifulSoup(html, 'lxml')

# List to hold the attribute values
values = []

# Find only the 'div' tags that themselves carry a 'senchatest' attribute,
# so a wrapper div around such elements does not produce duplicate matches
for tag in soup.find_all('div', attrs={'senchatest': True}):
    # Read the attribute value directly instead of splitting the tag's string
    values.append(tag['senchatest'])

# Print the values in the format shown in the expected output
for value in values:
    print('senchatest="' + value + '"')

Output:

senchatest="mainMenu_game"
senchatest="mainMenu_plus"
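
If installing a third-party package is not an option, the same extraction can probably be sketched with the standard library's html.parser module instead of Beautiful Soup. This is a swapped-in technique, not the approach from the answer above. The names AttributeCollector and collect_attribute are hypothetical helpers introduced only for illustration, and the sample markup is a shortened stand-in for the HTML in the question. Unlike the answer, this sketch is not limited to div tags: it collects the attribute from any tag.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the values of one attribute from every start tag (hypothetical helper)."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; HTMLParser lower-cases the names
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


def collect_attribute(html, attr_name):
    """Return all values of attr_name found in html, in document order."""
    collector = AttributeCollector(attr_name.lower())
    collector.feed(html)
    collector.close()
    return collector.values


# Shortened stand-in for the markup in the question (an assumption, not the full page)
sample = '''
<div id="fa-gamepad" senchatest="mainMenu_game"><button type="button"></button></div>
<div id="plus" senchatest="mainMenu_plus"><button type="button"></button></div>
'''

for value in collect_attribute(sample, "senchatest"):
    print(f'senchatest="{value}"')
```

For the real page, the contents of the HTML file would be passed in place of sample. Passing a different attribute name covers the generic "abc=" case from the question.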
