html - 如何获取父类之外的元素
问题描述
我正在尝试从网络中提取一些数据。但是,并非我需要的所有信息都在父类中。我可以在 Parent 类中获取信息。
问题 - 如果数据在父类之外,有没有办法获取数据?或者有没有办法在不使用父类的情况下设置下面的代码来提取。
我正在使用 IE,因为它让我可以搜索该网站。我尝试了几种代码变体,但是,额外的信息不是我要从中提取的父类。
我追求名称、位置和社交媒体链接。位置在班级外的网页顶部
我尝试将以下内容用于父类shop-home
,因为所有其他类都属于它,但它不起作用。我从来没有尝试过获取不在父类中的数据,所以不是 100% 确定如何去做。SIM 对此有所帮助,element.ParentNode.ParentNode.getElementsByClassName
因为产品 url 在父级之前。我一直在尝试将它用于父级之外的所有其他数据,但是我无法让它工作。如果有人可以解释正在做的事情这将有助于我的理解,我不完全理解它.ParentNode.ParentNode.
,我也许可以自己解决其余的问题。
下面的代码是前两个项目的代码,除了它之外,所有项目的代码布局都是相同的If element.getElementsByClassName("CLASS HERE")(0)
。我试过使用ID
Tag
Span
AND SO ONIf element.getElementsByClassName("CLASS HERE")(0).getelementsByTagName ("Span") (0)
Application.ScreenUpdating = False
Set HTML = objIE.document
''''########## Setting the Parent Class HERE ##########
Set elements = HTML.getElementsByClassName("v2-listing-card__info")
''''Scrolls Down the Browser
objIE.document.parentWindow.Scroll 0&, 9999 ' Scrolls Down the Browser
''''FOR LOOP
For Each element In elements
''' Element 1
If element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0) Is Nothing Then
wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-"
Else
HtmlText = element.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href
wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText
End If
''' Element 2
If element.getElementsByTagName("h3")(0) Is Nothing Then
wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-"
Else
HtmlText = element.getElementsByTagName("h3")(0).innerText ' Get CLASS and Child Nod 'src
wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText 'return value in column
End If
''' Element 3
结果 - 红色日期错误或缺失,因为它不在上述父类中
H 列中的运输与父级中的运输一样顺利,如果没有运输信息,则连字符将进入单元格。C、D、E 的项目不在我正在使用的父类中。
<div class="flex-grow-1">
<div class="max-width-760px ">
</div>
<div class="max-width-676px">
<div class="">
<p class="wt-text-heading-02 wt-display-inline" data-inplace-editable-text="story_headline" data-endpoint="AboutPost" data-key="story_headline" data-placeholder="Sum up what you do in one sentence. Or just write something catchy." data-use-inplace-input="1"
data-add-class="normal story-headline-edit-link"></p>
</div>
<div class="">
<div id="about-story" class="" aria-hidden="false">
<p class="about-story text-body-larger text-gray-lighter ">
<span class="mt-xs-1" data-inplace-editable-text="story" data-endpoint="AboutPost" data-key="story" data-placeholder="How did you get started? What inspires you? We know each seller’s story is unique — tell yours here."></span>
</p>
</div>
<div class="wt-text-center-xs">
</div>
</div>
</div>
<div class="wt-mb-xs-6 wt-mb-md-8">
<div class="clearfix"></div>
<div>
<h3 class="wt-text-title-01"></h3>
<div class="pt-xs-2 pt-lg-4">
<div class="display-flex-md flex-wrap max-width-760px">
<div class="mb-xs-2 text-body mr-md-6">
<a href="https://www.facebook.com/Lucky-Plum-706715642737271/" class="text-decoration-none clearfix" title="Facebook" target="_blank" rel="nofollow noopener">
<span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M20,5V19a1.007,1.007,0,0,1-1,1H15V13.776h2l0.336-2.3H15V9.659a0.912,0.912,0,0,1,1-1.031h1.5V6.55a11.284,11.284,0,0,0-1.641-.109c-2.2,0-3.3,1.219-3.3,3.039v1.992h-2v2.3h2V20H5a1.007,1.007,0,0,1-1-1V5A1.007,1.007,0,0,1,5,4H19A1.007,1.007,0,0,1,20,5Z"></path></svg></span>
<span>Facebook</span>
</a>
</div>
<div class="mb-xs-2 text-body mr-md-6">
<a href="https://www.instagram.com/luckyplumstudio/" class="text-decoration-none clearfix" title="Instagram" target="_blank" rel="nofollow noopener">
<span class="etsy-icon"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,5.447c2.136,0,2.389,0.008,3.233,0.047c0.78,0.036,1.204,0.166,1.485,0.275c0.373,0.145,0.64,0.318,0.92,0.598 c0.28,0.28,0.453,0.546,0.598,0.92c0.11,0.282,0.24,0.706,0.275,1.485c0.038,0.844,0.047,1.097,0.047,3.233 s-0.008,2.389-0.047,3.233c-0.036,0.78-0.166,1.204-0.275,1.485c-0.145,0.373-0.318,0.64-0.598,0.92 c-0.28,0.28-0.546,0.453-0.92,0.598c-0.282,0.11-0.706,0.24-1.485,0.275c-0.843,0.038-1.096,0.047-3.233,0.047 s-2.389-0.008-3.233-0.047c-0.78-0.036-1.204-0.166-1.485-0.275c-0.373-0.145-0.64-0.318-0.92-0.598 c-0.28-0.28-0.453-0.546-0.598-0.92c-0.11-0.282-0.24-0.706-0.275-1.485c-0.038-0.844-0.047-1.097-0.047-3.233 S5.45,9.616,5.488,8.773c0.036-0.78,0.166-1.204,0.275-1.485c0.145-0.373,0.318-0.64,0.598-0.92c0.28-0.28,0.546-0.453,0.92-0.598 c0.282-0.11,0.706-0.24,1.485-0.275C9.611,5.455,9.864,5.447,12,5.447 M12,4.005c-2.173,0-2.445,0.009-3.298,0.048 C7.85,4.092,7.269,4.227,6.76,4.425C6.234,4.63,5.787,4.903,5.343,5.348C4.898,5.793,4.624,6.239,4.42,6.765 c-0.198,0.509-0.333,1.09-0.372,1.942C4.009,9.56,4,9.833,4,12.005c0,2.173,0.009,2.445,0.048,3.298 c0.039,0.852,0.174,1.433,0.372,1.942c0.204,0.526,0.478,0.972,0.923,1.417c0.445,0.445,0.891,0.718,1.417,0.923 c0.509,0.198,1.09,0.333,1.942,0.372c0.853,0.039,1.126,0.048,3.298,0.048s2.445-0.009,3.298-0.048 c0.852-0.039,1.433-0.174,1.942-0.372c0.526-0.204,0.972-0.478,1.417-0.923c0.445-0.445,0.718-0.891,0.923-1.417 c0.198-0.509,0.333-1.09,0.372-1.942C19.991,14.45,20,14.178,20,12.005s-0.009-2.445-0.048-3.298 c-0.039-0.852-0.174-1.433-0.372-1.942c-0.204-0.526-0.478-0.972-0.923-1.417c-0.445-0.445-0.891-0.718-1.417-0.923 c-0.509-0.198-1.09-0.333-1.942-0.372C14.445,4.014,14.173,4.005,12,4.005L12,4.005z"></path><path d="M12,7.897c-2.269,0-4.108,1.839-4.108,4.108S9.731,16.113,12,16.113s4.108-1.839,4.108-4.108S14.269,7.897,12,7.897z M12,14.672c-1.473,0-2.667-1.194-2.667-2.667S10.527,9.339,12,9.339s2.667,1.194,2.667,2.667S13.473,14.672,12,14.672z"></path><circle cx="16.27" cy="7.735" r="0.96"></circle></svg></span>
<span>Instagram</span>
</a>
</div>
</div>
</div>
</div>
</div>
<div class="wt-mb-xs-8 wt-mb-md-10">
<div class="clearfix"></div>
<div class="about-section display-flex-md flex-direction-column-md mb-md-5 pl-xs-0 pr-xs-0" data-region="shop-members" id="shop-members">
<div class="p-xs-0">
<h3 class="wt-text-title-01">Shop members</h3>
</div>
<div class="pl-xs-0 pr-xs-0 pt-xs-2 pt-lg-4">
<div class="max-width-760px">
<ul class="list-unstyled block-grid-md-2" data-region="shop-member-list">
<li class="pt-xs-2 pb-xs-2 block-grid-item" data-region="shop-member" data-member-id="22676501471" data-member-avatar-url="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" data-member-bio="" data-member-role="Owner"
data-member-name="Lucky Plum Studio">
<div class="flag">
<div class="flag-img vertical-align-top pr-lg-3">
<img src="https://i.etsystatic.com/isc/87253d/22676501471/isc_90x90.22676501471_6w54.jpg?version=0" alt="" class="circle" data-region="member-avatar" width="48" height="48">
</div>
<div class="flag-body">
<h6 class="mb-xs-0 b text-transform-none text-body" data-region="member-name">Lucky Plum Studio</h6>
<p class="prose" data-region="member-role">Owner</p>
<p class="text-gray-lighter mb-xs-0" data-region="member-bio">
</p>
</div>
</div>
</li>
</ul>
</div>
</div>
</div>
</div>
<div class="">
</div>
</div>
一如既往地提前感谢
''######### 更新于今天 22/3/2021 英国时间下午 6 点#########
回复 Qharr 的回答。我有这个位置,没有收集任何东西,请你解释我哪里出错了,我应该能够解决其余的问题
''' Element 4
DoEvents
If element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0) Is Nothing Then ' Get CLASS and Child Nod
wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = "-"
Else
HtmlText = element.getElementsByClassName("shop-location")(0).getElementsByTagName("Span")(0).innerText
wsSheet.Cells(sht.Cells(sht.Rows.Count, "D").End(xlUp).Row + 1, "D").Value = HtmlText
End If
解决方案
除了阅读 html 和 html 文档方法/css 选择器以便您了解需要应用的模式之外,不确定该说什么。剩下的只是练习和学习,这是最快和更健壮的方法。
CSS:
位置:
.shop-location span
是span
具有父类的子元素shop-location
社交媒体链接:
#about .text-decoration-none
具有一个类名称的子节点,即text-decoration-none
具有 id 的父节点about
。名称:具有属性值
[data-region='member-name']
的元素data-region
member-name
在此处阅读有关 CSS 选择器和后代组合器的信息
在此处练习 CSS 选择器
在此处了解 html
VBA:
Option Explicit
Public Sub GetInfo()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
With .document
Debug.Print .querySelector(".shop-location span").innerText 'location
Dim i As Long, socialMedias As Object
Set socialMedias = .querySelectorAll("#about .text-decoration-none")
For i = 0 To socialMedias.Length - 1 'media links
Debug.Print socialMedias.Item(i).href
Next
Debug.Print .querySelector("[data-region='member-name']").innerText 'company name
End With
.Quit
End With
End Sub
不太理想的选择方法:
Option Explicit
Public Sub GetInfo()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
With ie
.Visible = True
.Navigate2 "https://www.etsy.com/uk/shop/LuckyPlumStudio"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
With .document
Debug.Print .getElementsByClassName("shop-location wt-display-flex-xs")(0).getElementsByTagName("span")(0).innerText 'location
Dim i As Object, socialMedias As Object
Set socialMedias = .getElementById("about").getElementsByClassName("text-decoration-none clearfix")
For Each i In socialMedias 'media links
Debug.Print i.href
Next
Debug.Print .getElementById("about").getElementsByClassName("flag")(0).getElementsByTagName("h6")(0).innerText 'company name
End With
.Quit
End With
End Sub
推荐阅读
- sql - SQL SERVER中的select查询需要支持
- ios - Swift:如何为其委托创建一个具有泛型类型的自定义覆盖 UITableView?
- powershell - 映射后在资源管理器中打开驱动器
- r - 组间相关性
- html - 有条件地链接到其他 html 页面
- sorting - 在 Excel 单元格中,为什么 ="P10-12" < "P1-9" 给我 TRUE?
- oracle12c - 不匹配时如何隐藏值
- python - 当某个应用程序在 Python 中启动时如何触发函数
- kubernetes - 根据 SqlClient 的验证程序,远程证书无效
- c# - DOCKER 错误“您的意思是运行 dotnet SDK 命令吗?请从以下位置安装 dotnet SDK:https://go.microsoft.com/fwlink/?LinkID=798306&clcid=0x409”