excel - Loading the dependent dropdown options when web scraping
Problem description
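I am trying to scrape data from the following website: http://www.equibase.com/stats/View.cfm?tf=meet&tb=jockey&rbt=TB
I want the VBA code to perform the following steps:
- Go to the URL
- Click "Jockey"
- Select a track from the dropdown, say "ALBUQUERQUE"
- Depending on the selected track, the page loads the "Available Meets" dropdown.
Now I want to select the first meet from this dropdown.
My code selects the value "ALBUQUERQUE" from the first dropdown, but the second dropdown is not populated.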
Sub extract()
    Dim ie As New InternetExplorer
    Dim doc As New HTMLDocument
    Dim optionText As String
    optionText = "ALBUQUERQUE"
    ie.Visible = True
    Url = "http://www.equibase.com/stats/View.cfm?tf=meet&tb=jockey&rbt=TB"
    ie.Navigate Url
    Application.StatusBar = "Navigating to URL..."
    Do
        DoEvents
    Loop Until ie.ReadyState = READYSTATE_COMPLETE
    Do While ie.Busy
        DoEvents
    Loop
    Set doc = ie.Document
    Set jockeyButton = doc.getElementsByClassName("scMainTab")
    For Each Button In jockeyButton
        If Button.getAttribute("href") = "#jockey" Then
            Button.Click
            Exit For
        End If
    Next Button
    Set tracksDropdown = doc.getElementById("selAvailTracks")
    ''AT THIS POINT, IT SHOULD AUTOMATICALLY LOAD THE SECOND DROP DOWN BUT IT IS NOT HAPPENING
    ie.Quit
    Set ie = Nothing
End Sub
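How do I select the first item from the second dropdown?
Solution
The magic words are "HTML events". For a selection in a dropdown to take effect, its change event must be triggered. Otherwise nothing happens.
You also cannot set the first dropdown to the text "ALBUQUERQUE". The option value for "ALBUQUERQUE" is "ALB:USA":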
<select id="selAvailTracks" name="selAvailTracks" class="scTrackSelects">
<option value=""> Available Tracks </option>
<option value="ALB:USA">ALBUQUERQUE</option>
<option value="AQU:USA">AQUEDUCT</option>
<option value="ARP:USA">ARAPAHOE PARK</option>
<option value="AZD:USA">ARIZONA DOWNS</option>
<option value="AP :USA">ARLINGTON</option>
<option value="ASD:CAN">ASSINIBOIA DOWNS</option>
<option value="ATO:USA">ATOKAD DOWNS</option>
<option value="BEL:USA">BELMONT PARK</option>
...
...
...
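If only the visible text of a track is known, the matching value can be looked up from the dropdown itself instead of being hard-coded. The following is a minimal sketch, not part of the original answer: GetOptionValueByText is a hypothetical helper name, and it assumes a late-bound select element as returned by getElementById.

```vba
'Hypothetical helper (an illustration, not from the original answer):
'returns the value attribute of the first option whose visible text
'equals optionText, or an empty string if no option matches.
Private Function GetOptionValueByText(dropdown As Object, optionText As String) As String
    Dim i As Long
    GetOptionValueByText = ""
    For i = 0 To dropdown.Options.Length - 1
        'Trim because option texts may carry surrounding spaces
        If Trim$(dropdown.Options.Item(i).Text) = optionText Then
            GetOptionValueByText = dropdown.Options.Item(i).Value
            Exit Function
        End If
    Next i
End Function
```

With the markup above, GetOptionValueByText(htmlDoc.getElementById("selAvailTracks"), "ALBUQUERQUE") would be expected to return "ALB:USA". The change event still has to be triggered after the value is assigned.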
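Another way to make a selection is by the index of the desired entry. This is what the macro uses for dropdown no. 2.
Try this macro to make both selections, including dropdown 2: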
Sub Extract()
    'Declare all variables
    Dim url As String
    Dim browser As Object
    Dim htmlDoc As Object
    Dim nodeTracksDropdown As Object
    Dim dateDropdown As Object
    Dim trackInDropdown As String

    'Initialize variables
    trackInDropdown = "ALB:USA" 'You can also get this from a cell of a table
    url = "http://www.equibase.com/stats/View.cfm?tf=meet&tb=jockey&rbt=TB"

    'Initialize Internet Explorer, set visibility,
    'call URL and wait until page is fully loaded
    Set browser = CreateObject("internetexplorer.application")
    browser.Visible = True
    browser.navigate url
    Do Until browser.ReadyState = 4: DoEvents: Loop

    'Short break to load dynamic content
    Application.Wait (Now + TimeSerial(0, 0, 3))

    'Shortening document reference
    Set htmlDoc = browser.document

    'Get first dropdown, select track, trigger change event
    'and wait a second to set up the second dropdown
    Set nodeTracksDropdown = htmlDoc.getElementById("selAvailTracks")
    nodeTracksDropdown.Value = trackInDropdown
    Call TriggerEvent(htmlDoc, nodeTracksDropdown, "change")
    Application.Wait (Now + TimeSerial(0, 0, 1))

    'Get second dropdown, select second entry, trigger change event
    'and wait a second to set up the following elements
    Set dateDropdown = htmlDoc.getElementById("selAvailRaceMeets")
    dateDropdown.selectedIndex = 1
    Call TriggerEvent(htmlDoc, dateDropdown, "change")
    Application.Wait (Now + TimeSerial(0, 0, 1))

    'Do whatever you want here
    '...
    '...
    '...

    'Clean up
    'browser.Quit
    'Set browser = Nothing
    'Set nodeTracksDropdown = Nothing
    'Set dateDropdown = Nothing
End Sub
This procedure triggers the HTML event:
Private Sub TriggerEvent(htmlDocument As Object, htmlElementWithEvent As Object, eventType As String)
    Dim theEvent As Object
    htmlElementWithEvent.Focus
    Set theEvent = htmlDocument.createEvent("HTMLEvents")
    theEvent.initEvent eventType, True, False
    htmlElementWithEvent.dispatchEvent theEvent
End Sub
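The fixed one-second waits in the macro may be too short on a slow connection. One possible alternative is to poll until the second dropdown has actually been populated. This is only a sketch under assumptions: WaitForOptions is a hypothetical helper, and it assumes the dropdown holds at most a single placeholder entry until the meets are loaded, which has not been verified against the page.

```vba
'Hypothetical helper (an illustration, not from the original answer):
'polls until the dropdown with the given id holds more than minOptions
'entries, or gives up after timeoutSeconds. Returns True on success.
Private Function WaitForOptions(htmlDoc As Object, dropdownId As String, _
                                Optional minOptions As Long = 1, _
                                Optional timeoutSeconds As Long = 10) As Boolean
    Dim deadline As Date
    Dim dropdown As Object

    WaitForOptions = False
    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        DoEvents 'Keep Excel and the browser responsive while polling
        Set dropdown = htmlDoc.getElementById(dropdownId)
        If Not dropdown Is Nothing Then
            'Assumption: one placeholder entry exists before loading
            If dropdown.Options.Length > minOptions Then
                WaitForOptions = True
                Exit Function
            End If
        End If
    Loop While Now < deadline
End Function
```

In Extract, the wait after the first TriggerEvent call could then be replaced by something like: If Not WaitForOptions(htmlDoc, "selAvailRaceMeets") Then Exit Sub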