
Problem description

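I'm new to the VBA world, and I'm trying to create an Excel file that gets the cheapest price and the name tag of products from a website. I already have a file that goes through the entire search listing, collects each product URL, and places it in a spreadsheet. My challenge now is getting the second piece of code to work. It starts fetching the information without any problem, but after 10 or so URLs the macro raises an error and starts repeating the same information in all the following cells. Is there a way to make the code run more slowly so that this problem doesn't occur?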

Below I list my code and a sample of the URLs I'm scraping.

This is the code I have been using:

Sub test()
Dim URL As String

Set ie = CreateObject("internetexplorer.application")

For i = 2 To 300

URL = Cells(i, 1).Value

ie.navigate URL
ie.Visible = False

Do While ie.busy And ie.readystate <> "readystate_complete"
DoEvents
Loop

Cells(i, 3) = ie.document.getElementsByTagName("h1")(0).innerText
Cells(i, 4) = ie.document.getElementsByTagName("strong")(0).innerText
Next i

ie.Quit

MsgBox "acabou"

End Sub

<https://www.zoom.com.br/notebook/notebook-vaio-vjf157f11x-b0211s-intel-core-i5-8250u-15-6-8gb-ssd-256-gb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-samsung-chromebook-plus-intel-celeron-3965y-12-2-4gb-emmc-32-gb-chrome-os-touchscreen>
<https://www.zoom.com.br/notebook/notebook-dell-xps-7390-intel-core-i7-10710u-13-3-16gb-ssd-512-gb-windows-10-touchscreen>
<https://www.zoom.com.br/notebook/notebook-dell-i15-3583-a5-intel-core-i7-8565u-15-6-8gb-hd-2-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i5-8250u-15-6-4gb-hd-1-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-dell-i15-7580-a20-intel-core-i7-8550u-15-6-8gb-hd-1-tb-geforce-mx150-windows-10>
<https://www.zoom.com.br/notebook/notebook-dell-i14-3480-u30-intel-core-i5-8265u-14-4gb-hd-1-tb-linux-8-geracao>
<https://www.zoom.com.br/notebook/macbook-pro-apple-muhn2bz-intel-core-i5-13-3-8gb-ssd-128-gb-tela-de-retina>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc150-amd-a4-9120-14-2gb-emmc-32-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-samsung-np930qaa-kw1br-intel-core-i7-8550u-13-3-8gb-ssd-256-gb-windows-10-touchscreen>
<https://www.zoom.com.br/notebook/notebook-acer-a515-51g-58vh-intel-core-i5-7200u-15-6-8gb-hd-1-tb-geforce-940mx>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc222-intel-celeron-dual-core-13-3-4gb-emmc-64-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-pt515-51-788a-intel-core-i7-9750h-15-6-32gb-ssd-1-tb-geforce-rtx-2070-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-a315-53-53ak-intel-core-i5-7200u-15-6-4gb-hd-1-tb-windows-10-7-geracao>
<https://www.zoom.com.br/notebook/notebook-dell-i15-5584-m40-intel-core-i7-8565u-15-6-8gb-hd-2-tb-geforce-mx130-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-a315-41g-r21b-amd-ryzen-5-2500u-15-6-8gb-hd-1-tb-radeon-535-windows-10>
<https://www.zoom.com.br/notebook/notebook-positivo-master-n2140-intel-core-i3-7020u-14-4gb-hd-500-gb-windows-10-7-geracao>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc101-intel-atom-14-1gb-ssd-32-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i5-8250u-15-6-8gb-hd-1-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-acer-an515-51-77fh-intel-core-i7-7700hq-15-6-8gb-hd-1-tb-geforce-gtx-1050-windows-10>
<https://www.zoom.com.br/notebook/notebook-dell-i15-3583-a2yp-intel-core-i5-8265u-15-6-4gb-optane-16-gb-hd-1-tb-windows-10>
<https://www.zoom.com.br/notebook/notebook-asus-g531gt-intel-core-i7-9750h-15-6-16gb-ssd-512-gb-geforce-gtx-1650-windows-10>
<https://www.zoom.com.br/notebook/notebook-vaio-fit-15s-intel-core-i3-7100u-15-6-4gb-hd-1-tb-windows-10-home>
<https://www.zoom.com.br/notebook/notebook-samsung-s50-intel-core-i7-7500u-13-3-8gb-ssd-256-gb-windows-10-style>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i3-7020u-15-6-4gb-ssd-120-gb-windows-10-7-geracao>

Tags: excel, vba, web-scraping

Solution


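First:
Always declare all variables. To enforce this, always write Option Explicit as the first line of every module. That way the compiler immediately detects typos in variable names.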
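
As a small illustration of that point, here is a minimal sketch. The procedure name DeclareDemo and the variable names are made up for this example and do not come from the question.

```vba
Option Explicit

Sub DeclareDemo()
    Dim total As Long
    total = 5

    ' With Option Explicit, uncommenting the misspelled line below stops
    ' compilation with "Variable not defined". Without Option Explicit,
    ' VBA would silently create a new, empty Variant named totl.
    ' totl = total + 1

    Debug.Print total
End Sub
```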

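Now to the solution for your problem:
IE is sometimes a real diva. For example, it doesn't like processing URLs in quick succession in the same instance. It is therefore advisable to remove it from memory and start it again for each new URL.

For it to restart quickly, IE must not be configured to delete cookies, cache, and so on in its settings. Otherwise automation errors will occur.

Try this macro. It works with the given URLs: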

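Option Explicit

Sub test()

Dim URL As String
Dim ie As Object
Dim i As Long

  For i = 2 To 300

    ' Keep the row being processed in view once the list grows
    If i > 14 Then
      ActiveWindow.SmallScroll down:=1
    End If

    URL = ActiveSheet.Cells(i, 1).Value

    ' Start a fresh IE instance for every URL
    Set ie = CreateObject("internetexplorer.application")
    ie.navigate URL
    ie.Visible = False
    ' Wait until readystate is 4 (complete)
    Do While ie.readystate <> 4: DoEvents: Loop

    ' Product name from the first h1, price from the first strong element
    ActiveSheet.Cells(i, 3) = ie.document.getElementsByTagName("h1")(0).innerText
    ActiveSheet.Cells(i, 4) = ie.document.getElementsByTagName("strong")(0).innerText

    ' Close IE and release the reference before the next URL
    ie.Quit
    Set ie = Nothing
  Next i

  MsgBox "acabou"
End Sub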
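
The question also asked whether the macro can be slowed down. The following is only a rough sketch of one way to do that, not part of the answer's tested macro. WaitForPage and PauseSeconds are hypothetical helper names, and any timeout or pause length passed to them is an arbitrary assumption to be tuned.

```vba
Option Explicit

' Hypothetical helper: waits until IE reports readyState 4 (complete)
' or until timeoutSeconds have passed. Returns True only if the page loaded.
Function WaitForPage(ByVal ie As Object, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
        If Now > deadline Then Exit Function   ' give up; result stays False
    Loop

    WaitForPage = True
End Function

' Hypothetical helper: pauses the macro for a whole number of seconds.
' Application.Wait blocks Excel until the given time is reached.
Sub PauseSeconds(ByVal seconds As Long)
    Application.Wait DateAdd("s", seconds, Now)
End Sub
```

Inside the loop, the one-line wait could then be replaced by a check such as If WaitForPage(ie, 30) Then ... before reading the h1 and strong elements, and PauseSeconds 1 could be called after ie.Quit to leave a short gap between requests. Whether this is needed depends on how the site and IE behave on your machine.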
