Unable to scrape from a page I navigated to with Puppeteer

Problem description

I'm quite new to Puppeteer, and I'm practicing by tracking selected items on Amazon. However, I run into problems when I try to retrieve results from the page.

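I want the automation to open Amazon, search for a product, and read the product's title from the results page, as in the following example: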

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.setRequestInterception(true);


  page.on('request', (req) => {
    // Skip loading fonts to boost performance (blocking images and stylesheets is commented out)
    if (req.resourceType() === 'font' /* || req.resourceType() === 'image' || req.resourceType() === 'stylesheet' */) {
      req.abort();
    } else {
      req.continue();
    }
  });

  const baseDomain = 'https://www.amazon.com';

  await page.goto(`${baseDomain}/`, { waitUntil: "networkidle0" });

  await page.click("#twotabsearchtextbox", { delay: 50 });

  await page.type("#twotabsearchtextbox", "Bose QuietComfort 35 II", { delay: 50 });
  await page.keyboard.press("Enter");
  await page.waitForNavigation({
    waitUntil: 'networkidle2',
  });

  let productTitle = await page.$$(".a-size-medium, .a-color-base, .a-text-normal")[43]; // variable that holds the title of the product

  console.log(productTitle);

  debugger;

})();
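The interception handler above only needs a yes/no decision for each request. A minimal sketch, assuming you want to toggle which resource types are blocked without editing a commented-out condition: keep them in a Set. `BLOCKED_TYPES` and `shouldBlock` are hypothetical names; the strings are values that Puppeteer's `req.resourceType()` can return.

```javascript
// Hypothetical set of resource types to skip; add 'image' or 'stylesheet' to block those too.
const BLOCKED_TYPES = new Set(['font']);

// Returns true when a request of this resource type should be aborted.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Inside page.on('request', (req) => { ... }) one would then write:
//   if (shouldBlock(req.resourceType())) req.abort(); else req.continue();
console.log(shouldBlock('font'));     // true
console.log(shouldBlock('document')); // false
```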

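When I run this code, console.log prints undefined for the variable productTitle. I've had a lot of trouble scraping information from pages I navigate to. I used to do this with page.evaluate(), and it only worked when I scraped the page I had told the browser to go to.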

Tags: javascript, node.js, puppeteer

Solution


The first problem is on this line:

let productTitle = await page.$$(".a-size-medium, .a-color-base, .a-text-normal")[43];
// is equivalent to:
let productTitle = await (somePromise[43]);

// As you guessed it, a Promise does not have a property `43`,
// so I think you meant to do this instead:
let productTitle = (await page.$$(".a-size-medium, .a-color-base, .a-text-normal"))[43];
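The precedence problem can be reproduced without a browser. This sketch uses a hypothetical `fakeQueryAll()` in place of `page.$$()`; it assumes only that both return a Promise resolving to an array.

```javascript
// Hypothetical stand-in for page.$$(): an async function that resolves to an array.
async function fakeQueryAll() {
  return ['first', 'second', 'third'];
}

async function demo() {
  // Member access binds tighter than `await`, so this indexes the Promise object
  // (which has no property `1`) and then awaits `undefined`.
  const wrong = await fakeQueryAll()[1];
  // Parenthesizing awaits the Promise first, then indexes the resolved array.
  const right = (await fakeQueryAll())[1];
  return { wrong, right };
}

demo().then(({ wrong, right }) => {
  console.log(wrong); // undefined
  console.log(right); // second
});
```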

Once that is fixed, you don't get the title text but a handle to the DOM element. To read its text, evaluate a function against the handle:

let titleElem = (await page.$$(".a-size-medium, .a-color-base, .a-text-normal"))[43];
let productTitle = await titleElem.evaluate(node => node.innerText);

console.log(productTitle); // "Microphone"

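However, I'm not sure that simply picking the 43rd element will always give you the element you want; if it doesn't, that would be a topic for another question.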
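One way to avoid depending on a fixed index is to choose the element by its content. A minimal sketch, assuming the texts of all matched elements have already been collected (for example with `page.$$eval(selector, els => els.map(el => el.innerText))`); `findTitleIndex` and the sample texts are hypothetical, and whether matching on the query words picks the right Amazon result is an assumption, not something verified against the live page.

```javascript
// Hypothetical helper: return the index of the first text that contains every
// word of the query (case-insensitive), or -1 if none does.
function findTitleIndex(texts, query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  return texts.findIndex((text) => {
    const lower = text.toLowerCase();
    return words.every((word) => lower.includes(word));
  });
}

// Illustrative sample texts, not real scraped data.
const texts = [
  'Sponsored',
  'Microphone',
  'Bose QuietComfort 35 II Wireless Bluetooth Headphones',
];
console.log(findTitleIndex(texts, 'Bose QuietComfort 35 II')); // 2
```

The returned index can then be used on the array from `page.$$()` in place of the hard-coded 43.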

