TypeError: Cannot read property 'match' of undefined in Node.js and puppeteer

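Problem description

I am trying to filter an array that contains a bunch of URLs. I need to return only the URLs that contain the word "media-release". Currently it just sends back an error. I tried deleting my package-lock.json, but it still doesn't work.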

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.cbp.gov/newsroom/media-releases/all');
  const data = await page.evaluate(() => {
    const nodeList = document.getElementsByClassName('survey-processed');
    const urls = [];

    for (i=0; i<nodeList.length; i++) {
      urls.push(document.getElementsByClassName('survey-processed')[i].href);
    }
    const regex = new RegExp('/media-release\\b', 'g');
    const links = urls.filter(element => element.match(regex));
    return links;
  });
  console.log(data);
  await browser.close();
})();

Error:

(node:10208) UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'match' of undefined
    at __puppeteer_evaluation_script__:11:50
    at Array.filter (<anonymous>)
    at __puppeteer_evaluation_script__:11:24
    at ExecutionContext._evaluateInternal (C:\Users \Documents\\node_modules\puppeteer\lib\cjs\puppeteer\common\ExecutionContext.js:217:19)
    at processTicksAndRejections (internal/process/task_queues.js:86:5)

Tags: javascript, node.js, puppeteer

Solution


After inspecting the page, I found that some elements with the class survey-processed are not a elements. There are two forms: form#search-block-form.survey-processed and form#views-exposed-form-newsroom-page.survey-processed.

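form elements don't have an href property, so reading href on them yields undefined. Calling match on that undefined value inside filter is what throws the error.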
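
The failure can be reproduced without a browser. The following is a minimal sketch, not the original page's data: plain objects serve as hypothetical stand-ins for the DOM elements, and the example.com URLs are made up. It shows that calling match on a missing href throws a TypeError, while checking the type first avoids it.

```javascript
// Hypothetical stand-ins for DOM elements: the <form> has no href property.
const elements = [
  { tag: 'a', href: 'https://example.com/newsroom/media-release/1' },
  { tag: 'form' }, // href is undefined here
  { tag: 'a', href: 'https://example.com/about' },
];

const hrefs = elements.map(el => el.href);
const regex = /\/media-release\b/;

// Unguarded: throws a TypeError as soon as filter reaches the undefined entry.
let caught = null;
try {
  hrefs.filter(href => href.match(regex));
} catch (err) {
  caught = err;
}

// Guarded: skip anything that is not a string before testing it.
const links = hrefs.filter(href => typeof href === 'string' && regex.test(href));

console.log(caught instanceof TypeError);
console.log(links);
```

A guard like this would only hide the symptom; selecting the right elements in the first place is the cleaner fix.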

To fix this, you have to be more specific when selecting the elements. Use querySelectorAll with the selector "a.survey-processed", like so:

const data = await page.evaluate(() => {
    const nodeList = document.querySelectorAll("a.survey-processed");  // get only <a> elements that have the classname 'survey-processed'
    const urls = [];

    for (let i = 0; i < nodeList.length; i++) {                        // for each one of those
        if(/\/media-release\b/.test(nodeList[i].href)) {               // if the 'href' attribute matches the regex (use 'test' here rather than 'match')
            urls.push(nodeList[i].href);                               // push the 'href' attribute to the array
        }
    }

    return urls;
});

Also, if you are only looking for URLs that contain the phrase "/media-release", you can use the CSS attribute-contains selector [attribute*=value] to shorten the code further, like so:

const data = await page.evaluate(() => {
    const nodeList = document.querySelectorAll('a.survey-processed[href*="/media-release"]');  // get only <a> elements that have the classname 'survey-processed' and whose 'href' attribute contains the phrase "/media-release"
    return Array.from(nodeList).map(element => element.href);  // convert the NodeList into an array and use 'map' to get the 'href' attributes
});
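
One caveat is worth checking before choosing between the two versions: the attribute selector does plain substring matching, whereas the regex in the loop-based version ends with \b. The sketch below approximates [href*="/media-release"] with String.prototype.includes so it can run without a DOM (that substitution and the sample URLs are assumptions for illustration) and shows where the two differ.

```javascript
// Hypothetical sample URLs, for illustration only.
const urls = [
  'https://example.com/newsroom/media-release/some-story',
  'https://example.com/newsroom/media-releases/all',
];

// Substring match, analogous to the CSS selector [href*="/media-release"].
const bySubstring = urls.filter(url => url.includes('/media-release'));

// Regex with a word boundary, as in the loop-based version.
const byRegex = urls.filter(url => /\/media-release\b/.test(url));

console.log(bySubstring.length);
console.log(byRegex.length);
```

If links such as ".../media-releases/all" should be excluded, keep the regex check, or filter the selector results afterwards.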
