How to scrape multiple URLs

Problem description

I want to scrape multiple URLs:

const puppeteer = require('puppeteer');

let scrape = async () => {
    const browser = await puppeteer.launch({headless: false,userDataDir: "./user_data"});

   let elements = ['https://tr.pinterest.com/gamzeeerkek','https://tr.pinterest.com/jislaynekauany_']
   const result = await page.evaluate(() => {     

    for(let url of elements)
    {
        let page = await browser.newPage();
        await page.goto(url);
        await page.waitFor(1000);
            let title = document.querySelector('.lH1').innerText;
            let title1 = document.getElementsByClassName('tBJ')[1].innerText; 

            data.push({title, title1});

    }
    return data; // Return our data array
    });

    browser.close();
    return result; // Return the data

};

scrape().then((value) => {
    console.log(value); // Success!
});

My error is:

let page = await browser.newPage();
           ^^^^^

SyntaxError: await is only valid in async function

Tags: node.js, puppeteer

Solution


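You are seeing this error because you are using await outside an async function: the for loop sits inside the arrow function passed to page.evaluate, and that function is not declared async. Even if you add the async keyword, the script still won't work, because of several context and syntax errors: page is used before it is ever created, data is never declared, browser.close() is not awaited, and, most importantly, the page.evaluate callback runs inside the browser page, so it cannot see browser, elements or data from your Node script. The navigation has to happen in Node, and page.evaluate should only be used to read the DOM.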
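Why the evaluate callback cannot reach your Node variables: Puppeteer sends the function to the page as source text, so anything it closes over in Node is left behind, and values the page needs must be passed as extra arguments, for example page.evaluate((sel) => document.querySelector(sel).innerText, '.lH1'). The snippet below is a rough, Puppeteer-free simulation of that behaviour. It assumes serialization can be approximated with Function.prototype.toString, new Function and a JSON round-trip; simulateEvaluate is a hypothetical helper, not a Puppeteer API.

```javascript
// Hypothetical helper: a rough approximation of how page.evaluate() ships a
// function to the page. Only the function's source text and its (serializable)
// arguments travel; variables it closes over in Node do not.
function simulateEvaluate(pageFunction, ...args) {
  // Rebuild the function from its source in a fresh (global) scope.
  const rebuilt = new Function(`return (${pageFunction.toString()});`)();
  // Arguments are copied by value, as if serialized across the wire.
  const copiedArgs = JSON.parse(JSON.stringify(args));
  return rebuilt(...copiedArgs);
}

function demo() {
  const elements = ['https://tr.pinterest.com/gamzeeerkek'];

  // Relies on a closure: works in Node, loses `elements` once serialized.
  const viaClosure = () => typeof elements;

  // Receives what it needs as an argument instead.
  const viaArgument = (urls) => urls.length;

  return {
    closureInNode: viaClosure(),
    closureAfterSerialize: simulateEvaluate(viaClosure),
    argumentAfterSerialize: simulateEvaluate(viaArgument, elements),
  };
}

console.log(demo());
```

The closure version reports 'undefined' after the round-trip, while the argument version still sees the array, which is the same reason the original script's loop over elements could never work inside page.evaluate.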

Here is a corrected version of the script:

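const puppeteer = require('puppeteer');

const scrape = async () => {
  const browser = await puppeteer.launch({ headless: false, userDataDir: './user_data' });

  try {
    const data = [];
    const urls = ['https://tr.pinterest.com/gamzeeerkek', 'https://tr.pinterest.com/jislaynekauany_'];
    const page = await browser.newPage(); // one tab, reused for every URL

    for (const url of urls) {
      await page.goto(url);
      // Wait for the element to appear instead of sleeping a fixed time
      await page.waitForSelector('.lH1');

      // This callback runs inside the browser page: it can use `document`,
      // but it cannot see Node variables such as `browser` or `data`.
      const result = await page.evaluate(() => {
        const title = document.querySelector('.lH1').innerText;
        const title1 = document.getElementsByClassName('tBJ')[1].innerText;
        return { title, title1 };
      });

      data.push(result);
    }

    return data; // Return the data
  } finally {
    await browser.close(); // close the browser even if a page fails
  }
};

scrape()
  .then((value) => {
    console.log(value); // Success!
  })
  .catch((err) => console.error(err));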
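The script above reuses a single page and visits the URLs one at a time, which is the simplest approach and the gentlest on the site. If you want several URLs loaded at once, each concurrent task needs its own page, because one Page can only be on one URL at a time. Below is a minimal sketch of a concurrency-limited runner; mapWithConcurrency and fakeScrape are hypothetical names, and fakeScrape only stands in for a real per-URL scrape so the sketch runs without a browser.

```javascript
// Hypothetical helper: run scrapeOne(url) for every URL with at most `limit`
// calls in flight, keeping results in the same order as the input.
async function mapWithConcurrency(urls, limit, scrapeOne) {
  const results = new Array(urls.length);
  let next = 0; // index of the next URL to hand out

  // Each worker keeps claiming the next unprocessed index until none are left.
  // `next++` is safe because JavaScript only switches tasks at `await`.
  async function worker() {
    while (next < urls.length) {
      const i = next++;
      results[i] = await scrapeOne(urls[i]);
    }
  }

  const workerCount = Math.min(Math.max(1, limit), urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical stand-in for a real scrape. With Puppeteer this would open its
// own tab via browser.newPage(), goto(url), evaluate, then page.close().
const fakeScrape = (url) =>
  new Promise((resolve) => setTimeout(() => resolve({ url, length: url.length }), 10));

mapWithConcurrency(['a', 'bb', 'ccc'], 2, fakeScrape).then((data) => console.log(data));
```

Keeping the limit small (2 or 3 tabs) is usually a reasonable trade-off between speed and the load you put on the site.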

Output:

[ { title: 'Gamze Erkek', title1: 'Follow' }, { title: 'Jislayne', title1: 'Follow' } ]

