Puppeteer: saving the return value of page.evaluate to a CSV file

Problem description

I currently have a puppeteer script that scrapes a website by collecting an array of all the hrefs on a page, looping over each href, and then extracting data from each linked page.

Inside page.evaluate I am able to return all of the values I need.

How do I write the returned values to a CSV file?

Here is my script:

const stealth = require('puppeteer-extra-plugin-stealth')();
 


const hrefsCategoriesDeduped = new Set(await page.evaluate(
    () => Array.from(
      document.querySelectorAll('.b-shared-linked-pharmacy__box a[href].b-shared-linked-pharmacy__details'),
      a => a.href
    )
  ));


  let pages = [];

console.log (hrefsCategoriesDeduped)

for (const url of hrefsCategoriesDeduped) {

    await page.goto(url);
   // await page.waitFor(10000)
    let telusData = {   
      name: "",
      address: "",
      city: "",
      province: "",
      postal: "",
      fax: "",
      pharmacyphone: "",
      pharmacyfax: "",
      pharmacyemail: "",
      pharmacywebsite: ""
    };
    await page.waitForSelector('.b-pharmacy-detail__details');
    //let pharmacycomplete = []

    telusData =  await page.evaluate(() => {
 let pharmacyname  = document.querySelector('h2[class="b-pharmacy-detail__name"]').innerText
 let pharmacystreet = Array.from(document.querySelectorAll(".b-pharmacy-detail__info"), element => element.textContent)
 let pharmacyphoneandfax = Array.from(document.querySelectorAll(".b-pharmacy-detail__contact"), element => element.textContent)

var variable_names = {};
for(var i = 0; i< pharmacystreet.length; i++){
  variable_names['na_'+i] = pharmacystreet[i];
}


return {
  name: pharmacyname,
  address: pharmacyaddress,
  city1: city[1],
 province: province[1],
 postal: postal[1],
 pharmacyphone: pharmacyphone,
 pharmacyfax: pharmacyfax,
 pharmacyemail: pharmacyemail,
 pharmacywebsite: pharmacywebsite,

 }
});

console.log(telusData)

  }



    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();

Right now I declare the variable telusData with all of the CSV headers, then run page.evaluate to get the value for each header. I return:

return {
  name: pharmacyname,
  address: pharmacyaddress,
  city1: city[1],
 province: province[1],
 postal: postal[1],
 pharmacyphone: pharmacyphone,
 pharmacyfax: pharmacyfax,
 pharmacyemail: pharmacyemail,
 pharmacywebsite: pharmacywebsite,

 }

For each telusData, how do I save the values to a CSV file along with the headers? Any help is much appreciated.

Tags: javascript, node.js, csv, puppeteer

Solution


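I have trimmed your code down, since much of it is unrelated to your specific question. Hopefully this points you in the right direction: collect each page's result in an array inside the loop, then generate the CSV from that array once the loop has finished.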


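(async function main() {
    try {
        // browser, page and hrefsCategoriesDeduped are set up as in the question (omitted here)

        // create array to store all the data
        const telusDataArray = [];

        for (const url of hrefsCategoriesDeduped) {
            await page.goto(url);
            await page.waitForSelector('.b-pharmacy-detail__details');

            // declare telusData locally instead of leaking an implicit global
            const telusData = await page.evaluate(() => {
                // extraction of pharmacyname, pharmacyaddress, city, etc. omitted, as in the question
                return {
                    name: pharmacyname,
                    address: pharmacyaddress,
                    city1: city[1],
                    province: province[1],
                    postal: postal[1],
                    pharmacyphone: pharmacyphone,
                    pharmacyfax: pharmacyfax,
                    pharmacyemail: pharmacyemail,
                    pharmacywebsite: pharmacywebsite,
                }
            });
            telusDataArray.push(telusData); // add each url's data to the array
        }

        // at this point telusDataArray should be filled and ready for CSV generation

        await browser.close();
    } catch (err) {
        // a try block needs a catch (or finally) to be valid syntax
        console.error(err);
    }
})();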
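
For the CSV generation step itself, here is a minimal sketch that uses only Node's built-in fs module. It assumes telusDataArray is an array of flat objects that all share the same keys, and it takes the header row from the keys of the first object. The helper names (escapeCsvValue, toCsv, writeCsv) and the file name pharmacies.csv are my own choices for illustration and are not part of your script. A dedicated CSV library would also work; this is just one way to do it without extra dependencies.

```javascript
const fs = require('fs');

// Quote a single value in the style of RFC 4180: wrap it in double quotes
// when it contains a comma, a double quote, or a line break, and double any
// embedded double quotes. null and undefined become an empty field.
function escapeCsvValue(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build CSV text from an array of flat objects.
// Assumption: every object has the same keys as the first one.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const lines = [headers.map(escapeCsvValue).join(',')];
  for (const row of rows) {
    lines.push(headers.map((header) => escapeCsvValue(row[header])).join(','));
  }
  return lines.join('\r\n') + '\r\n';
}

// Write the rows to disk as UTF-8 (hypothetical helper).
function writeCsv(filePath, rows) {
  fs.writeFileSync(filePath, toCsv(rows), 'utf8');
}

// Usage inside the script above, just before browser.close():
// writeCsv('pharmacies.csv', telusDataArray);
```

Quoting matters here because scraped addresses often contain commas, which would otherwise shift the columns. If the run is long and you would rather not lose everything on a crash, you could write the header once and append one line per page with fs.appendFileSync instead of building the whole array first.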
