I want to add all results to a CSV, but only the first page of results is added

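Question

I am very new to coding and am currently working through some web scraping tutorials. There is a problem in my code: I want to add all the results to a CSV file, but only the results from the first page are added.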

const rp = require('request-promise');
const otcsv = require('objects-to-csv');
const cheerio = require('cheerio');
const { Console } = require('console');

const baseUrl = ['https://clutch.co/directory/mobile-application-developers?page=1','https://clutch.co/directory/mobile-application-developers?page=2'];
const getCompanies = async () => {
    for (n = 0; n < baseUrl.length; n++) {
        const html = await rp(baseUrl[n]);
        const businessMap = cheerio('h3 > a', html).map(async (i, e) => {
            const link = "https://clutch.co" + e.attribs.href;
            const innerHtml = await rp(link);
            let title = cheerio('.h2_title', innerHtml).text().replace(/\s\s+/g, ' ');
            let rating = cheerio('#summary_section > div > div.col-md-6.summary-description > div:nth-child(2) > div.col-md-9 > div > span', innerHtml).text();
            let name = e.children[0].data;
            //console.log(name)

            return {
                title,
                rating,
                name
            }
        }).get();
        return Promise.all(businessMap);
    }

};
  getCompanies()
  .then(result => {
    const transformed = new otcsv(result);
    return transformed.toDisk('./output.csv');
  })
  .then(() => console.log('Done')); 

One more question: can anyone explain this line?

const businessMap = cheerio('h3 > a', html).map(async (i, e) => {

Thanks in advance.

Tags: javascript, html, csv, web-scraping, export-to-csv

Answer


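You return the first result from inside the for loop in getCompanies. Once a function returns, it is finished, so the remaining iterations never run. That is why you only get the results from the first page.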
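The early return can be seen in isolation with a minimal sketch. The function `firstOnly` and its input array are hypothetical names made up for this illustration; they are not part of the scraper.

```javascript
// Hypothetical function, for illustration only; it is not part of the original scraper.
function firstOnly(pages) {
    let iterations = 0;
    for (let n = 0; n < pages.length; n++) {
        iterations++;
        // return ends the whole function, not just this iteration
        return { page: pages[n], iterations };
    }
}

console.log(firstOnly(['page=1', 'page=2'])); // { page: 'page=1', iterations: 1 }
```

The loop body runs once even though the array has two entries, which matches the behaviour seen with the two URLs in baseUrl.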

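You can create a buffer array outside the for loop, add each page's results to it inside the loop, and return everything after the loop has finished.

For example (I have not tested your code):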

const getCompanies = async () => {
    const results = [];

    for (let n = 0; n < baseUrl.length; n++) {
        const html = await rp(baseUrl[n]);
        const businessMap = cheerio('h3 > a', html).map(async (i, e) => {
            const link = "https://clutch.co" + e.attribs.href;
            const innerHtml = await rp(link);
            let title = cheerio('.h2_title', innerHtml).text().replace(/\s\s+/g, ' ');
            let rating = cheerio('#summary_section > div > div.col-md-6.summary-description > div:nth-child(2) > div.col-md-9 > div > span', innerHtml).text();
            let name = e.children[0].data;
            //console.log(name)

            return {
                title,
                rating,
                name
            }
        }).get();

        const result = await Promise.all(businessMap);

        // concat() returns a new array and leaves results unchanged, so push the items instead
        results.push(...result);
    }

    return results;
};
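On the second question, about the `cheerio('h3 > a', html).map(async (i, e) => { ... })` line: it selects every link inside an h3 heading and runs the callback once per link, with `i` as the index and `e` as the element. Because the callback is async, each call returns a promise, and `.get()` turns the cheerio collection into a plain array, so businessMap is an array of promises that Promise.all then waits for. The sketch below shows the same pattern with a plain array; `fakeFetch` and `links` are hypothetical stand-ins for `rp` and the scraped links, used here to avoid any network access.

```javascript
// Hypothetical stand-in for rp(link): resolves after a short delay instead of hitting the network.
const fakeFetch = (link) =>
    new Promise((resolve) => setTimeout(() => resolve(`<html>${link}</html>`), 10));

// Hypothetical links, standing in for the hrefs found on a listing page.
const links = ['/profile/a', '/profile/b'];

// An async callback always returns a promise, so map() yields an array of promises.
const pending = links.map(async (link) => {
    const innerHtml = await fakeFetch(link);
    return { link, length: innerHtml.length };
});

console.log(pending[0] instanceof Promise); // true

// Promise.all waits for every promise and resolves to the results in the original order.
const done = Promise.all(pending).then((rows) => {
    console.log(rows.map((row) => row.link)); // [ '/profile/a', '/profile/b' ]
    return rows;
});
```

All the requests for one page start at roughly the same time and are awaited together, which is why the original code needs Promise.all before it can use the results.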
