How to load and parse a 300MB CSV file with 6M+ rows in Node.js?

Problem description

I downloaded a large CSV file containing food nutrient values for an API I want to build. The file is 300MB+ and has more than 6 million rows. The CSV is structured as follows:

"id","fdc_id","nutrient_id","amount","data_points","derivation_id","min","max","median","footnote","min_year_acquired"
"3639112","344604","1089","0","","75","","","","",""
"3639113","344604","1110","0","","75","","","","",""
... 6M more of these

Here is what I tried, without success. If I cap the data length at a reasonable number and break early, it works, but of course I need to parse the entire file into JSON. I also tried the csv-parser npm package with a piped read stream (see the sketch after the code below), but that didn't work either. What should I do?

const fs = require('fs');
const readline = require('readline');

(async () => {
    const readStream = fs.createReadStream('food_nutrient.csv');
    const rl = readline.createInterface({
        input: readStream,
        crlfDelay: Infinity // treat \r\n as a single line break
    });

    let data = [], titles;

    for await (const line of rl) {
        // Strip only the surrounding quotes; a blanket /\W/g would also
        // remove decimal points and minus signs inside the values.
        const row = line.split(',').map(s => s.replace(/"/g, '').trim());
        if (!titles) {
            titles = row; // the first line holds the column names
            continue;
        }
        // Coerce every field to a number, defaulting to 0.
        data.push(Object.fromEntries(titles.map((t, i) => [t, +row[i] || 0])));
    }

    // never getting here in this lifetime
    debugger;
    console.log('Done!');

})();
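For reference, the csv-parser attempt mentioned above would typically follow the pattern below (a minimal sketch using the package's documented streaming API, against the same file). If every parsed row is pushed into a single array, it fails for the same reason as the readline version: roughly 6 million row objects accumulate on the heap.

const fs = require('fs');
const csv = require('csv-parser');

const data = [];
fs.createReadStream('food_nutrient.csv')
    .pipe(csv()) // emits one object per row, keyed by the header line
    .on('data', (row) => data.push(row)) // ~6M objects: this is what exhausts memory
    .on('end', () => console.log('Done!', data.length));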

Tags: javascript, node.js, json, csv, stream

Solution
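
The core problem is holding ~6 million row objects in memory at once: at several hundred bytes per object, that is multiple gigabytes, beyond Node's default heap limit. Below is a minimal sketch of one memory-safe approach, assuming each row can be processed as it arrives rather than kept in RAM: stream the file and write every row out as newline-delimited JSON (the output name food_nutrient.ndjson is illustrative).

const fs = require('fs');
const readline = require('readline');
const { once } = require('events');

(async () => {
    const rl = readline.createInterface({
        input: fs.createReadStream('food_nutrient.csv'),
        crlfDelay: Infinity
    });
    const out = fs.createWriteStream('food_nutrient.ndjson');

    let titles;
    for await (const line of rl) {
        const row = line.split(',').map(s => s.replace(/"/g, '').trim());
        if (!titles) {
            titles = row; // header line
            continue;
        }
        const record = Object.fromEntries(titles.map((t, i) => [t, row[i] ?? '']));
        // Respect backpressure: wait for the write buffer to drain
        // instead of queueing millions of pending writes.
        if (!out.write(JSON.stringify(record) + '\n')) {
            await once(out, 'drain');
        }
    }
    out.end();
    console.log('Done!');
})();

Each output line is one JSON object, so a consumer (or the API) can read the result back with the same streaming pattern instead of calling JSON.parse on a multi-gigabyte string. If the whole array genuinely must fit in memory, the V8 heap limit can be raised with node --max-old-space-size=8192 script.js, but streaming scales regardless of file size.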

