javascript - How do I load and parse a 300MB CSV file with 6M+ rows using Node.js?
Problem description
I downloaded a large CSV file containing food nutrient values for an API I want to build. The file is 300MB+ and has over 6 million rows. The CSV is structured as follows:
"id","fdc_id","nutrient_id","amount","data_points","derivation_id","min","max","median","footnote","min_year_acquired"
"3639112","344604","1089","0","","75","","","","",""
"3639113","344604","1110","0","","75","","","","",""
... 6M more of these
Here is what I tried, without success. It works if I limit the data to a reasonable length and break early, but of course I need to parse the whole file into JSON. I also tried the csv-parser NPM package with a piped read stream, but that didn't work either. What should I do?
const fs = require('fs');
const readline = require('readline');

(async () => {
  const readStream = fs.createReadStream('food_nutrient.csv');
  const rl = readline.createInterface({
    input: readStream,
    crlfDelay: Infinity
  });
  let data = [], titles;
  for await (const line of rl) {
    const row = line.split(',').map(s => s.replace(/\W/g, '').trim());
    if (!titles) {
      titles = row;
      continue;
    }
    data.push(Object.fromEntries(titles.map((t, i) => [t, +row[i] || 0])));
  }
  // never getting here in this lifetime
  debugger;
  console.log('Done!');
})();
Solution