node.js - How to read a large CSV file in batches in Node.js?
Problem description
I have a CSV file with more than 500,000 records. The CSV fields are:
- Name
- Age
- Branch
I need to process all the records in the file without loading that much data into memory at once. I want to read a small number of records, insert them into a collection and work with them, and then continue reading the remaining records. I am new to this, so I cannot figure out how it should work. If I try to print a batch, it prints buffered data. Will the code below do what I need? Using that buffered value, how do I get the CSV records and insert and manipulate the file data?
// assumption: csv() comes from the 'csv-parser' npm package;
// csvFilePath, batch and counter are declared before this point
const fs = require('fs')
const csv = require('csv-parser')

let batch = []
let counter = 0

const stream = fs.createReadStream(csvFilePath)
  .pipe(csv())
  .on('data', (data) => {
    batch.push(data)
    counter++
    if (counter === 100) {
      stream.pause()
      setTimeout(() => {
        console.log("batch in ", data)
        counter = 0
        batch = []
        stream.resume()
      }, 5000)
    }
  })
  .on('error', (e) => {
    console.log("er ", e)
  })
  .on('end', () => {
    console.log("end")
  })
Solution
I have written some sample code showing how to work with streams. You basically create a stream and keep processing its chunks. Each chunk is an object of type Buffer; call toString() on it to work with it as text.
I don't have much time to explain more, but the comments should help.
Also consider using a module, since CSV parsing has been done many times already and has a lot of edge cases. Hope this helps.
import * as fs from 'fs'
// end-of-line delimiter, system specific.
import { EOL } from 'os'
// the delimiter used in csv
var delimiter = ','
// add your own implementation of parsing a portion of the text here.
const parseChunk = (text, index) => {
  // first chunk, the header is included here.
  if (index === 0) {
    // The first row will be the header, so take it.
    var headerLine = text.substring(0, text.indexOf(EOL))
    // remove the header from the text for further processing,
    // also removing the newline character.
    text = text.replace(headerLine + EOL, '')
    // Do something with the header here..
  }
  // Now you have a part of the file to process without headers.
  // The csv parse function you need to figure out yourself. Best
  // is to use some module for that; there are plenty of edge cases
  // when parsing csv.
  // custom csv parser here => https://stackoverflow.com/questions/1293147/example-javascript-code-to-parse-csv-data
  // if the csv is well formatted it could be enough to use this
  var lines = text.split(EOL)
  for (var line of lines) {
    var values = line.split(delimiter)
    console.log('line received', values)
    // StoreToDb(values)
  }
}
// create the stream
const stream = fs.createReadStream('file.csv')
// variable to count the chunks, to know whether the header is included..
var chunkCount = 0
// handle data event of stream
stream.on('data', chunk => {
  // the stream sends you a Buffer;
  // to have it as text, convert it to a string
  const text = chunk.toString()
  // Note that chunks have a fixed size
  // but usually contain multiple lines.
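  // Caveat (added note): a read-stream chunk is a fixed number of bytes, so it
  // can also end in the middle of a line (or even inside a multi-byte character).
  // A robust implementation should keep the trailing partial line of each chunk
  // and prepend it to the next chunk before parsing.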
  parseChunk(text, chunkCount)
  // increment the count.
  chunkCount++
})
stream.on('end', () => {
  console.log('parsing finished')
})
stream.on('error', (err) => {
  // error: handle it properly here, maybe roll back changes already made to the db
  // and parse again. You may also use chunkCount to start the parsing
  // again and skip the first x chunks, so you can restart at a given point.
  console.log('parsing error ', err)
})