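javascript - How to download a large number of remote files with the Node.js http get method without running into errors
Problem description
I am trying to download a list of files, generated by an internal processing system, using HTTP GET in Node.js. For a single file or a few files it works fine, and there is already an answer for that on Stack Overflow. The problem appears when I try to download a large number of files with asynchronous requests: the system times out and throws an error.
So this is more of a scalability problem. The best approach would probably be to download the files one at a time (or a few at a time) and then move on to the next batch, but I don't know how to do that. Here is my code so far. It works for a few files, but in this case I have about 850 files (a few MB each), and it does not work: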
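Here is a minimal sketch of the batch idea described above. It assumes Node.js 12.9 or later, because it uses Promise.allSettled. The hypothetical helper runInBatches starts batchSize tasks, waits until every one of them has settled, and only then starts the next batch. The task argument is assumed to be any function that returns a promise, such as one that downloads a single URL.

```javascript
// Hypothetical helper: run `task` over `items`, `batchSize` items at a time.
// Every task in a batch must settle before the next batch is started.
async function runInBatches(items, batchSize, task) {
  const results = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // allSettled: one failed download does not abort the rest of the batch
    const settled = await Promise.allSettled(batch.map(item => task(item)));
    results.push(...settled);
  }
  // one { status, value } or { status, reason } entry per item, in input order
  return results;
}
```

For the question, this might be called as runInBatches(file_list, 10, url => downloadOne(url)), where downloadOne is a hypothetical promise-returning download function. One drawback is that each batch waits for its slowest file before the next batch starts.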
const https = require("http"); // the plain http module, despite the variable name
const fs = require('fs');
// list of files
const file_list = [];
file_list.push('http://www.sample.com/file1');
file_list.push('http://www.sample.com/file2');
file_list.push('http://www.sample.com/file3');
// ...
file_list.push('http://www.sample.com/file850');
// forEach does not wait for a download to finish, so every request is started at once
file_list.forEach(single_file => {
  // save under the files folder, named after the last segment of the URL
  const file = fs.createWriteStream('files/' + single_file.split('/').pop());
  https.get(single_file, response => {
    const stream = response.pipe(file);
    stream.on("finish", function() {
      console.log("done");
    });
  });
});
It works fine for a few files and creates a lot of empty files in the files folder, then throws this error:
events.js:288
throw er; // Unhandled 'error' event
^
Error: connect ETIMEDOUT 192.168.76.86:80
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1137:16)
Emitted 'error' event on ClientRequest instance at:
at Socket.socketErrorListener (_http_client.js:426:9)
at Socket.emit (events.js:311:20)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
errno: 'ETIMEDOUT',
code: 'ETIMEDOUT',
syscall: 'connect',
address: '192.168.76.86',
port: 80
}
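connect ETIMEDOUT means the TCP connection to 192.168.76.86:80 was not established in time. That is consistent with the loop above starting every request at once, since forEach does not wait for a download to finish. One low-effort mitigation is to send all requests through one shared http.Agent with maxSockets set. Node then opens at most that many connections to the host and queues the remaining requests. This is a sketch, not a guaranteed fix. The limit of 10 and the helper name downloadAll are illustrative assumptions.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Shared agent: at most 10 sockets to the same host at a time; further
// requests wait in the agent's queue instead of all connecting at once.
const agent = new http.Agent({ maxSockets: 10 });

// Hypothetical helper: start every download but let the agent throttle them.
// Calls onDone() once every URL has either finished or failed.
function downloadAll(urls, dir, onDone) {
  let remaining = urls.length;
  if (remaining === 0) return onDone();
  const finishOne = () => {
    remaining--;
    if (remaining === 0) onDone();
  };
  urls.forEach(url => {
    // e.g. http://www.sample.com/file1 -> <dir>/file1
    const dest = path.join(dir, path.basename(new URL(url).pathname));
    http.get(url, { agent }, response => {
      // the file is created only once a response arrives, so failed
      // connections do not leave empty files behind
      response.pipe(fs.createWriteStream(dest)).on('finish', finishOne);
    }).on('error', err => {
      // connection errors such as ETIMEDOUT are logged instead of crashing
      console.error(`Failed to download ${url}: ${err.message}`);
      finishOne();
    });
  });
}
```

For example, downloadAll(file_list, 'files', () => console.log('all done')), with the files folder created beforehand.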
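It seems to put a huge load on the network, so downloading the files one at a time might also work. If possible, please suggest the best scalable solution. Thanks.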
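If downloading one file at a time is enough, a sketch with async/await could look like the following. It assumes Node.js 10 or later, for stream.pipeline and the global URL class. The names downloadFile and downloadSequentially are hypothetical. Unlike the original loop, the write stream is created only after the server answers with status 200, so failed requests do not leave empty files behind. Errors such as ETIMEDOUT reject the promise instead of crashing the process.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream');

// Download one URL to destPath. Resolves with destPath once the file is fully
// written; rejects on connection errors (e.g. ETIMEDOUT) or non-200 statuses.
function downloadFile(url, destPath) {
  return new Promise((resolve, reject) => {
    http.get(url, response => {
      if (response.statusCode !== 200) {
        response.resume(); // drain the body so the socket is released
        reject(new Error(`HTTP ${response.statusCode} for ${url}`));
        return;
      }
      // pipeline forwards errors from either stream to the callback
      pipeline(response, fs.createWriteStream(destPath), err => {
        if (err) reject(err);
        else resolve(destPath);
      });
    }).on('error', reject);
  });
}

// Download the URLs strictly one after another; returns the URLs that failed.
async function downloadSequentially(urls, dir) {
  const failed = [];
  for (const url of urls) {
    const dest = path.join(dir, path.basename(new URL(url).pathname));
    try {
      await downloadFile(url, dest);
    } catch (err) {
      console.error(`Failed to download ${url}: ${err.message}`);
      failed.push(url);
    }
  }
  return failed;
}
```

For example, downloadSequentially(file_list, 'files').then(failed => console.log('failed:', failed)). The files folder must already exist.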
Solution
Personally, I would do something like this:
// currentIndex is the index of the next file to fetch
// (let, not const, because it is incremented below)
let currentIndex = 0;
// numWorkers is the maximum number of simultaneous downloads
const numWorkers = 10;
// promises holds each of our workers' promises
const promises = [];
// getNextFile downloads the next file and, after it finishes, moves on to
// the next file in the list, until all files have been downloaded
const getNextFile = (resolve) => {
  if (currentIndex >= file_list.length) return resolve();
  const currentFile = file_list[currentIndex];
  // increment index so any other worker will not get the same file
  currentIndex++;
  // save under the files folder, named after the last segment of the URL
  const file = fs.createWriteStream('files/' + currentFile.split('/').pop());
  https.get(currentFile, response => {
    response.pipe(file).on("finish", () => getNextFile(resolve));
  }).on("error", err => {
    // log failures such as ETIMEDOUT and keep going instead of crashing
    console.error(`Failed to download ${currentFile}: ${err.message}`);
    file.destroy();
    getNextFile(resolve);
  });
};
for (let i = 0; i < numWorkers; i++) {
  promises.push(new Promise(resolve => getNextFile(resolve)));
}
Promise.all(promises).then(() => console.log('All files complete'));
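The same worker-pool idea can also be written with async/await. In this sketch, the helper name mapWithConcurrency is hypothetical. Each of limit workers repeatedly claims the next unprocessed index, so up to limit tasks are always in flight. Every item gets a result entry in its original position, whether it succeeded or failed.

```javascript
// Hypothetical helper: run `task` over `items` with at most `limit` tasks in
// flight; results keep the input order.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let nextIndex = 0;

  async function worker() {
    while (nextIndex < items.length) {
      // the index is claimed synchronously, and JavaScript runs one callback
      // at a time, so no two workers can claim the same item
      const i = nextIndex++;
      try {
        results[i] = { status: 'fulfilled', value: await task(items[i]) };
      } catch (reason) {
        results[i] = { status: 'rejected', reason };
      }
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a hypothetical promise-returning downloadOne(url), this would be mapWithConcurrency(file_list, 10, downloadOne). Compared with fixed batches, a slow file occupies only one worker instead of holding back a whole batch.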