How to download a large number of remote files with the Node JS http get method without running into errors

Problem description

I am trying to download a list of files generated by an internal processing system via the HTTP GET method in Node.js. It works fine for a single file or a few files, and there is already an answer for that on Stack Overflow. The problem arises when you try to download a large number of files with asynchronous requests: the system simply times out and throws an error.

So this is more of a scalability problem. The best approach would be to download one file (or a few) at a time and then move on to the next batch, but I don't know how to do that. Here is the code I have so far; it works for a few files, but in this case I have about 850 files (a few MB each) and it does not work -

const http = require('http');
const fs = require('fs');
const path = require('path');

// list of files
var file_list = [];

file_list.push('http://www.sample.com/file1');
file_list.push('http://www.sample.com/file2');
file_list.push('http://www.sample.com/file3');
.
.
.
file_list.push('http://www.sample.com/file850');


file_list.forEach(single_file => {
        // save under the files folder, using the last URL segment as the filename
        const file = fs.createWriteStream('files/' + path.basename(single_file));
        http.get(single_file, response => {
          var stream = response.pipe(file);

          stream.on("finish", function() {
            console.log("done");
          });
        });
    });


It runs fine for a few files, creates a lot of empty files in the files folder, and then throws this error

events.js:288                                                              
      throw er; // Unhandled 'error' event                                 
      ^                                                                    
                                                                           
Error: connect ETIMEDOUT 192.168.76.86:80                                   
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1137:16)        
Emitted 'error' event on ClientRequest instance at:                        
    at Socket.socketErrorListener (_http_client.js:426:9)                  
    at Socket.emit (events.js:311:20)                                      
    at emitErrorNT (internal/streams/destroy.js:92:8)                      
    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)              
    at processTicksAndRejections (internal/process/task_queues.js:84:21) { 
  errno: 'ETIMEDOUT',                                                      
  code: 'ETIMEDOUT',                                                       
  syscall: 'connect',                                                      
  address: '192.168.76.86',                                                 
  port: 80                                                                 
}   

It seems to put a huge load on the network, so downloading these one by one might also work. Please suggest the best scalable solution if possible. Thanks.

Tags: javascript node.js arrays

Solution


Personally, I would do something like this:

const http = require('http');
const fs = require('fs');
const path = require('path');

// currentIndex is the index of the next file to fetch
let currentIndex = 0;
// numWorkers is the maximum number of simultaneous downloads
const numWorkers = 10;
// promises holds each of our workers' promises
const promises = [];

// getNextFile will download the next file, and after finishing, will
// then download the next file in the list, until all files have been
// downloaded
const getNextFile = (resolve) => {
    if (currentIndex >= file_list.length) return resolve();
    const currentFile = file_list[currentIndex];
    // increment the index so no other worker will get the same file
    currentIndex++;
    const file = fs.createWriteStream('files/' + path.basename(currentFile));
    http.get(currentFile, response => {
        var stream = response.pipe(file);
        stream.on("finish", function() {
            if (currentIndex === file_list.length) {
                resolve();
            } else {
                getNextFile(resolve);
            }
        });
    });
}

for (let i = 0; i < numWorkers; i++) {
    promises.push(new Promise((resolve, reject) => {
        getNextFile(resolve);
    }));
}

Promise.all(promises).then(() => console.log('All files complete'));
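The same worker-pool idea can also be sketched more generically with async/await. This is a minimal sketch, not part of the original answer; `runPool` is a hypothetical helper that runs any async handler over a list with a fixed concurrency limit:

```javascript
// Run `handler` over every item in `items`, with at most `numWorkers`
// calls in flight at once. Resolves with the results in input order.
async function runPool(items, numWorkers, handler) {
  let nextIndex = 0; // index of the next unclaimed item
  const results = [];
  async function worker() {
    // each worker keeps pulling the next unclaimed item until none remain
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await handler(items[i]);
    }
  }
  const workers = [];
  for (let w = 0; w < Math.min(numWorkers, items.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

It could then be driven with something like `runPool(file_list, 10, downloadFile)`, where `downloadFile` would wrap the `http.get` + `pipe` logic above in a Promise.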
