Node.js and heap out of memory on Windows

Question

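I am having some problems with a project of mine whose purpose is to scan one or more directories for MP3 files and store their metadata and paths in MongoDB. The main computer running the code is a Windows 10 64-bit machine with 8 GB of RAM and an AMD Ryzen 3.5 GHz CPU (4 cores). Windows resides on an SSD, while the music is on a 1 TB HDD.
The Node.js application can be launched manually from the command line or through npm. I am using a recursive function to scan all the directories, and we are talking about roughly 20,000 files.
I already solved the EMFILE: too many files open problem with graceful-fs, but now I have run into a new one: JavaScript heap out of memory. Here is the full output I receive: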

C:\Users\User\Documents\GitHub\mp3manager>npm run scan

> experiments@1.0.0 scan C:\Users\User\Documents\GitHub\mp3manager
> cross-env NODE_ENV=production NODE_OPTIONS='--max-old-space-size=4096' node scripts/cli/mm scan D:\Musica

Scanning 1 resources in production mode
Trying to connect to  mongodb://localhost:27017/music_manager
Connected to mongo...

<--- Last few GCs --->

[16744:0000024DD9FA9F40]   141399 ms: Mark-sweep 63.2 (70.7) -> 63.2 (71.2) MB, 47.8 / 0.1 ms  (average mu = 0.165, current mu = 0.225) low memory notification GC in old space requested
[16744:0000024DD9FA9F40]   141438 ms: Mark-sweep 63.2 (71.2) -> 63.2 (71.2) MB, 38.9 / 0.1 ms  (average mu = 0.100, current mu = 0.001) low memory notification GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x02aaa229e6e9 <JSObject>
    0: builtin exit frame: new ArrayBuffer(aka ArrayBuffer)(this=0x027bb3502801 <the_hole>,0x0202be202569 <Number 8.19095e+06>,0x027bb3502801 <the_hole>)

    1: ConstructFrame [pc: 000002AF8F50D385]
    2: createUnsafeArrayBuffer(aka createUnsafeArrayBuffer) [00000080419526C9] [buffer.js:~115] [pc=000002AF8F8440B1](this=0x027bb35026f1 <undefined>,size=0x0202be202569 <Number 8.19095e+06>)
    3:...

FATAL ERROR: Committing semi space failed. Allocation failed - JavaScript heap out of memory
 1: 00007FF6E36FF04A
 2: 00007FF6E36DA0C6
 3: 00007FF6E36DAA30
 4: 00007FF6E39620EE
 5: 00007FF6E396201F
 6: 00007FF6E3E82BC4
 7: 00007FF6E3E79C5C
 8: 00007FF6E3E7829C
 9: 00007FF6E3E77765
10: 00007FF6E3989A91
11: 00007FF6E35F0E52
12: 00007FF6E3C7500F
13: 00007FF6E3BE55B4
14: 00007FF6E3BE5A5B
15: 00007FF6E3BE587B
16: 000002AF8F55C721
npm ERR! code ELIFECYCLE
npm ERR! errno 134

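I have tried using NODE_OPTIONS='--max-old-space-size=4096', but I am not even sure that Node honors this option on Windows. I have also tried p-limit to limit the number of promises that are actually running, but honestly I am running out of ideas, and I am starting to think about using another language to see whether it copes better with this kind of problem. Any advice would be appreciated. Have a nice day.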
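One way to check whether the flag is actually reaching the Node process is to print the heap limit that V8 reports at startup. The following is a minimal sketch, not part of the original project; `heapLimitMb` is a hypothetical helper name. If the printed limit stays well below 4096 MB when the script is launched through the same npm script, the option is probably not being applied.

```javascript
const v8 = require('v8');

// Hypothetical helper: returns the V8 heap size limit of the current
// process, rounded to whole megabytes.
function heapLimitMb() {
    return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
}

console.log(`Heap limit: ${heapLimitMb()} MB`);
console.log(`NODE_OPTIONS: ${process.env.NODE_OPTIONS || '(not set)'}`);
```

Running this once with and once without the option set makes the difference, if any, easy to see.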

Edit: I tried replacing the processDir function with the one posted by @Terry, but the result is the same.

Update 2019-08-19: To avoid the heap problems, I removed the recursion and used a queue to add the directories:


const path = require('path');
const mm = require('music-metadata');
const _ = require('underscore');
const fs = require('graceful-fs');
const readline = require('readline');

const audioType = require('audio-type');
// const util = require('util');
const { promisify } = require('util');
const logger = require('../logger');
const { mp3hash } = require('../../../src/libs/utils');
const MusicFile = require('../../../src/models/db/mongo/music_files');

const getStats = promisify(fs.stat);
const readdir = promisify(fs.readdir);
const readFile = promisify(fs.readFile);
// https://github.com/winstonjs/winston#profiling

class MusicScanner {
    constructor(options) {
        const { paths, keepInMemory } = options;

        this.paths = paths;
        this.keepInMemory = keepInMemory === true;
        this.processResult = {
            totFiles: 0,
            totBytes: 0,
            dirQueue: [],
        };
    }

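    async processFile(resource) {
        // Read the whole file so that its type can be detected from the contents
        const buf = await readFile(resource);
        const fileRes = audioType(buf);
        if (fileRes === 'mp3') {
            this.processResult.totFiles += 1;

            // process the metadata
            // buf.length is the size in bytes of the file that was just read
            this.processResult.totBytes += buf.length;
        }
    }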

    async processDirectory() {
        while(this.processResult.dirQueue.length > 0) {
            const dir = this.processResult.dirQueue.shift();
            const dirents = await readdir(dir, { withFileTypes: true });
            const filesPromises = [];

            for (const dirent of dirents) {
                const resource = path.resolve(dir, dirent.name);
                if (dirent.isDirectory()) {
                    this.processResult.dirQueue.push(resource);
                } else if (dirent.isFile()) {
                    filesPromises.push(this.processFile(resource));
                }
            }

            await Promise.all(filesPromises);
        }
    }


    async scan() {
        const promises = [];

        const start = Date.now();

        for (const thePath of this.paths) {
            this.processResult.dirQueue.push(thePath);
            promises.push(this.processDirectory());
        }

        const paths = await Promise.all(promises);
        this.processResult.paths = paths;
        return this.processResult;
    }
}

module.exports = MusicScanner;

The problem now is that the process takes 54 minutes to read 21K files, and I am not sure how to speed it up in this scenario. Any hints?
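One thing that may be worth trying, given that processFile reads every file in full just to detect its type: read only the first bytes of each file and cap how many files are open at once. The sketch below rests on assumptions: `readHeader` and `mapWithLimit` are hypothetical helpers, the 4100-byte default is an arbitrary choice, and it assumes audio-type only inspects the leading bytes of the buffer. It also needs a Node version that provides `fs.promises`.

```javascript
const fs = require('fs');

// Hypothetical helper: read only the first `length` bytes of a file
// instead of loading the whole file into memory.
async function readHeader(filePath, length = 4100) {
    const handle = await fs.promises.open(filePath, 'r');
    try {
        const buffer = Buffer.alloc(length);
        const { bytesRead } = await handle.read(buffer, 0, length, 0);
        return buffer.slice(0, bytesRead);
    } finally {
        await handle.close();
    }
}

// Hypothetical helper: run `worker` over `items` with at most `limit`
// calls in flight, returning the results in input order.
async function mapWithLimit(items, limit, worker) {
    const results = new Array(items.length);
    let next = 0;
    async function run() {
        while (next < items.length) {
            const index = next++;
            results[index] = await worker(items[index], index);
        }
    }
    const runners = [];
    for (let i = 0; i < Math.min(limit, items.length); i++) {
        runners.push(run());
    }
    await Promise.all(runners);
    return results;
}
```

Inside processDirectory, the per-directory Promise.all could then become something like `await mapWithLimit(files, 8, (f) => readHeader(f))`, with each header passed to audioType. Files identified as MP3 would still need a metadata pass; music-metadata's parseFile takes a path, which may avoid holding the full buffer yourself, but that is untested here.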

Tags: node.js, recursion, out-of-memory

Answer


I am not sure how much this will help, but I created a test script to see whether I would get the same result as you. I am also running Windows 10.

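It may be useful for you to run this script and see whether you hit any problems. I was able to list all the files in /program files/ (~91k files) and even /windows (~265k files) without it blowing up. Perhaps it is some other operation, rather than simply listing the files, that is causing the problem.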

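The script returns a list of all the files in the path, which is what you need. Once you have it, you can simply iterate over it linearly and add the details to your MongoDB instance.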

const fs = require('fs');
const path = require('path');
const { promisify } = require('util');
const getStats = promisify(fs.stat);
const readdir = promisify(fs.readdir);

async function scanDir(dir, fileList) {

    let files = await readdir(dir);
    for(let file of files) {
        let filePath = path.join(dir, file);
        fileList.push(filePath);
        try {
            let stats = await getStats(filePath);
            if (stats.isDirectory()) {
                await scanDir(filePath, fileList);
            }
        } catch (err) {
            // Ignore entries that cannot be stat'ed and carry on with the scan
        }
    }

    return fileList;   
}

function logStats(fileList) {
    // Take a single memory snapshot so both figures come from the same moment
    const { heapTotal, heapUsed } = process.memoryUsage();
    console.log("Scanned file count: ", fileList.length);
    console.log(`Heap total: ${Math.floor(heapTotal / 1024)} KB, used: ${Math.floor(heapUsed / 1024)} KB`);
}

async function testScan() {
    let fileList = [];
    let handle = setInterval(logStats, 5000, fileList);
    let startTime = new Date().getTime();
    await scanDir('/program files/', fileList);
    clearInterval(handle);
    console.log(`File count: ${fileList.length}, elapsed: ${(new Date().getTime() - startTime)/1000} seconds`);
}

testScan();
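
Once scanDir has returned the list, the linear pass could look roughly like the sketch below. It is only an illustration: `processInBatches` and the `handleBatch` callback are hypothetical names, and what happens inside the callback (parsing metadata, writing to MongoDB) is left to the caller.

```javascript
// Hypothetical helper: walk a flat list sequentially in fixed-size batches,
// so that only one batch of work is in flight at any time.
// `handleBatch` is a caller-supplied async function, for example one that
// parses metadata and inserts the batch into MongoDB.
async function processInBatches(items, batchSize, handleBatch) {
    let processed = 0;
    for (let start = 0; start < items.length; start += batchSize) {
        const batch = items.slice(start, start + batchSize);
        await handleBatch(batch);
        processed += batch.length;
    }
    return processed;
}
```

For example, `await processInBatches(fileList, 100, saveBatch)` would call a caller-defined `saveBatch` with 100 paths at a time. Note that fileList as built by scanDir also contains directory paths, so the callback may need to skip those.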
