Why does jemalloc take longer to allocate 4096 bytes than other small size classes?

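Problem description

While benchmarking small size class allocations in jemalloc-5.2.0, I found that allocating 4096 bytes takes noticeably longer than allocating other small sizes. Does jemalloc handle 4096-byte allocations specially, or is there another cause?

Benchmark results: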

Run on (32 X 3400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 256 KiB (x16)
  L3 Unified 20480 KiB (x2)
Load Average: 15.72, 14.21, 14.26
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
BM_SomeFunction/1792/iterations:500/threads:24      0.095 ms         2.12 ms        12000
BM_SomeFunction/1856/iterations:500/threads:24      0.175 ms         4.10 ms        12000
BM_SomeFunction/1920/iterations:500/threads:24      0.178 ms         4.13 ms        12000
BM_SomeFunction/1984/iterations:500/threads:24      0.177 ms         4.14 ms        12000
BM_SomeFunction/2048/iterations:500/threads:24      0.181 ms         4.18 ms        12000
BM_SomeFunction/2048/iterations:500/threads:24      0.177 ms         4.16 ms        12000
BM_SomeFunction/2176/iterations:500/threads:24      0.116 ms         2.67 ms        12000
BM_SomeFunction/2304/iterations:500/threads:24      0.113 ms         2.64 ms        12000
BM_SomeFunction/2432/iterations:500/threads:24      0.118 ms         2.75 ms        12000
BM_SomeFunction/2560/iterations:500/threads:24      0.113 ms         2.65 ms        12000
BM_SomeFunction/2560/iterations:500/threads:24      0.114 ms         2.68 ms        12000
BM_SomeFunction/2688/iterations:500/threads:24      0.133 ms         3.13 ms        12000
BM_SomeFunction/2816/iterations:500/threads:24      0.132 ms         3.08 ms        12000
BM_SomeFunction/2944/iterations:500/threads:24      0.131 ms         3.09 ms        12000
BM_SomeFunction/3072/iterations:500/threads:24      0.132 ms         3.10 ms        12000
BM_SomeFunction/3072/iterations:500/threads:24      0.132 ms         3.11 ms        12000
BM_SomeFunction/3200/iterations:500/threads:24      0.117 ms         2.72 ms        12000
BM_SomeFunction/3328/iterations:500/threads:24      0.113 ms         2.66 ms        12000
BM_SomeFunction/3456/iterations:500/threads:24      0.111 ms         2.61 ms        12000
BM_SomeFunction/3584/iterations:500/threads:24      0.112 ms         2.63 ms        12000
BM_SomeFunction/3584/iterations:500/threads:24      0.112 ms         2.63 ms        12000
BM_SomeFunction/3712/iterations:500/threads:24      0.271 ms         6.35 ms        12000
BM_SomeFunction/3840/iterations:500/threads:24      0.270 ms         6.35 ms        12000
BM_SomeFunction/3968/iterations:500/threads:24      0.274 ms         6.42 ms        12000
BM_SomeFunction/4096/iterations:500/threads:24      0.276 ms         6.49 ms        12000
BM_SomeFunction/4096/iterations:500/threads:24      0.273 ms         6.41 ms        12000
BM_SomeFunction/4352/iterations:500/threads:24      0.151 ms         3.53 ms        12000
BM_SomeFunction/4608/iterations:500/threads:24      0.146 ms         3.45 ms        12000
BM_SomeFunction/4864/iterations:500/threads:24      0.142 ms         3.36 ms        12000
BM_SomeFunction/5120/iterations:500/threads:24      0.144 ms         3.40 ms        12000
BM_SomeFunction/5120/iterations:500/threads:24      0.146 ms         3.40 ms        12000
BM_SomeFunction/5376/iterations:500/threads:24      0.196 ms         4.57 ms        12000
BM_SomeFunction/5632/iterations:500/threads:24      0.187 ms         4.39 ms        12000
BM_SomeFunction/5888/iterations:500/threads:24      0.191 ms         4.47 ms        12000
BM_SomeFunction/6144/iterations:500/threads:24      0.188 ms         4.39 ms        12000
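
One way to read the table: the timings form plateaus (1856 to 2048, 2176 to 2560, 2688 to 3072, 3200 to 3584, 3712 to 4096, and so on). These plateaus line up with jemalloc's size classes if one assumes the documented default layout of four classes per power-of-two doubling (..., 2048, 2560, 3072, 3584, 4096, 5120, 6144, ...). The sketch below is not jemalloc code. `round_to_size_class` is a hypothetical helper that only mimics that rounding for requests above 128 bytes, to show which benchmark sizes would land in the same class.

```cpp
#include <cstddef>

// Hypothetical helper, not part of jemalloc: rounds a request up to the size
// class it would fall into, ASSUMING four size classes per power-of-two
// doubling. Only intended for requests above 128 bytes.
constexpr std::size_t round_to_size_class(std::size_t size) {
    std::size_t upper = 1;  // smallest power of two >= size
    while (upper < size) {
        upper <<= 1;
    }
    // Four evenly spaced classes cover the interval (upper/2, upper].
    const std::size_t spacing = upper / 8;
    return (size + spacing - 1) / spacing * spacing;
}

// Sizes taken from the benchmark table.
static_assert(round_to_size_class(1792) == 1792, "1792 is its own class");
static_assert(round_to_size_class(1856) == 2048, "shares the 2048 class");
static_assert(round_to_size_class(3712) == 4096, "shares the 4096 class");
static_assert(round_to_size_class(4096) == 4096, "4096 is a class boundary");
static_assert(round_to_size_class(4352) == 5120, "shares the 5120 class");
```

Under that assumption, 3712, 3840, 3968 and 4096 all map to the 4096-byte class, which matches the four slow rows. The slowdown would then belong to the whole class rather than to requests of exactly 4096 bytes. With jemalloc linked in, `nallocx(size, 0)` reports the real size an allocation request would be rounded to and can be used to check this.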

How to read the report:

BM_SomeFunction/1792/iterations:500/threads:24      0.095 ms         2.12 ms        12000

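means that one benchmark iteration for the 1792-byte case, which allocates and then frees kBatchSize (10000) blocks, consumed 2.12 ms of CPU time.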

Test code

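#include <cstddef>
#include <vector>

#include "benchmark/benchmark.h"
#include "jemalloc/jemalloc.h"

// Number of blocks allocated, then freed, per benchmark iteration.
static const size_t kBatchSize = 10000;

static void alloc_mem_n(size_t size) {
    std::vector<char*> kVec(kBatchSize, nullptr);
    for (size_t i = 0; i < kBatchSize; ++i) {
        auto p = new char[size];
        p[0] = static_cast<char>(i);  // write to the first byte of the block
        benchmark::ClobberMemory();
        kVec[i] = p;
    }
    for (auto &p : kVec) {
        delete[] p;  // memory from new[] must be released with delete[]
        p = nullptr;
    }
}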

static void BM_SomeFunction(benchmark::State& state) {
    for (auto _ : state) {
        alloc_mem_n(state.range(0));
    }
}


BENCHMARK(BM_SomeFunction)
    ->Unit(benchmark::kMillisecond)
    ->Iterations(500)
    ->Threads(24)
    ->DenseRange(1792, 2048, 64)
    ->DenseRange(2048, 2560, 128)
    ->DenseRange(2560, 3072, 128)
    ->DenseRange(3072, 3584, 128)
    ->DenseRange(3584, 4096, 128)
    ->DenseRange(4096, 5120, 256)
    ->DenseRange(5120, 6144, 256);

BENCHMARK_MAIN();

Tags: c++, jemalloc


