javascript - 循环设计改变了循环性能的 Javascript
问题描述
我写了三个 for 循环案例,它们基本上什么都不做 1000000000 次。但是这些代码的速度彼此不同。
起初,我以为是因为“let”声明计数不同,但是当我将声明移到外部 for 循环时,结果仍然相同。
甚至严重的是,最后一个 C 案例比其他案例慢得多。这时候我放弃了,来到了这里,StackOverflow。
什么会影响 JS 循环中的性能?
我在 Chrome 94.0.4606.71(Official)(x86_64) 和 NodeJS v16.3.0 上对其进行了测试
// Case A with 'let' (2nd faster)
(function() {
// let i, j, k; // <--- this barely have impact.
console.time();
for (let i = 0; i < 100; i++) {
for (let j = 0; j < 1000; j++) {
for (let k = 0; k < 10000; k++) {
// pass
}
}
}
console.timeEnd();
})();
// Case B with 'let' (1st faster)
(function() {
let i, j, k;
console.time();
for (i = 0; i < 10000; i++) {
for (j = 0; j < 1000; j++) {
for (k = 0; k < 100; k++) {
// pass
}
}
}
console.timeEnd();
})();
// Case C - no nested loop (3rd faster)
(function() {
console.time();
for (var i = 0; i < 10000 * 1000 * 100; i++) {
// pass
}
console.timeEnd();
})();
解决方案
基本答案是:因为您编写的测试并未测试优化的代码。如果 JavaScript 引擎只运行一次(甚至只是几次),它们通常不会优化函数。事实上,V8(Chrome 和 Node.js 中的引擎)运行的函数仅在启动时在解释器中运行一次,它甚至不会 JIT 编译它们。(事实证明,这样做的内存消耗更少。重复使用的函数通过 JIT 进行。更多here和here。)
如果您真的想对事物进行基准测试,请查看基准测试库或套件(例如,https://benchmarkjs.com/)。但即使我们不打算使用一个,我们也希望多次运行每个变体(绝对不少于 3 次)并取平均值。如果我们这样做,在 Chrome 上,我们看到除了 B 之外,您的三个(和我的第四个)的时间基本相同(下面讨论):
// Case A with 'let'
function A() {
// let i, j, k; // <--- this barely have impact.
for (let i = 0; i < 100; i++) {
for (let j = 0; j < 1000; j++) {
for (let k = 0; k < 10000; k++) {
// pass
}
}
}
}
// Case B with 'let' outside loop
function B() {
let i, j, k;
for (i = 0; i < 10000; i++) {
for (j = 0; j < 1000; j++) {
for (k = 0; k < 100; k++) {
// pass
}
}
}
}
// Case C - no nested loop
function C() {
for (var i = 0; i < 1000000000; i++) {
// pass
}
}
// Case D - exactly like C, but with `let` at function scope
function D() {
let i;
for (i = 0; i < 1000000000; i++) {
// pass
}
}
const tracking = [
{
count: 0,
total: 0,
fn: A,
},
{
count: 0,
total: 0,
fn: B,
},
{
count: 0,
total: 0,
fn: C,
},
{
count: 0,
total: 0,
fn: D,
},
];
const testButton = document.getElementById("test");
const loopCountInput = document.getElementById("loop-count");
const testingMessage = document.getElementById("testing");
const doneMessage = document.getElementById("done");
testButton.addEventListener("click", () => {
const loopCount = loopCountInput.valueAsNumber;
if (isNaN(loopCount)) {
return;
}
console.log(`Running test with ${loopCount} loop(s)`);
if (loopCount < loopCountInput.min) {
console.log(document.querySelector(".warning").textContent);
}
testButton.parentElement.remove();
loopCountInput.parentElement.remove();
testingMessage.classList.remove("hidden");
setTimeout(() => {
for (let loops = 0; loops < loopCount; ++loops) {
for (const entry of tracking) {
const start = performance.now();
entry.fn();
entry.total = elapsed = performance.now() - start;
++entry.count;
}
}
for (const entry of tracking) {
console.log(entry.fn.name, (entry.total / entry.count).toFixed(2) + "ms");
}
testingMessage.remove();
doneMessage.classList.remove("hidden");
}, 50);
});
.hidden {
display: none;
}
.warning {
margin-top: 4px;
margin-bottom: 4px;
display: none;
color: #d00;
}
input:invalid + .warning {
display: block;
}
<label>
Loop count:
<input type="number" id="loop-count" min="3" max="240" value="40" autofocus>
<div class="warning"><em>Strongly recommend not trusting results from fewer than 3 loops</em></div>
</label>
<div>
<input type="button" id="test" value="Test">
</div>
<div id="testing" class="hidden"><em>The test takes a while, please be patient.</em></div>
<div id="done" class="hidden">To run another test, click <strong>Run code snippet</strong> again (to refresh so the environment is "cold" again).</div>
尽管如此,即使这样也是对这些循环进行基准测试的一种相当幼稚的方法。(请注意,如果片段与和performance.now
一起运行,则 ' 的返回值不会像它们可能的那样细粒度。)Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
关于未优化的结果:您询问了 A 和 B 以及为什么未优化的结果如此不同。对于 Chrome 上的我来说,它们并没有那么不同(尽管它们明显不同并且是一致的)。Loop count: 3 的测试给了我这样的结果(我不会查看少于那么多循环的结果,存在延迟编译和线程调度可变性等):
一个 75.60 毫秒 B 96.17 毫秒 C 75.57ms D 75.30ms
对于 ~76ms 与 ~96ms 的一个合理解释是:for
循环为变量创建了一个新的词法环境,链接到它外部的变量。当 JavaScript 引擎需要使用一个变量时(比如k
在最内层循环中),它会在当前(最内层)词法环境中查找该变量;如果在那里没有找到,它会在父词法环境中寻找它(for the j
loop);然后下一个(i
循环);然后是下一个(函数调用的顶级词法声明);等等。var
考虑到这一点,让我们看看最内层循环的第一次循环迭代期间 A 中的词法环境(以及函数调用的 -scoped 声明的变量环境)是什么样的(稍微简化了):
+−−−−−−−−−−−−−−−−−−−−−−−+ | 变量环境 | +−−−−−−−−−−−−−−−−−−−−−−−+ | [[OuterEnv]] |>−−−(全局环境) | 参数 |>−−−(空伪数组) | A |>−−−(函数) +−−−−−−−−−−−−−−−−−−−−−−−+ ^ | +−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−+ | | 词汇环境 1 | | +−−−−−−−−−−−−−−−−−−−−−−−−+ | | [[OuterEnv]] |>−−+ +−−−−−−−−−−−−−−−−−−−−−−−−−+ ^ | +−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−+ | | 词法环境 2 | | +−−−−−−−−−−−−−−−−−−−−−−−−+ | | [[OuterEnv]] |>−−+ | 我:0 | +−−−−−−−−−−−−−−−−−−−−−−−−−+ ^ | +−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−+ | | 词汇环境 3 | | +−−−−−−−−−−−−−−−−−−−−−−−−+ | | [[OuterEnv]] |>−−+ | j: 0 | +−−−−−−−−−−−−−−−−−−−−−−−−−+ ^ | +−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−+ | | 词汇环境 4 | | +−−−−−−−−−−−−−−−−−−−−−−−−+ | | [[OuterEnv]] |>−−+ | k: 0 | +−−−−−−−−−−−−−−−−−−−−−−−−−+
因此,在最内层循环中查找时k
,JavaScript 引擎不必看得很远:它就在当前的词法环境中(LexicalEnvironment 4)。无需穿越到那些外部环境。
现在让我们看一下 B 的相同情况:
+−−−−−−−−−−−−−−−−−−−−−−−+ | 变量环境 | +−−−−−−−−−−−−−−−−−−−−−−−+ | [[OuterEnv]] |>−−−(全局环境) | 参数 |>−−−(空伪数组) | B |>−−−(函数) +−−−−−−−−−−−−−−−−−−−−−−−+ ^ | +−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−+ | | 词汇环境 1 | | +−−−−−−−−−−−−−−−−−−−−−−−−+ | | [[OuterEnv]] |>−−+ | 我:0 | | j: 0 | | k: 0 | +−−−−−−−−−−−−−−−−−−−−−−−−−+ ^ | +−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−+ | | 词法环境 2 | | +−−−−−−−−−−−−−−−−−−−−−−−−+ | | [[OuterEnv]] |>−−+ +−−−−−−−−−−−−−−−−−−−−−−−−−+ ^ | +−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−+ | | 词汇环境 3 | | +−−−−−−−−−−−−−−−−−−−−−−−−+ | | [[OuterEnv]] |>−−+ +−−−−−−−−−−−−−−−−−−−−−−−−−+ ^ | +−−−−−−−−−−−−−−−−−−−−−+ +−−−−−−−−−−−−−−−−−−−−−−−−+ | | 词汇环境 4 | | +−−−−−−−−−−−−−−−−−−−−−−−−+ | | [[OuterEnv]] |>−−+ +−−−−−−−−−−−−−−−−−−−−−−−−−+
在最里面的循环中,当它需要使用 时k
,它必须从当前的词法环境(LexicalEnvironment 4)遍历到它的父级的父级的父级(LexicalEnvironment 1)才能找到k
。在未优化的运行时,这不会是免费的。
这是为什么 B 比 A 稍慢的一个合理解释,但这只是一个推论,一个猜测;您必须研究 V8 Ignition 字节码才能确定。
但我不会花太多时间分析或担心未优化代码的性能。一般来说,它会足够快,并且重复使用的代码将被进一步优化以更快。例如,这是我在 Chrome 中运行上面的示例时看到的内容:
功能 | 3个循环 | 30 个循环 | 60 圈 | 120 个循环 |
---|---|---|---|---|
一个 | 75.60 毫秒 | 7.64 毫秒 | 3.83ms | 1.90ms |
乙 | 96.17 毫秒 | 9.24 毫秒 | 5.66ms | 2.67ms |
C | 75.57 毫秒 | 7.62ms | 3.83ms | 1.89ms |
D | 75.30 毫秒 | 7.62ms | 3.82ms | 1.89ms |
正如您所看到的,优化开始并在很大程度上消除了 B 之外的差异(即便如此,相对而言,差异还是很大,但绝对不是,而且它还是一个幼稚的基准)。
推荐阅读
- java - 在地图中使用通用枚举类
- django - 如何在 django 中实现两个身份验证层
- acumatica - 如何在 SO 的 POCreate 期间捕获分配给 SOLineSplit3 的 PO 引用?
- design-patterns - 工厂模式与实体原则
- c - 将查询值的输出分配给字符串
- excel - 两 (2) 个范围内差异的绝对值
- python - 为 Django 创建一个保持顺序的多值字典
- postgresql - 将varbit转换为int的其他方法?还有大神?
- windows - 在 Windows 上安装 Ghost CMS
- python-3.x - 尝试发送数据 Python CAN