首页 > 解决方案 > 负载操作是否在调度、完成或其他时间从 RS 释放?

问题描述

在现代 Intel 1 x86 上,负载 uops 在发送2时、完成3介于4之间的某处时是否从 RS(预留站)释放?


1我也对 AMD Zen 和续集感兴趣,因此也可以随意加入,但为了使问题易于管理,我将其限制为英特尔。此外,AMD 的负载管道似乎与英特尔有所不同,这可能会使对 AMD 的调查成为一项单独的任务。

2这里的 Dispatch 是指离开 RS 去执行。

3这里的完成是指当加载数据返回并准备好满足依赖的微指令时。

4甚至在这两个事件定义的时间范围之外的某个地方,这似乎不太可能但可能。

标签: x86intelcpu-architecturemicro-architecture

解决方案


刚碰到这个问题。这是我试图回答的问题。

简短的回答:我仍然对某些部分有点不确定,但基于使用各种性能计数器以及性能监控中断的一些测量,它“看起来”负载 uop 在被分派到负载端口的同一周期内从 RS 中删除或者至少在不久之后。

详细信息:前段时间我尝试编写一个模仿这里想法的内核模块。链接的博客文章很好地描述了这个想法,所以我不会在这里详细解释。主要思想是在经过一定数量的周期后触发性能监控中断,冻结所有计数器值(当前跟踪),存储它们并重置/重复。在 1、2、... 图片的准确性是另一回事......我用于测量的内核模块的源代码可以在这里找到。

长答案:我在 i7-1065G7(Ice Lake)上使用上述内核模块分析了以下代码,并跟踪了 11 个不同的性能计数器。在配置mov指令之前,clflush在存储的地址上被调用r8. 这样做是为了让加载需要足够长的时间,以便轻松判断 uop 是否在执行之前、之后或期间从 RS 中删除(否则加载大约需要 4 个周期)。我总共测量了多达 600 个周期,其中大多数与这个问题有关的事件发生在 65 个周期内。考虑到噪音,我对每个周期进行了 1024 次试验,并存储了出现次数最多的计数器值。幸运的是,对于下表中的每个周期和每个计数器,我最多只能看到一次试验的值偏差,其余 1023 次试验给出相同的计数器值。

 563:   0f 30                   wrmsr  
 565:   4d 8b 00                mov    (%r8),%r8
 568:   0f ae f0                mfence 
 56b:   0f ae e8                lfence

下面列出了跟踪的计数器。描述来自英特尔 SDM。

  INST_RETIRED_ANY_P:          To track when wrmsr retired
  RS_EVENTS_EMPTY_CYCLES:      Count of cycles RS is empty
  UOPS_DISPATCHED_PORT_PORT_0: # uops dispatched to port 0
  UOPS_DISPATCHED_PORT_PORT_1: # uops dispatched to port 1 
  UOPS_DISPATCHED_PORT_2_3:    # uops dispatched to port 2,3 (load addr ports)
  UOPS_DISPATCHED_PORT_4_9:    # uops dispatched to port 4,9 (store data ports)
  UOPS_DISPATCHED_PORT_PORT_5: # uops dispatched to port 5
  UOPS_DISPATCHED_PORT_PORT_6: # uops dispatched to port 6
  UOPS_DISPATCHED_PORT_7_8:    # uops dispatched to port 7,8 (store addr ports)
  UOPS_EXECUTED_THREAD:        # uops executed
  UOPS_ISSUED_ANY:             # uops sent to RS from RAT

下表列出了每个周期的每个计数器值。因此,根据下表,一个 uop 在周期 47 被发送到 RS,并占用 RS 周期 51-54。这大概是负载uop。在周期 54RS_EVENTS_EMPTY_CYCLESUOPS_DISPATCHED_PORT_2_3增量这意味着(至少我是如何解释它的)负载 uop 已被调度并从 RS 中释放。

我不确定的是,在第 52 周期,又向 RS 发出了三个微指令。他们似乎到达并占领了 RS 55-58 周期。但是只有两个微指令被分派到执行端口,并且 RS 被清空。不管到第 59 个周期,RS 都是空的(每个周期计数都在增加)。负载完成并mov在大约 500 个周期后退出。

+-------+--------------+-----------------+--------+--------+----------+----------+--------+--------+----------+---------------+-------------------+------------------------+
| Cycle | Inst Retired | Cycles RS Empty | Port 0 | Port 1 | Port 2,3 | Port 4,9 | Port 5 | Port 6 | Port 7,8 | uops executed | uops issued to RS |        Comments        |
+-------+--------------+-----------------+--------+--------+----------+----------+--------+--------+----------+---------------+-------------------+------------------------+
|     1 |            0 |               3 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 0 |                        |
|     2 |            0 |               4 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 0 |                        |
|     3 |            0 |               5 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 0 |                        |
|     4 |            0 |               6 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 | 2 uops issued          |
|     5 |            0 |               7 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|     6 |            0 |               8 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|     7 |            0 |               9 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|     8 |            0 |              10 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|     9 |            0 |              11 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|    10 |            0 |              12 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|    11 |            0 |              12 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|    12 |            0 |              12 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|    13 |            0 |              12 |      0 |      0 |        0 |        0 |      0 |      0 |        0 |             3 |                 2 |                        |
|    14 |            0 |              13 |      0 |      0 |        0 |        0 |      0 |      1 |        0 |             3 |                 2 |                        |
|    15 |            0 |              14 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             3 |                 2 | 2 uops dispatched      |
|    16 |            0 |              15 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             4 |                 2 |                        |
|    17 |            0 |              16 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 2 | 2 uops executedd       |
|    18 |            0 |              17 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 2 |                        |
|    19 |            0 |              18 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 2 |                        |
|    20 |            0 |              19 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 2 |                        |
|    21 |            0 |              20 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 2 |                        |
|    22 |            0 |              21 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 2 |                        |
|    23 |            0 |              22 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 5 |                        |
|    24 |            0 |              23 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 6 | 4 uops issued          |
|    25 |            0 |              24 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 6 |                        |
|    26 |            0 |              25 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 6 |                        |
|    27 |            0 |              25 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 6 |                        |
|    28 |            0 |              25 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 6 |                        |
|    29 |            0 |              25 |      0 |      0 |        0 |        0 |      0 |      2 |        0 |             5 |                 6 |                        |
|    30 |            0 |              25 |      0 |      1 |        0 |        0 |      0 |      2 |        0 |             5 |                 6 |                        |
|    31 |            0 |              26 |      0 |      1 |        0 |        0 |      0 |      3 |        0 |             5 |                 6 |                        |
|    32 |            0 |              27 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             6 |                 6 |                        |
|    33 |            0 |              28 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             7 |                 6 |                        |
|    34 |            0 |              29 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 | 3 uops executed        |
|    35 |            0 |              30 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    36 |            1 |              31 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 | wrmsr retired          |
|    37 |            1 |              32 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    38 |            1 |              33 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    39 |            1 |              34 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    40 |            1 |              35 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    41 |            1 |              36 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    42 |            1 |              37 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    43 |            1 |              38 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    44 |            1 |              39 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    45 |            1 |              40 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    46 |            1 |              41 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    47 |            1 |              42 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 6 |                        |
|    48 |            1 |              43 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 7 | 1 uop issued           |
|    49 |            1 |              44 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 7 |                        |
|    50 |            1 |              45 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 7 |                        |
|    51 |            1 |              46 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                 7 |                        |
|    52 |            1 |              46 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                10 | 3 uops issued          |
|    53 |            1 |              46 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                10 |                        |
|    54 |            1 |              46 |      0 |      1 |        0 |        0 |      0 |      4 |        0 |             8 |                10 | port 2,3 load addr     |
|    55 |            1 |              47 |      0 |      1 |        1 |        0 |      0 |      4 |        0 |             8 |                10 |                        |
|    56 |            1 |              47 |      0 |      1 |        1 |        0 |      0 |      4 |        0 |             8 |                10 | executing load         |
|    57 |            1 |              47 |      0 |      1 |        1 |        0 |      0 |      4 |        0 |             9 |                10 |                        |
|    58 |            1 |              47 |      0 |      1 |        1 |        0 |      0 |      4 |        0 |             9 |                10 | port 4,9 store data    |
|    59 |            1 |              48 |      0 |      1 |        1 |        1 |      0 |      4 |        1 |             9 |                10 | port 7,8 store address |
|    60 |            1 |              49 |      0 |      1 |        1 |        1 |      0 |      4 |        1 |             9 |                10 |                        |
|    61 |            1 |              50 |      0 |      1 |        1 |        1 |      0 |      4 |        1 |            11 |                10 | 2 uops executed        |
|    62 |            1 |              51 |      0 |      1 |        1 |        1 |      0 |      4 |        1 |            11 |                10 |                        |
|    63 |            1 |              52 |      0 |      1 |        1 |        1 |      0 |      4 |        1 |            11 |                10 |                        |
|    64 |            1 |              53 |      0 |      1 |        1 |        1 |      0 |      4 |        1 |            11 |                10 |                        |
|    65 |            1 |              54 |      0 |      1 |        1 |        1 |      0 |      4 |        1 |            11 |                10 |                        |
+-------+--------------+-----------------+--------+--------+----------+----------+--------+--------+----------+---------------+-------------------+------------------------+

因此,根据该表,加载 uop 似乎是在调度到加载端口的同时或几个周期后从 RS 中删除的。我对图表中的值进行了一些完整性检查,并且在大多数情况下,所有计数器值都是有意义的。我没有弄清楚的两件事是 4 个微指令将被发送到 RS(第 24 周期),但只有 3 个微指令被执行(第 35 周期)。类似地,在周期 52 发出 3 条微指令,但只执行了 2 条(周期 61)

谢谢


推荐阅读