Why is the indexer property setter not inlined?

Problem description

I have a class that wraps an unmanaged allocation as an array. You can view the source code on GitHub, but here is the gist of it:

public unsafe class ArrayReference<T> : Reference, IArrayReference<T>
  where T : unmanaged
{

  private T* typedPointer_;

  public T this[ int index ]
  {
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    get => typedPointer_[ index ];
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    set => typedPointer_[ index ] = value;
  }

}
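For readers who want to try the pattern without the full repository, here is a minimal, self-contained sketch of the same idea. The class name `UnmanagedArraySketch<T>`, its `Length` property, and the use of `Marshal.AllocHGlobal` are assumptions made for illustration; the real `ArrayReference<T>` derives from `Reference` and implements `IArrayReference<T>`, neither of which is shown in the question.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the question's ArrayReference<T>.
// It owns its native buffer, which the original class may not do.
public sealed unsafe class UnmanagedArraySketch<T> : IDisposable
  where T : unmanaged
{
  private T* typedPointer_;

  public int Length { get; }

  public UnmanagedArraySketch( int length )
  {
    Length = length;
    // AllocHGlobal returns uninitialized native memory that must be freed manually.
    typedPointer_ = ( T* ) Marshal.AllocHGlobal( length * sizeof( T ) ).ToPointer();
  }

  // Same indexer shape as in the question: no range check, direct pointer access.
  public T this[ int index ]
  {
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    get => typedPointer_[ index ];
    [MethodImpl( MethodImplOptions.AggressiveInlining )]
    set => typedPointer_[ index ] = value;
  }

  public void Dispose()
  {
    if ( typedPointer_ != null )
    {
      Marshal.FreeHGlobal( new IntPtr( typedPointer_ ) );
      typedPointer_ = null;
    }
  }
}

public static class Program
{
  public static void Main()
  {
    using ( var array = new UnmanagedArraySketch<byte>( 128 ) )
    {
      // Fill the buffer through the setter, then read one element back.
      for ( var i = 0; i < array.Length; i++ )
        array[ i ] = ( byte ) i;

      Console.WriteLine( array[ 64 ] ); // prints 64
    }
  }
}
```

The sketch must be compiled with unsafe code enabled (for example `<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` in the project file).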

It is very simple, and for read operations it delivers excellent performance (measured with BenchmarkDotNet):

Array size: 128
Managed byte[] ranged-for get: 69.8046 ns
ArrayReference ranged-for get: 66.7340 ns
Managed byte[] ranged-for set: 66.1855 ns
ArrayReference ranged-for set: 68.4863 ns

And the benchmark code:

[GlobalSetup]
public void Setup()
{
  median = AllocationSize / 2;
  alloc_ = new Allocation( AllocationSize );
  array_ = new ArrayReference<byte>( alloc_.Address, AllocationSize );
  managedArray_ = new byte[ AllocationSize ];
}

[Benchmark]
public void ManagedArray_ranged_for_get()
{
  var counter = 0;
  for ( var i = 0; i < AllocationSize; i++ )
    counter += managedArray_[ i ];
}

[Benchmark]
public void ArrayReference_ranged_for_get()
{
  var counter = 0;
  for ( var i = 0; i < AllocationSize; i++ )
    counter += array_[ i ];
}

[Benchmark]
public void ManagedArray_ranged_for_set()
{
  for ( var i = 0; i < AllocationSize; i++ )
    managedArray_[ i ] = ( byte ) i;
}

[Benchmark]
public void ArrayReference_ranged_for_set()
{
  for ( var i = 0; i < AllocationSize; i++ )
    array_[ i ] = ( byte ) i;
}

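As you can see, reading from an ArrayReference is slightly faster, because it performs no range checks and has direct access to the pointer to the array. However, writing to an ArrayReference is slower than writing to a managed byte[] array, and it looks like the problem is that the setter is not being inlined.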
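The range-check difference mentioned above can be seen in isolation with a small sketch. The names `RangeCheckSketch`, `ManagedWriteThrows`, and `PointerWrite` are hypothetical and not part of the question's code. The sketch only shows the behavioral difference and says nothing about timing.

```csharp
using System;

public static class RangeCheckSketch
{
  // Writes through the managed array indexer; the runtime validates the index
  // and throws IndexOutOfRangeException when it is out of range.
  public static bool ManagedWriteThrows( byte[] array, int index )
  {
    try
    {
      array[ index ] = 0;
      return false;
    }
    catch ( IndexOutOfRangeException )
    {
      return true;
    }
  }

  // Writes through a raw pointer; nothing validates the index,
  // so the caller must guarantee that it is in range.
  public static unsafe void PointerWrite( byte* pointer, int index, byte value )
    => pointer[ index ] = value;

  public static unsafe void Main()
  {
    var managed = new byte[ 128 ];
    Console.WriteLine( ManagedWriteThrows( managed, 128 ) ); // prints True

    byte* buffer = stackalloc byte[ 128 ];
    PointerWrite( buffer, 127, 42 );
    Console.WriteLine( buffer[ 127 ] ); // prints 42
  }
}
```

In the managed disassembly, this check corresponds to the `cmp`/`jb`/`call` sequence before the store.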

JIT-generated x64 assembly for the managed byte[] set:

managedArray_[ median ] = 0;
00007FFA237A216C  mov         rax,qword ptr [rbp+10h]  
00007FFA237A2170  mov         rax,qword ptr [rax+18h]  
00007FFA237A2174  mov         rdx,qword ptr [rbp+10h]  
00007FFA237A2178  mov         edx,dword ptr [rdx+24h]  
00007FFA237A217B  cmp         rdx,qword ptr [rax+8]  
00007FFA237A217F  jb          00007FFA237A2186  
00007FFA237A2181  call        00007FFA833DF110  
00007FFA237A2186  lea         rax,[rax+rdx+10h]  
00007FFA237A218B  mov         byte ptr [rax],0  

JIT-generated x64 assembly for ArrayReference::set:

typedPointer_[ index ] = value;
00007FFA237A20BC  mov         rcx,qword ptr [rbp+10h]  
00007FFA237A20C0  mov         rcx,qword ptr [rcx+10h]  
00007FFA237A20C4  mov         rdx,qword ptr [rbp+10h]  
00007FFA237A20C8  mov         edx,dword ptr [rdx+24h]  
00007FFA237A20CB  xor         r8d,r8d  
00007FFA237A20CE  cmp         dword ptr [rcx],ecx  
00007FFA237A20D0  call        00007FFA237A1738  
 -> 
    00007FFA237A20F0  push        rbp  
    00007FFA237A20F1  sub         rsp,20h  
    00007FFA237A20F5  lea         rbp,[rsp+20h]  
    00007FFA237A20FA  mov         qword ptr [rbp+10h],rcx  
    00007FFA237A20FE  mov         dword ptr [rbp+18h],edx  
    00007FFA237A2101  mov         dword ptr [rbp+20h],r8d  
    00007FFA237A2105  cmp         dword ptr [7FFA23688310h],0  
    00007FFA237A210C  je          00007FFA237A2113  
    00007FFA237A210E  call        00007FFA833DD3E0  
    00007FFA237A2113  mov         rax,qword ptr [rbp+10h]  
    00007FFA237A2117  mov         rax,qword ptr [rax+20h]  
    00007FFA237A211B  mov         edx,dword ptr [rbp+18h]  
    00007FFA237A211E  movsxd      rdx,edx  
    00007FFA237A2121  mov         ecx,1  
    00007FFA237A2126  movsxd      rcx,ecx  
    00007FFA237A2129  imul        rdx,rcx  
    00007FFA237A212D  mov         ecx,dword ptr [rbp+20h]  
    00007FFA237A2130  mov         byte ptr [rax+rdx],cl  

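I don't understand why this is not being inlined. It does exactly the same thing as the managed array, except that it goes through a pointer to unmanaged memory. Does this violate one of the CLR's inlining rules, or is it perhaps not inlined because T is generic, even though it is constrained?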

Windows 10 Pro 64-bit, .NET Core 2.2, Release mode, 64-bit RyuJIT

Tags: c#, performance, generics, unmanaged

Solution

