PowerShell: improving array update performance

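Question

I have two arrays containing user details.

Array 1

Suppose there are about 2000 distinct users, held in the "Distinct Users" array. It is a set of user objects with properties like these: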

$userDetailsObject=@{
    UserName = "UserName1"
    UserEmail = "UserEmail1"
    Category = "Category1"

} 

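Array 2

The second array, named "All Group Users", lists every user of every group. It can therefore contain duplicate users, because the same user can be a member of several groups. The "All Group Users" array has no category details for the users. It holds about 12000 objects.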

$groupUserDetailsObject=@{
    UserName = "UserName1"
    UserEmail = "UserEmail1"
    GroupName = "Group1"
    Category = ""

}

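I am trying to get group-wise user details including the category. To do that I need to compare the "All Group Users" array against the "Distinct Users" array (matching on the unique email address) and update the Category in "All Group Users".

What is the fastest way to do this? I tried different combinations, such as nested foreach loops and regex matching, but could not speed it up much: it takes about 20 seconds to go through every user in the "Distinct Users" array and update all matching objects in the "All Group Users" array.

Here is the mock code I used for testing: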

$Users=[System.Collections.ArrayList]@()
$a = @(1..12000)
$i=0

#build 12000 sample users with 2000 unique users by adding sample values
foreach ($item in $a)
{   
    if($i -eq 2000) #reset pattern after every 2000 users
    {
        $i=0
    }
    $userDetailsObject=@{
        UserName = "UserName$i"
        UserEmail = "UserEmail$i"
        Category  = ""

    } 
    $obj = New-Object -Type PSObject -Prop $userDetailsObject
    $Users.Add($obj)
    $i++
}

#loop through 2000 unique users and compare by email address against 12000 users
for($i=0 ;$i -lt 2000 ;$i++)
{
    [regex] $regex = '(?i)^('+[regex]::escape("UserEmail$i") + ')$'

    Measure-Command {foreach($item in ($Users.where{$_.UserEmail -match $regex}))
    {
        $item.Category  = "Category$i"
    }
}| Select-Object -Property TotalSeconds

}

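Thanks.

Tags: arrays, performance, powershell

Answer

If you need fast comparisons, use a Hashtable to store the user objects. Every item in the hash must then have a unique key, and for that you can use the user's email address (the UserEmail property in the sample code), which should be unique for each user in the network.

Something like: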

Write-Host 'Creating dummy users'
# build a hashtable with 12000 sample users with unique email addresses (as in real life they would have)
$userHash = [ordered]@{}
foreach ($item in @(1..12000)) {
    $userHash["UserEmail$item"] = [PsCustomObject]@{
        UserName  = "UserName$item"
        UserEmail = "UserEmail$item"
        Category  = ""
    } 
}

Write-Host 'Setting Category on 2000 users'
Measure-Command {
    for($i = 0; $i -lt 2000; $i++) {
        if ($userHash.Contains("UserEmail$i")) {
            $userHash["UserEmail$i"].Category = "Category$i"
        }
    }
} | Select-Object -Property TotalSeconds

If you want to see the results:

$userHash.Values | Select-Object * | Format-Table
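One detail worth checking when replacing the regex with hash keys: the question's pattern uses `(?i)`, so it matches email addresses case-insensitively. A hashtable created with `@{}` (or `[ordered]@{}`) also compares string keys case-insensitively, so the behavior should carry over. The sketch below illustrates this; the variable names `$lookup` and `$strict` are made up for the example, and the case-sensitive variant is only an assumption about what you might want if casing matters.

```powershell
# A hashtable literal compares string keys case-insensitively,
# similar to the (?i) flag in the question's regex.
$lookup = @{}
$lookup['UserEmail1'] = 'Category1'
$foundIgnoringCase = $lookup.ContainsKey('useremail1')

# Hypothetical alternative: if casing must match exactly,
# construct the Hashtable with an ordinal (case-sensitive) comparer.
$strict = [System.Collections.Hashtable]::new([System.StringComparer]::Ordinal)
$strict['UserEmail1'] = 'Category1'
$foundExactCaseOnly = $strict.ContainsKey('useremail1')

"Literal hashtable finds 'useremail1': $foundIgnoringCase"
"Ordinal hashtable finds 'useremail1': $foundExactCaseOnly"
```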

Update

Thanks for adding the extra explanation. If I understand correctly, this is what you are trying to do:

$distinctUsers = 2000
$allGroupUsers = 12000

Write-Host "Creating $distinctUsers DISTINCT users"
# build a hashtable with $distinctUsers sample users with unique email addresses
# this doesn't need to be an ordered list, it is just for lookup
$distinctHash = @{}
foreach ($i in @(1..$distinctUsers)) {
    $distinctHash["UserEmail$i"] = [PsCustomObject]@{
        UserName  = "UserName$i"
        UserEmail = "UserEmail$i"
        Category  = "Category$i"
    } 
}

Write-Host "Creating $allGroupUsers GROUP users"
# build a hashtable with $allGroupUsers sample users
# to have the output easier on the eyes, make this hash ordered
$groupHash = [ordered]@{}
for ($i = 1; $i -le $allGroupUsers; $i++) {
    # uniqify the key using the email addresses combined with the value of $i
    # reset the user counter on every $distinctUsers users
    $currentUser = (($i - 1) % $distinctUsers) + 1
    $uniqueEmail = '{0}-{1}' -f $i, "UserEmail$currentUser"
    $groupHash[$uniqueEmail] = [PsCustomObject]@{
        UserName  = "UserName$currentUser"
        UserEmail = "UserEmail$currentUser"
        GroupName = "Group$i"
        Category  = ""
    } 
}


Write-Host "Setting Category property on the $($groupHash.Count) GROUP users"
Measure-Command {
    foreach ($uniqueEmail in $groupHash.Keys) {
        $realEmail = ($uniqueEmail -split '-', 2)[-1]
        if ($distinctHash.ContainsKey($realEmail)) {
            $groupHash[$uniqueEmail].Category = $distinctHash[$realEmail].Category
        }
    }
} | Select-Object -Property TotalSeconds

After this, $groupHash has been updated.
To see the results:

$groupHash.Values | Select-Object * | Format-Table -AutoSize
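If the data already lives in two arrays, as described in the question, the group users do not need to be moved into a hashtable. Only the lookup side does. Here is a minimal sketch of that variant. It assumes two arrays, named `$distinctArray` and `$groupArray` here (hypothetical names, filled with generated sample data), and a helper table `$categoryByEmail` that is also invented for the example.

```powershell
# Sample data standing in for the two arrays from the question (hypothetical names).
$distinctArray = foreach ($i in 1..2000) {
    [PSCustomObject]@{
        UserName  = "UserName$i"
        UserEmail = "UserEmail$i"
        Category  = "Category$i"
    }
}
$groupArray = foreach ($i in 1..12000) {
    # cycle through the 2000 distinct users so each appears in several groups
    $n = (($i - 1) % 2000) + 1
    [PSCustomObject]@{
        UserName  = "UserName$n"
        UserEmail = "UserEmail$n"
        GroupName = "Group$i"
        Category  = ''
    }
}

# Build the lookup once: email address -> category.
$categoryByEmail = @{}
foreach ($user in $distinctArray) {
    $categoryByEmail[$user.UserEmail] = $user.Category
}

# Single pass over the group users; each lookup is a hash probe, not a scan.
foreach ($groupUser in $groupArray) {
    if ($categoryByEmail.ContainsKey($groupUser.UserEmail)) {
        $groupUser.Category = $categoryByEmail[$groupUser.UserEmail]
    }
}
```

This replaces the 2000 full scans of the 12000-element array with one pass over each array. Duplicate users in the group array need no special handling because only the distinct users are used as keys.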
