Is it possible to convert ForEach-Object loops into Jobs?

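Question

I have a PowerShell script that converts CSV data to XML. The CSV file has about 11k rows. The processing in my script takes about 3-5 minutes to convert all the rows to XML strings, and the computer only uses 30% of the CPU. Is there a more efficient way to do this? I did some research on Google and keep reading about jobs, but I don't understand how to use them, or whether they would even help me.

I have tested affinity and priority; setting them to the highest level made no difference to performance.

The output is formatted the way the system I am loading it into requires. So ideally I don't want to change the output, just the loop portion.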

$OrderFile = "FileLocation.csv"
$OutputXML = "OutputFileLocation.xml"
$OrderData = Import-Csv -path $OrderFile

#Create a string template for the file and map fields for data.
$TemplateOuter = @'
<service_orders>$($xml)</service_orders>
'@

$TemplateCust = @'
<service_order><account_name></account_name><route_group></route_group><order_number>$($item.order_num)</order_number><delivery_type>$($item.delivery_type)</delivery_type><customer_code>$($item.customer_code)</customer_code><delivery_date>$($item.delivery_date)</delivery_date><cod>$($item.cod)</cod><service_time></service_time><note></note><line_items>$($items)</line_items></service_order>
'@

$TemplateItems = @'
<line_item><serial_number>$($item.sku)</serial_number><quantity>$($item.quantity)</quantity><amount>$($item.amount)</amount><description>$($item.description)</description><item_sequence></item_sequence><line_item_notes></line_item_notes><line_taxes>$($item.line_taxes)</line_taxes><line_amount></line_amount><category_name>$($item.category_name)</category_name><size_1>$($item.size_1)</size_1><size_2>$($item.size_2)</size_2><size_3>$($item.size_3)</size_3></line_item>
'@

#Loop through each of the orders and expand the order templates
$xml = $OrderData | Group-Object order_num -ov grp | ForEach-Object { 
            $items = foreach ($item in $_.Group) {
                $ExecutionContext.InvokeCommand.ExpandString($TemplateItems)
            }
        $ExecutionContext.InvokeCommand.ExpandString($TemplateCust)
        } | foreach {$_ -replace '&', '+'}

$xml = $ExecutionContext.InvokeCommand.ExpandString($TemplateOuter)

#Create the XML File
$xml |Out-File $OutputXML

$OrdersXML = Get-Content $OutputXML
$Utf8NoBomEncoding  = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllLines($OutputXML, $OrdersXML, $Utf8NoBomEncoding)

My output meets my needs, but as I said, CPU usage tops out at 30%, using only 1 of the 4 cores, and the process takes about 3-5 minutes for 11k rows.

November 1, 2019: I tried the following.

$xml = $OrderData | Group-Object order_num -ov grp
$xml= ForEach-Object { 
                        Start-Job -Name $_.Group -ScriptBlock {
                            $items = foreach ($item in $xml.Group) 
                            {
                                $ExecutionContext.InvokeCommand.ExpandString($TemplateItems)
                            }
                            $ExecutionContext.InvokeCommand.ExpandString($TemplateCust) | foreach {$_ -replace '&', '+'} 
                            } -ArgumentList $_.Group
               }

I see a job, and it reports HasMoreData = True, but when I receive the job there is nothing there...

Tags: powershell

Answer

Here is an example of how to process data in parallel in the background with jobs.

[System.Int16]$MaxRunningJobs = 10 #How many jobs should run at the same time
[System.Collections.ArrayList]$JobsToDo = @(1..20) #The input for the jobs --> data to process
[System.String[]]$JobOutput = @() #Output from the jobs

#This script will run inside the job and takes some input to process
[System.Management.Automation.ScriptBlock]$WorkToDo = {
    param
    (
        [System.Int16]$JobInput
    )

    #Do your work
    Start-Sleep -Seconds (Get-Random -Minimum 1 -Maximum 10)

    #Return output
    return ('Job ' + $JobInput + ' finished!')
}

#Process each job from the JobsToDo ArrayList and waits until every job is Completed & removed
while (($JobsToDo) -or (Get-Job))
{
    #Start jobs
    while (($JobsToDo) -and ((Get-Job -State Running).Count -lt $MaxRunningJobs))
    {
        #Start new job
        Start-Job -ScriptBlock $WorkToDo -ArgumentList $JobsToDo[0]

        #Remove started job from job list
        $JobsToDo.Remove($JobsToDo[0])
    }

    #Receive jobs
    foreach ($CompletedJob in (Get-Job -State Completed))
    {
        #Receive job output
        $JobOutput += Receive-Job -Job $CompletedJob

        #Remove job
        Remove-Job -Job $CompletedJob
    }
}

#Show job output
$JobOutput
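
The same idea could be applied to the CSV conversion from the question roughly as follows. This is only a sketch under stated assumptions, not a tested replacement for the original script: the sample rows, the shortened templates, the $BatchCount value and the $Work script block are all made up for illustration. A job runs in a separate PowerShell process and does not see the caller's variables, which is a likely reason the attempt in the question came back empty, so the rows and both templates are handed over through -ArgumentList. Because every job has a start-up cost, the orders are split into a few batches instead of one job per order; whether this ends up faster than the single loop has to be measured.

```powershell
# Hypothetical sample rows standing in for: Import-Csv -Path $OrderFile
$OrderData = @(
    [pscustomobject]@{ order_num = '1001'; sku = 'A1'; quantity = '2' }
    [pscustomobject]@{ order_num = '1001'; sku = 'B7'; quantity = '1' }
    [pscustomobject]@{ order_num = '1002'; sku = 'C3'; quantity = '5' }
)

# Shortened versions of the templates from the question (illustration only)
$TemplateCust  = '<service_order><order_number>$($item.order_num)</order_number><line_items>$($items)</line_items></service_order>'
$TemplateItems = '<line_item><serial_number>$($item.sku)</serial_number><quantity>$($item.quantity)</quantity></line_item>'

# Assumed number of batches (and therefore jobs); tune for the machine
$BatchCount = 2

# Runs inside each job: same loop as in the question, fed through parameters
$Work = {
    param($Rows, $TemplateCust, $TemplateItems)
    $Rows | Group-Object order_num | ForEach-Object {
        $items = foreach ($item in $_.Group) {
            $ExecutionContext.InvokeCommand.ExpandString($TemplateItems)
        }
        $ExecutionContext.InvokeCommand.ExpandString($TemplateCust)
    } | ForEach-Object { $_ -replace '&', '+' }
}

# Group first so all rows of one order end up in the same batch
$groups = @($OrderData | Group-Object order_num)
$size   = [int][math]::Ceiling([double]$groups.Count / $BatchCount)

# Start one job per contiguous slice of orders
$jobs = for ($start = 0; $start -lt $groups.Count; $start += $size) {
    $end  = [math]::Min($start + $size, $groups.Count) - 1
    $rows = @($groups[$start..$end] | ForEach-Object { $_.Group })
    Start-Job -ScriptBlock $Work -ArgumentList $rows, $TemplateCust, $TemplateItems
}

# Wait for all jobs, then collect the output batch by batch
$null = $jobs | Wait-Job
$xml  = foreach ($job in $jobs) { Receive-Job -Job $job }
$jobs | Remove-Job
```

After this, $xml holds one string per order and can be passed to the outer template and file-writing steps from the question unchanged.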
