PowerShell: intercepting and fixing specific values when reading a CSV file

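Question

In a PowerShell script, I read a CSV file.

I have to "fix" some of the values. Specifically, the CSV may contain an empty value, the literal NULL, or sometimes -. All of these values should be treated as $null.

Is there a way to intercept the CSV parsing to handle this?

I actually have a working solution, but it is very slow: iterating over 2,500+ items takes 20 minutes, whereas reading the CSV file as-is takes only a few seconds.

The idea is to iterate over each property: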

$private:result = @{}
foreach($private:prop in $private:line.PSObject.Properties){
    # Take the raw property value; the filter below maps quasi-null values to $null.
    $private:value = $private:prop.Value
    $private:result.Add($private:prop.Name, ($private:value | Filter-Value))
}
$private:result
...

function Filter-Value{
    param(
        [Parameter(Position=0, ValueFromPipeline=$true)]
        [object]$In
    )

    if(-not $In){
        $null
    }
    elseif(($In -is [string]) -and ($In.Length -eq 0)) {
        $null
    }
    elseif(($In -eq "NULL") -or ($In -eq "-")) {
        $null
    }
    else{
        $In
    }
}

Full code:


function Import-CsvEx{
    param(
        [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true)]
        [ValidateScript({Test-Path $_ -PathType Leaf})]
        [string]$Path,
        [Parameter()]
        [string]$Delimiter
    )
    begin{
        Write-Verbose "Begin read of file $Path"
    }
    process{
        # We use file stream and stream reader to automatically detect encoding
        $private:fileStream = [System.IO.File]::OpenRead($Path)

        $private:streamReader = New-Object System.IO.StreamReader($private:fileStream, [System.Text.Encoding]::Default, $true)

        $private:fileContent = $private:streamReader.ReadToEnd()

        $private:streamReader.Dispose()
        $private:fileStream.Dispose()        

        $private:csv = ConvertFrom-Csv $private:fileContent  -Delimiter $Delimiter

        for($private:i=0; $private:i -lt $private:csv.Count ; $private:i++){
            Write-Progress -Id 1003 -Activity "Reading  CSV" -PercentComplete ($private:i*100/$private:csv.count)
            $private:line = $private:csv[$private:i]
            $private:result = @{}
            foreach($private:prop in $private:line.PSObject.Properties){
                # Take the raw property value; Filter-Value maps quasi-null values to $null.
                $private:value = $private:prop.Value
                $private:result.Add($private:prop.Name, ($private:value | Filter-Value))
            }

            # actually outputs the object to the pipeline
            New-Object psobject -Property $private:result

        }
        Write-Progress -Id 1003 -Activity "Reading CSV" -Completed

    }
    end{
        Write-Verbose "End read of file $Path"
    }
}

function Filter-Value{
    param(
        [Parameter(Position=0, ValueFromPipeline=$true)]
        [object]$In
    )

    if(-not $In){
        $null
    }
    elseif(($In -is [string]) -and ($In.Length -eq 0)) {
        $null
    }
    elseif(($In -eq "NULL") -or ($In -eq "-")) {
        $null
    }
    else{
        $In
    }
}

Tags: performance, powershell, csv

Answer


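Given that performance is the concern:

  • Avoid the pipeline (though at the expense of having to fit all data into memory).

  • Avoid Write-Progress.

  • Avoid repeated reflection via .psobject.Properties.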
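To get a feel for the first point, here is a rough timing sketch. It assumes made-up sample data ($values) and a simplified, hypothetical stand-in for the question's helper (Test-FilterValue); neither comes from the original post. Absolute numbers will vary by machine and PowerShell version, so no timings are claimed here.

```powershell
# Hypothetical sample data: 50,000 strings, some of them quasi-null.
$values = foreach ($i in 1..50000) { ('a', '', '-', 'NULL', 'b')[$i % 5] }

# Simplified stand-in (hypothetical name) for the question's Filter-Value helper.
function Test-FilterValue {
    param([Parameter(ValueFromPipeline = $true)] [object] $In)
    process {
        if ($In -in '', '-', 'NULL') { $null } else { $In }
    }
}

# Variant 1: one pipeline invocation per value, as in the question.
$perValuePipeline = Measure-Command {
    foreach ($v in $values) { $null = $v | Test-FilterValue }
}

# Variant 2: inline test, with no function call and no pipeline.
$inline = Measure-Command {
    foreach ($v in $values) { $null = if ($v -in '', '-', 'NULL') { $null } else { $v } }
}

'Pipeline per value: {0:N0} ms' -f $perValuePipeline.TotalMilliseconds
'Inline test:        {0:N0} ms' -f $inline.TotalMilliseconds
```

The inline variant typically comes out well ahead, because each pipeline invocation carries a fixed setup cost that is paid once per value in variant 1.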

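As an aside: the $private: scope is rarely needed, and using it makes your code hard to read. Note that assigning to a variable by name alone inside a function implicitly creates a local variable (e.g., $var = 42); $private: is only needed when you must explicitly prevent descendant scopes from seeing the variable.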
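The difference between an implicitly local variable and a $private: one can be illustrated with a minimal sketch. The function and variable names (Show-Parent, Show-Child, $localVar, $privateVar) are hypothetical, and the sketch assumes that strict mode is off and that no variables with these names exist in an outer scope.

```powershell
function Show-Child {
    # Runs in a descendant scope of Show-Parent.
    "local visible in child:   $($null -ne $localVar)"
    "private visible in child: $($null -ne $privateVar)"
}

function Show-Parent {
    $localVar = 'local'               # implicitly local, but visible to descendant scopes
    $private:privateVar = 'private'   # local AND hidden from descendant scopes
    Show-Child
}

Show-Parent
# local visible in child:   True
# private visible in child: False
```

Both variables disappear once Show-Parent returns; $private: only changes what functions called from within it can see.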

# Import the CSV data into a collection in memory.
# NOTE: In Windows PowerShell, Import-Csv defaults to ASCII(!) encoding.
#       Use -Encoding Default to use the system's ANSI code page, for instance.
#       PowerShell [Core] 6+ consistently defaults to (BOM-less) UTF-8.
$objects = Import-Csv $Path -Delimiter $Delimiter

# Extract the property (column) names from the 1st imported object.
$propNames = $objects[0].psobject.Properties.Name

# Loop over all objects...
foreach ($object in $objects) {

  # ... and make the quasi-null properties $null.
  foreach ($propName in $propNames) {
    if ($object.$propName -in '', '-', 'NULL') {
      $object.$propName = $null
    }
  }

  # Output the modified object right away, if desired.
  # Alternatively, operate on the $objects collection later.
  $object

}

If you cannot fit all the data into memory, use Import-Csv ... | ForEach-Object { ... }, while still extracting the property names only in the first invocation of the script block ({ ... }).
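A minimal sketch of that streaming variant follows. The sample file, its columns, and the $result variable are assumptions made only so the sketch is runnable; in practice $Path would point at the real CSV file and the output would usually be piped onward instead of collected.

```powershell
# Hypothetical sample file, created only so the sketch is runnable.
$Path = Join-Path ([System.IO.Path]::GetTempPath()) 'quasi-null-sample.csv'
Set-Content -Path $Path -Value 'Name,City,Age', 'Ann,-,42', 'Bob,NULL,', 'Cid,Oslo,7'

$propNames = $null   # filled in by the first script-block invocation

$result = Import-Csv $Path | ForEach-Object {
    # Extract the column names only once, from the first object.
    if ($null -eq $propNames) { $propNames = $_.psobject.Properties.Name }

    # Make the quasi-null properties $null.
    foreach ($propName in $propNames) {
        if ($_.$propName -in '', '-', 'NULL') { $_.$propName = $null }
    }

    $_   # pass the fixed object on
}

Remove-Item $Path   # clean up the sample file
```

This relies on ForEach-Object running its script block in the caller's scope, which is why the assignment to $propNames survives from one invocation to the next.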

