
Problem description

I have about 2500 CSV files, each roughly 20 MB in size. I am trying to filter certain rows out of each file and save them to a new file.

So, if I have:

File 1 :
    Row1
    Row2
    Row3
File 2 : 
    Row2
    Row3 
and so on..

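If I filter all the files with "Row2" as the filter text, the new folder should contain all the files, each holding only the rows that match the filter text.

After browsing some forums I came up with the following, which might help me filter the rows, but I am not sure how to do this recursively, and I also don't know whether it is fast enough. Any help is appreciated.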

Get-Content "C:\Path to file" | Where{$_ -match "Rowfiltertext*"} | Out-File "Path to Out file"

I am on Windows, so I guess a PowerShell solution would be best here.

The text to filter on will always be in the first column.

Thanks, Siddharth

Tags: powershell, csv

Solution


Here are two fast ways to search for a string in (text) files:

1) Using switch

$searchPattern = [regex]::Escape('Rowfiltertext')  # for safety escape regex special characters
$sourcePath    = 'X:\Path\To\The\Csv\Files'
$outputPath    = 'X:\FilteredCsv.txt'

# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
Get-ChildItem -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
    # iterate through the lines in the file and output the ones that match the search pattern
    switch -Regex -File $_.FullName {
        $searchPattern { $_ }
    }
} | Set-Content -Path $outputPath  # add -PassThru to also show on screen
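
The pattern above matches the text anywhere in a line. Since the question says the filter text is always in the first column, the pattern could be anchored to the start of the line instead. A minimal sketch, assuming comma-separated fields and an unquoted first field (neither is stated in the question); the sample lines and the variable names $filterText, $lines and $filtered are made up for illustration:

```powershell
# Assumption: fields are comma-separated and the first field is not quoted.
$filterText    = 'Row2'
# '^' anchors the match at the start of the line;
# '(,|$)' requires the first field to end right after the filter text.
$searchPattern = '^' + [regex]::Escape($filterText) + '(,|$)'

# Hypothetical sample lines standing in for the content of one csv file
$lines = 'Row2,a,b', 'Row1,Row2,c', 'Row22,d,e', 'Row2'

# switch enumerates the array and tests each line against the pattern
$filtered = switch -Regex ($lines) {
    $searchPattern { $_ }
}
$filtered
```

Only 'Row2,a,b' and 'Row2' pass the filter here. To accept any first field that merely starts with the filter text, drop the '(,|$)' part.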

2) Using Select-String

$searchPattern = [regex]::Escape('Rowfiltertext')  # for safety escape regex special characters
$sourcePath    = 'X:\Path\To\The\Csv\Files'
$outputPath    = 'X:\FilteredCsv.txt'

# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
Get-ChildItem  -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
    # emit only the matching lines; a file without matches contributes nothing
    $_ | Select-String -Pattern $searchPattern | ForEach-Object { $_.Line }
} | Set-Content -Path $outputPath  # add -PassThru to also show on screen
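
As an alternative to escaping the search text with [regex]::Escape(), Select-String has a -SimpleMatch switch that treats the pattern as literal text. A small sketch using a throwaway sample file in the temp folder (the file name and its content are hypothetical):

```powershell
# Hypothetical sample file, created only for this demonstration
$sampleFile = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'simplematch_demo.csv'
Set-Content -Path $sampleFile -Value 'a.b,1', 'axb,2', 'A.B,3'

# -SimpleMatch: the '.' in the pattern is a literal dot, not a regex wildcard.
# Matching is still case-insensitive unless -CaseSensitive is added.
$found = (Select-String -Path $sampleFile -Pattern 'a.b' -SimpleMatch).Line
$found

# clean up the sample file
Remove-Item -Path $sampleFile
```

The line 'axb,2' is skipped because the dot is taken literally, while 'A.B,3' still matches because the comparison ignores case by default.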

If you want to output a new CSV file for each original file, use:

3) Using switch

$searchPattern = [regex]::Escape('Rowfiltertext')  # for safety escape regex special characters
$sourcePath    = 'X:\Path\To\The\Csv\Files'
$outputPath    = 'X:\FilteredCsv'

if (!(Test-Path -Path $outputPath -PathType Container)) {
    $null = New-Item -Path $outputPath -ItemType Directory
}

# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File) | ForEach-Object {
    # create a full target filename for the filtered output csv
    $outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
    # iterate through the lines in the file and output the ones that match the search pattern
    $result = switch -Regex -File $_.FullName {
        $searchPattern { $_ }
    }
    $result | Set-Content -Path $outFile  # add -PassThru to also show on screen
}
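
If -Recurse is added to this per-file variant, files with the same name in different subfolders would overwrite each other in one flat output folder. One possible way around that is to recreate each file's relative subfolder below the output path. The sketch below is self-contained and builds a small sample tree under the temp folder; the folder names (CsvRecurseDemo, In, Out, Sub) and the sample rows are hypothetical:

```powershell
$searchPattern = [regex]::Escape('Row2')

# Hypothetical demo folders; replace these with the real source and output paths
$root       = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'CsvRecurseDemo'
$sourcePath = Join-Path -Path $root -ChildPath 'In'
$outputPath = Join-Path -Path $root -ChildPath 'Out'
$subFolder  = Join-Path -Path $sourcePath -ChildPath 'Sub'

# build a tiny sample tree: In\a.csv and In\Sub\a.csv
$null = New-Item -Path $subFolder -ItemType Directory -Force
Set-Content -Path (Join-Path -Path $sourcePath -ChildPath 'a.csv') -Value 'Row1,x', 'Row2,top'
Set-Content -Path (Join-Path -Path $subFolder  -ChildPath 'a.csv') -Value 'Row2,sub', 'Row3,y'

$sourceRoot = (Get-Item -Path $sourcePath).FullName.TrimEnd('\', '/')

(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File -Recurse) | ForEach-Object {
    # folder of the current file relative to the source root ('' for the root itself)
    $relativeDir = $_.DirectoryName.Substring($sourceRoot.Length).TrimStart('\', '/')
    $targetDir   = [System.IO.Path]::Combine($outputPath, $relativeDir)
    if (!(Test-Path -Path $targetDir -PathType Container)) {
        $null = New-Item -Path $targetDir -ItemType Directory -Force
    }
    $outFile = Join-Path -Path $targetDir -ChildPath ('New_{0}' -f $_.Name)
    # collect the matching lines of this file and write them to the mirrored location
    $result = switch -Regex -File $_.FullName {
        $searchPattern { $_ }
    }
    $result | Set-Content -Path $outFile
}

# list the files that were written
Get-ChildItem -Path $outputPath -Recurse -File | ForEach-Object { $_.FullName }
# when done, the demo tree can be removed with: Remove-Item -Path $root -Recurse
```

The output folder is deliberately kept outside the source folder, so the recursive search never picks up the files it has just written.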

4) Using Select-String

$searchPattern = [regex]::Escape('Rowfiltertext')  # for safety escape regex special characters
$sourcePath    = 'X:\Path\To\The\Csv\Files'
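$outputPath    = 'X:\FilteredCsv'

# make sure the output folder exists before writing to it
if (!(Test-Path -Path $outputPath -PathType Container)) {
    $null = New-Item -Path $outputPath -ItemType Directory
}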

# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
(Get-ChildItem  -Path $sourcePath -Filter '*.csv' -File) | ForEach-Object {
    # create a full target filename for the filtered output csv
    $outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
    ($_ | Select-String -Pattern $searchPattern).Line | Set-Content -Path $outFile  # add -PassThru to also show on screen
}

Hope that helps

