Edit a column in a tab-delimited text file using PowerShell

Problem description

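I have a very large tab-delimited text file (~250k rows and 171 columns) that I need to edit. I need to add the letter "H" to the start of the third column on every row.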

So I need each row to go from 03/20/2020 09:00 03/20/2020 10:00 1269805 ...... to 03/20/2020 09:00 03/20/2020 10:00 H1269805 ...... I actually got this working with the following code:

$source = Get-ChildItem "C:\test\input\*.txt"
$target = "C:\test\test.txt"

$data = Get-Content -Path $source | ConvertFrom-Csv -Delimiter "`t" -Header Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120, 
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171

$data | % {
    If ($_.Column3) {
        #import ID
        $_.Column3 = "H$($_.Column3)"
    } }

$data | Select Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120, 
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171 | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | % { $_ -replace '"', "" } | Select-Object -Skip 1 | Set-Content -Path $target

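My problem is that it takes a very long time: the whole process needs about 25 minutes to finish. I know it is a big file, but is there another way to do this faster? I suspect the conversion to and from CSV is the most time-consuming part, but I could be wrong. Any help would be great.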

Tags: powershell, csv

Solution


To speed up processing, avoid the pipeline, use .NET types for file I/O, and use plain-text operations:

# Create the output file.
$outFile = [IO.File]::CreateText($target)
# Loop over all input files
foreach ($file in Get-ChildItem C:\test\input\*.txt) {   
  # Loop over a given file's lines.
  foreach ($line in [IO.File]::ReadLines($file.FullName)) {
    # Prepend 'H' to the 3rd column and append to the output file.
    $outFile.WriteLine(($line -replace '^.*?\t.*?\t', '$&H'))
  }
}
$outFile.Close()
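
If an error interrupts the loop above, $outFile.Close() is never reached and the output file can stay locked for the rest of the session. The following is a minimal sketch of the same loop wrapped in try/finally. The function name Add-ThirdColumnPrefix and its parameters are hypothetical and not part of the original answer; the output path is resolved through PowerShell first so that .NET receives a full path.

```powershell
# Hypothetical helper: same technique as the loop above, with guaranteed cleanup.
function Add-ThirdColumnPrefix {
    param(
        [Parameter(Mandatory)] [string[]] $InputPath,   # may contain wildcards
        [Parameter(Mandatory)] [string]   $OutputPath   # must not match $InputPath
    )

    # Turn a possibly relative PowerShell path into a full file-system path.
    # This works even though the output file does not exist yet.
    $fullOut = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)

    $writer = [IO.File]::CreateText($fullOut)
    try {
        foreach ($file in Get-ChildItem -Path $InputPath -File) {
            foreach ($line in [IO.File]::ReadLines($file.FullName)) {
                # Prepend 'H' to the 3rd tab-separated column.
                $writer.WriteLine(($line -replace '^.*?\t.*?\t', '$&H'))
            }
        }
    }
    finally {
        # Runs even if the loop throws, so the file handle is always released.
        $writer.Dispose()
    }
}
```

With the paths from the question, a call might look like Add-ThirdColumnPrefix -InputPath 'C:\test\input\*.txt' -OutputPath 'C:\test\test.txt'.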

Notes:

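  • Be sure to always pass full file paths to .NET methods, because .NET's working directory usually differs from PowerShell's.

  • .NET file I/O methods default to BOM-less UTF-8 encoding.

  • PowerShell's regex-based -replace operator is used to insert H at the start of the third tab-delimited column.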
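
To make the -replace step concrete, here is a small sketch run against two made-up lines. The sample data and the variable names are assumptions for illustration. The second pattern is an assumed variant, not part of the original answer: it only inserts H when the third column is non-empty, which mirrors the If ($_.Column3) check in the question's code.

```powershell
# Hypothetical sample lines; `t is PowerShell's escape sequence for a tab.
$full  = "03/20/2020 09:00`t03/20/2020 10:00`t1269805`tmore"
$empty = "03/20/2020 09:00`t03/20/2020 10:00`t`tmore"   # third column is empty

# Pattern from the answer: lazily match the first two columns plus their tabs,
# then re-emit the whole match ($&) followed by H.
$simple = '^.*?\t.*?\t'
$simpleFull  = $full  -replace $simple, '$&H'
$simpleEmpty = $empty -replace $simple, '$&H'   # H is inserted even for an empty column

# Assumed variant: [^\t]* cannot cross a tab, and the lookahead (?=[^\t])
# requires the third column to start with a non-tab character.
$guarded = '^[^\t]*\t[^\t]*\t(?=[^\t])'
$guardedFull  = $full  -replace $guarded, '$&H'
$guardedEmpty = $empty -replace $guarded, '$&H'  # no match, line is left unchanged

$simpleFull, $simpleEmpty, $guardedFull, $guardedEmpty
```

Which behavior is wanted depends on the data: the pattern from the answer is simpler and prefixes every row that has at least three columns, while the guarded variant leaves rows with an empty third column untouched.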

