PowerShell: remove duplicate lines that start the same as other lines

Problem description

In PowerShell, I want to remove duplicate lines from a txt file when they start the same way:

https://mysite.local/9999/9999_01_00.jpg?Watchout=1588338564&User=oj-e39DOyiUJCjtG3E2DWaeT8Q8_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_01.jpg?Watchout=1182344561&User=EjHBJ-biGSlM-ewPMVs_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_01.jpg?Watchout=1182344561&User=EjHBJ-biGSlM-ewPMVs_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_01.jpg?Watchout=1182344561&User=EjHBJ-biGSlM-ewPMVs_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_02.jpg?Watchout=1182344561&User=IPElkKyuulUYY1AL~~y4Y-HedKarGntAexw14_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_02.jpg?Watchout=1182344561&User=IPElkKyuulUYY1AL~~y4Y-HedKarGntAexw14_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_02.jpg?Watchout=1182344561&User=IPElkKyuulUYY1AL~~y4Y-HedKarGntAexw14_&dominus=3PK5GF6789
https://mysite.local/9999/9999_02_00.jpg?Watchout=1182344561&User=VIybnoLd8cthJ7MfsFM6EfD3M_&dominus=3PK5GF6789

This is what my txt file should contain afterwards:

https://mysite.local/9999/9999_01_00.jpg?Watchout=1588338564&User=oj-e39DOyiUJCjtG3E2DWaeT8Q8_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_01.jpg?Watchout=1182344561&User=EjHBJ-biGSlM-ewPMVs_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_02.jpg?Watchout=1182344561&User=IPElkKyuulUYY1AL~~y4Y-HedKarGntAexw14_&dominus=3PK5GF6789
https://mysite.local/9999/9999_02_00.jpg?Watchout=1182344561&User=VIybnoLd8cthJ7MfsFM6EfD3M_&dominus=3PK5GF6789

For example, https://mysite.local/9999/9999_01_01.jpg appears 3 times, but I need each jpg file (such as 9999_01_00.jpg) only once.

How can I do this? Removing identical lines is not very hard, but it gets trickier when the lines are not exactly the same!

Thanks!

Tags: string, powershell, file, duplicates

Solution


One way to do this:

$newText = Get-Content -Path 'X:\FileWithUrls.txt' |
            Group-Object @{Expression = { ($_ -split '\?')[0]}} | 
            ForEach-Object { $_.Group[0] }

# output on screen
$newText

# output to new text file
$newText | Set-Content -Path 'X:\DeDupedFileWithUrls.txt' -Force

Output:

https://mysite.local/9999/9999_01_00.jpg?Watchout=1588338564&User=oj-e39DOyiUJCjtG3E2DWaeT8Q8_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_01.jpg?Watchout=1182344561&User=EjHBJ-biGSlM-ewPMVs_&dominus=3PK5GF6789
https://mysite.local/9999/9999_01_02.jpg?Watchout=1182344561&User=IPElkKyuulUYY1AL~~y4Y-HedKarGntAexw14_&dominus=3PK5GF6789
https://mysite.local/9999/9999_02_00.jpg?Watchout=1182344561&User=VIybnoLd8cthJ7MfsFM6EfD3M_&dominus=3PK5GF6789


A short explanation:

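Group-Object groups similar lines of the text file together. The grouping is based on a property we supply; in this case, that property is the part of each line before the question mark (?).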

To get it, we simply split the string on the question mark and use only the first part, [0].

Because -split uses regular expressions, the question mark has to be escaped as \?
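To make the escaping concrete, here is a minimal sketch of extracting the grouping key from a single line. The variable names ($line, $keyEscaped, $keyViaEscape, $keyViaMethod) are made up for illustration. In a regular expression an unescaped ? is a quantifier, which is why it cannot be used as the pattern on its own.

```powershell
# One sample line taken from the question
$line = 'https://mysite.local/9999/9999_01_01.jpg?Watchout=1182344561&User=EjHBJ-biGSlM-ewPMVs_&dominus=3PK5GF6789'

# -split treats its right-hand side as a regex, so the literal '?' is escaped by hand
$keyEscaped = ($line -split '\?')[0]

# [regex]::Escape builds the escaped pattern instead of writing the backslash manually
$keyViaEscape = ($line -split [regex]::Escape('?'))[0]

# The .NET String.Split method works on literal characters, so nothing needs escaping
$keyViaMethod = $line.Split('?')[0]

# All three variables hold the part of the URL before the question mark
$keyEscaped
```

Any of the three forms should work as the body of the grouping expression. The answer uses the first one.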

This is wrapped in a calculated (on-the-fly) property: @{Expression = { ($_ -split '\?')[0]}}

Next, we loop over the groups and output only the first item of each group.
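The steps above can be tried without touching any file. The sketch below runs the same pipeline on an in-memory array and also prints the intermediate groups. The sample lines use shortened, hypothetical query strings, and $lines, $groups and $deduped are illustrative names only.

```powershell
# Sample data: three lines, two of which share the same path before '?'
# (query strings are shortened, hypothetical values)
$lines = @(
    'https://mysite.local/9999/9999_01_00.jpg?Watchout=1&User=a'
    'https://mysite.local/9999/9999_01_01.jpg?Watchout=2&User=b'
    'https://mysite.local/9999/9999_01_01.jpg?Watchout=3&User=c'
)

# Group on the part of each line before the question mark
$groups = $lines | Group-Object @{Expression = { ($_ -split '\?')[0] }}

# Inspect the groups: the key (Name) and how many lines fell into it (Count)
$groups | ForEach-Object { '{0} -> {1} line(s)' -f $_.Name, $_.Count }

# Keep only the first line of every group
$deduped = $groups | ForEach-Object { $_.Group[0] }
$deduped
```

Looking at Name and Count first is a quick way to check that the key really separates the lines as intended before writing the result out with Set-Content.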

