awk - grep fails to remove patterns from a CSV file

Problem description

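I have a file from which I need to remove some URLs. The URLs to remove are in file A, and the data is in CSV file B (both are large files, 6-10 GB). I tried the following grep command, but it does not work for the newer fileB.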

grep -vwF -f patterns.txt fileB.csv > result.csv

File A is a list of URLs, structured as follows:

URLs (header, single column)
bwin.hu
paradisepoker.li

And file B:

type|||URL|||Date|||Domain
1|||https://www.google.com|||1524024000|||google.com 
2|||www.bwin.hu|||1524024324|||bwin.hu

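The delimiter in fileB is |||

I am open to any solution, including awk. Thanks.

Edit: the expected output is a CSV file that keeps every line whose domain does not match a pattern in fileA: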

type|||URL|||Date|||Domain
1|||https://www.google.com|||1524024000|||google.com

Tags: awk, grep

Could you please try the following.

awk 'FNR==NR{a[$0];next} !($NF in a)' Input_filea FS="\\|\\|\\|" Input_fileb

Or

awk 'FNR==NR{a[$0];next} !($NF in a)' filea FS='\|\|\|' fileb

The output is as follows.

type|||URL|||Date|||Domain
1|||https://www.google.com|||1524024000|||google.com
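
To reproduce this on the sample data, here is a minimal sketch. It recreates the two inputs from the question under the placeholder names filea and fileb, and assumes the header line of file A is simply URLs. It also spells the separator as a bracket expression, FS='[|][|][|]', which is an alternative to the form used above rather than the answer's own. A bracket expression matches a literal | without relying on how a particular awk treats backslashes in command-line assignments.

```shell
# Recreate the sample inputs from the question (file names are placeholders,
# and the one-word header "URLs" in filea is an assumption).
printf '%s\n' 'URLs' 'bwin.hu' 'paradisepoker.li' > filea
printf '%s\n' \
  'type|||URL|||Date|||Domain' \
  '1|||https://www.google.com|||1524024000|||google.com' \
  '2|||www.bwin.hu|||1524024324|||bwin.hu' > fileb

# Same logic as above: load filea into array a, then print each line of fileb
# whose last field is not an index of a. [|] matches one literal pipe.
awk 'FNR==NR{a[$0];next} !($NF in a)' filea FS='[|][|][|]' fileb > result.csv

cat result.csv
```

The comparison is an exact string match on the last field, so a domain in fileb is dropped only when it appears character for character as a line of filea.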

Explanation: an annotated version of the code above.

awk '                                          ## Start of the awk program.
FNR==NR{                                       ## FNR==NR is true only while the first file (filea) is being read.
  a[$0]                                        ## Store the current line ($0) as an index of array a.
  next                                         ## Skip the remaining statements and move on to the next line.
}                                              ## End of the FNR==NR block.
!($NF in a)                                    ## For fileb only: true when the last field of the line is NOT an index of a.
                                               ## No action is given, so a true condition prints the current line by default.
' filea FS="\\|\\|\\|" fileb                   ## Input files; FS is set to an escaped ||| before fileb is read so each | is treated as a literal character.
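
To see why FNR==NR singles out the first file, the following sketch prints both counters for every line of two tiny files. The names one.txt and two.txt are made up for illustration. NR counts lines across all inputs, while FNR restarts at 1 for each file, so the two are equal only while the first file is being read.

```shell
# Two throwaway input files (hypothetical names) with two lines each.
printf '%s\n' a b > one.txt
printf '%s\n' c d > two.txt

# Print the current file name with the global (NR) and per-file (FNR) line counters.
awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' one.txt two.txt
# one.txt NR=1 FNR=1
# one.txt NR=2 FNR=2
# two.txt NR=3 FNR=1
# two.txt NR=4 FNR=2
```

One caveat of this idiom: if the first file is empty, FNR==NR is also true while the second file is read, so nothing is filtered and nothing is printed.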
