grep: exclude a pattern and also exclude the 2 preceding lines

Problem description

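I have a file and want to use grep to exclude lines matching a pattern. But I also want to remove the two lines preceding each match (exclude them as well). How can I do that?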

What I tried:

cat file.txt
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence
    176     184       + pattern:AA[CT]NNN[AT]CN        . aatcctaca

# With grep -v I can remove the line that matches the pattern

grep -v "[acgt]\{3\}cc[acgt][acgt]\{3\}" file.txt
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence

# But using -B 2 does not work here

grep -B 2 -v "[acgt]\{3\}cc[acgt][acgt]\{3\}" file.txt
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence
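Why -B 2 has no effect here: -B adds context before the *selected* lines, and with -v the selected lines are the non-matching ones (lines 1 to 5). The matching line 6 comes after all of them, so it is never pulled back in as context, while lines 4 and 5 are printed simply because they are selected. Dropping -v gives the opposite view: the match plus its two preceding lines, which is exactly the block to exclude. A minimal sketch, assuming GNU grep and recreating the sample above as file.txt in a throwaway directory (the variable names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: assumes GNU grep; recreates the question's sample as file.txt.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/file.txt" <<'EOF'
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence
    176     184       + pattern:AA[CT]NNN[AT]CN        . aatcctaca
EOF

# The question's pattern (BRE syntax, as used with grep above).
pat='[acgt]\{3\}cc[acgt][acgt]\{3\}'

# With -v, lines 1-5 are selected; -B 2 only adds lines *before* them,
# so the matching line 6 stays excluded and lines 4-5 are still printed.
kept=$(grep -v -B 2 "$pat" "$tmp/file.txt")

# Without -v, -B 2 prints the match plus the two lines before it: the block
# that should be dropped. With -n, context lines are prefixed "N-" and the
# matching line "N:".
blocks=$(grep -n -B 2 "$pat" "$tmp/file.txt")

printf '%s\n' "$blocks"
```

In other words, grep's context options can show which lines belong to a matching record, but they cannot remove them; that needs a tool that treats the three lines as one unit.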

Any ideas how to remove the two preceding lines for each match?

Tags: bash, grep

Solution


Tested with GNU sed; syntax and features may differ in other implementations.

sed -E 'N;N; /[acgt]{3}cc[acgt][acgt]{3}/d' ip.txt
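  • -E enables ERE (extended regular expressions); some sed versions need -r instead of -E
  • N;N appends the next two lines to the pattern space, so each cycle works on one 3-line record (this assumes every record is exactly three lines)
  • /[acgt]{3}cc[acgt][acgt]{3}/d deletes the pattern space (all three lines) if the regex matches
    • Note that this tries to match the regex anywhere in the three lines. Also, [acgt][acgt]{3} can be simplified to [acgt]{4}
    • /\n.*\n.*[acgt]{3}cc[acgt][acgt]{3}/d restricts the match to the 3rd line only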
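A minimal check of the commands above on the question's sample data, assuming GNU sed and a file made only of 3-line records (ip.txt is recreated in a temporary directory; the variable names are illustrative). Both sed variants should keep only the first record. The last command is a swapped-in, grep-based alternative, not part of the answer: paste - - - joins each 3-line record into one tab-separated line, grep -v then filters whole records, and tr turns the tabs back into newlines. It assumes the input contains no tab characters of its own.

```shell
#!/usr/bin/env bash
# Sketch: assumes GNU sed and input made of strict 3-line records.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/ip.txt" <<'EOF'
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___     from: 1   to: 296
  Start     End  Strand Pattern                 Mismatch Sequence
    217     225       + pattern:AA[CT]NNN[AT]CN        . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___     from: 1   to: 301
  Start     End  Strand Pattern                 Mismatch Sequence
    176     184       + pattern:AA[CT]NNN[AT]CN        . aatcctaca
EOF

# The answer's command: match anywhere in the 3-line pattern space.
anywhere=$(sed -E 'N;N; /[acgt]{3}cc[acgt][acgt]{3}/d' "$tmp/ip.txt")

# Variant restricted to the 3rd line, with [acgt][acgt]{3} written as [acgt]{4}.
third_only=$(sed -E 'N;N; /\n.*\n.*[acgt]{3}cc[acgt]{4}/d' "$tmp/ip.txt")

# Grep-based alternative (assumption: no tabs in the input):
# join records with paste, drop matching records, split them back with tr.
with_grep=$(paste - - - < "$tmp/ip.txt" \
  | grep -v '[acgt]\{3\}cc[acgt]\{4\}' \
  | tr '\t' '\n')

printf '%s\n' "$anywhere"
```

On this sample all three commands print only the first record (lines 1 to 3). The paste variant keeps the filtering in grep, as the question asked, but like N;N it silently misgroups lines if a record is ever shorter or longer than three lines.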
