How do I delete a pattern from a file when the pattern's structure stays the same but its content varies?

Problem description

I am working with a file (.gff3) in which this pattern appears (where # stands for a digit):

TRINITY_DN###_c0_g1~~

Example:

BAN_TRINITY_DN0_c0_g1_i1        transdecoder    gene    1       580     .       +       .       ID=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1        transdecoder    mRNA    1       580     .       +       .       ID=TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1        transdecoder    exon    1       580     .       +       .       ID=TRINITY_DN0_c0_g1_i1.p1.exon1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1        transdecoder    CDS     1       570     .       +       0       ID=cds.TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1        transdecoder    three_prime_UTR 571     580     .       +       .       ID=TRINITY_DN0_c0_g1_i1.p1.utr3p1;Parent=TRINITY_DN0_c0_g1_i1.p1

BAN_TRINITY_DN101_c0_g1_i1      transdecoder    gene    1       230     .       -       .       ID=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1      transdecoder    mRNA    1       230     .       -       .       ID=TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1      transdecoder    exon    1       230     .       -       .       ID=TRINITY_DN101_c0_g1_i1.p1.exon1;Parent=TRINITY_DN101_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1      transdecoder    CDS     3       230     .       -       0       ID=cds.TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1

I simply want to delete the pattern, so that the output looks like this:

BAN_TRINITY_DN0_c0_g1_i1        transdecoder    gene    1       580     .       +       .       ID=TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1        transdecoder    mRNA    1       580     .       +       .       ID=TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1        transdecoder    exon    1       580     .       +       .       ID=TRINITY_DN0_c0_g1_i1.p1.exon1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1        transdecoder    CDS     1       570     .       +       0       ID=cds.TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1        transdecoder    three_prime_UTR 571     580     .       +       .       ID=TRINITY_DN0_c0_g1_i1.p1.utr3p1;Parent=TRINITY_DN0_c0_g1_i1.p1

BAN_TRINITY_DN101_c0_g1_i1      transdecoder    gene    1       230     .       -       .       ID=TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1      transdecoder    mRNA    1       230     .       -       .       ID=TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1      transdecoder    exon    1       230     .       -       .       ID=TRINITY_DN101_c0_g1_i1.p1.exon1;Parent=TRINITY_DN101_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1      transdecoder    CDS     3       230     .       -       0       ID=cds.TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1

I tried to do this with sed, but because the pattern varies in length and content, I don't know how to write a deletion that accounts for that (I'm still quite new to bash).

Does anyone know how to do this?

Tags: sed, replace

Solution


If you are fine with writing the pattern to delete in regular-expression syntax, simply run:

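# Regex for TRINITY_DN followed by exactly three digits, then _c0_g1~~
PATTERN='TRINITY_DN[0-9][0-9][0-9]_c0_g1~~'
# On lines containing PATTERN, delete every match; the result is printed to stdout
sed "/$PATTERN/s///g" file.gff3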
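In `/$PATTERN/s///g`, the `s` command has an empty regex, which sed treats as "reuse the last regex", here the `/$PATTERN/` address. So it should behave the same as the plainer `s/$PATTERN//g`. A small sketch comparing the two on one attribute string (the `line` value is a shortened, made-up input modelled on the example):

```shell
#!/bin/sh
# Same three-digit pattern as above
PATTERN='TRINITY_DN[0-9][0-9][0-9]_c0_g1~~'
# Shortened, hypothetical attribute string based on the DN101 gene line
line='ID=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1;Name=x'

# Address form: the empty regex in s///g reuses /$PATTERN/
with_address=$(printf '%s\n' "$line" | sed "/$PATTERN/s///g")
# Explicit form: the regex is written directly inside s///
explicit=$(printf '%s\n' "$line" | sed "s/$PATTERN//g")

echo "$with_address"   # ID=TRINITY_DN101_c0_g1_i1.p1;Name=x
echo "$explicit"       # ID=TRINITY_DN101_c0_g1_i1.p1;Name=x
```

The address is not strictly needed; `s` already does nothing on lines where the regex does not match.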

I assume the pattern may occur more than once on a line. If that is not the case, you can drop the g at the end of sed's first argument.
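To see what the `g` flag changes, here is a sketch on a hypothetical line that contains the pattern twice, using the explicit `s/$PATTERN//` form (equivalent to the command above). In the sample lines from the question, the pattern occurs at most once per line, so both variants would give the same result there.

```shell
#!/bin/sh
PATTERN='TRINITY_DN[0-9][0-9][0-9]_c0_g1~~'
# Made-up line with two occurrences of the pattern
line='a=TRINITY_DN101_c0_g1~~x;b=TRINITY_DN202_c0_g1~~y'

# With g: every match on the line is deleted
all=$(printf '%s\n' "$line" | sed "s/$PATTERN//g")
# Without g: only the first match on the line is deleted
first=$(printf '%s\n' "$line" | sed "s/$PATTERN//")

echo "$all"     # a=x;b=y
echo "$first"   # a=x;b=TRINITY_DN202_c0_g1~~y
```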

If you don't know how many digits follow TRINITY_DN (your example has both DN0 and DN101), replace [0-9][0-9][0-9] with [0-9]*.
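A sketch of the variable-length version applied to the two gene lines from the question. For brevity the input keeps only the start of the ninth (attributes) column, and `file.gff3` / `cleaned.gff3` are placeholder names:

```shell
#!/bin/sh
# [0-9]* matches any run of digits, so both DN0 and DN101 are covered
PATTERN='TRINITY_DN[0-9]*_c0_g1~~'

# Attribute columns of the two example gene lines (trimmed)
input='ID=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=ORF
ID=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1;Name=ORF'

output=$(printf '%s\n' "$input" | sed "s/$PATTERN//g")
printf '%s\n' "$output"
# ID=TRINITY_DN0_c0_g1_i1.p1;Name=ORF
# ID=TRINITY_DN101_c0_g1_i1.p1;Name=ORF

# For the real file, write the result to a new file instead of overwriting the input:
# sed "s/$PATTERN//g" file.gff3 > cleaned.gff3
```

Note that `[0-9]*` also accepts zero digits; `[0-9][0-9]*` would require at least one. Because the regex still has to end in `~~`, IDs such as `TRINITY_DN0_c0_g1_i1` are left untouched.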

If you would prefer another syntax for describing your pattern (e.g. # instead of [0-9]), please say so.
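As one possible approach, a `#`-style pattern could be translated into a regex before running the deletion. This sketch assumes a hypothetical convention where each `#` means exactly one digit and the rest of the pattern contains no regex metacharacters (true for `TRINITY_DN###_c0_g1~~`, since `_` and `~` are literal in sed):

```shell
#!/bin/sh
# Hypothetical user-facing pattern: each '#' stands for one digit
user_pattern='TRINITY_DN###_c0_g1~~'

# Replace every '#' with the bracket expression [0-9]
PATTERN=$(printf '%s\n' "$user_pattern" | sed 's/#/[0-9]/g')
echo "$PATTERN"   # TRINITY_DN[0-9][0-9][0-9]_c0_g1~~

# Use the generated regex for the deletion as before
result=$(printf '%s\n' 'ID=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1' | sed "s/$PATTERN//g")
echo "$result"    # ID=TRINITY_DN101_c0_g1_i1.p1
```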

