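sed - How do I delete a pattern from a file when its structure is fixed but its content varies?
Problem description
I am working with a file (.gff3) in which the following pattern occurs (each # stands for a digit):
TRINITY_DN###_c0_g1~~
Example: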
BAN_TRINITY_DN0_c0_g1_i1 transdecoder gene 1 580 . + . ID=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1 transdecoder mRNA 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1 transdecoder exon 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1.exon1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1 transdecoder CDS 1 570 . + 0 ID=cds.TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1 transdecoder three_prime_UTR 571 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1.utr3p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1 transdecoder gene 1 230 . - . ID=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1 transdecoder mRNA 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1 transdecoder exon 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1.exon1;Parent=TRINITY_DN101_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1 transdecoder CDS 3 230 . - 0 ID=cds.TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1
I simply want to delete the pattern, so that the output looks like this:
BAN_TRINITY_DN0_c0_g1_i1 transdecoder gene 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1 transdecoder mRNA 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1 transdecoder exon 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1.exon1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1 transdecoder CDS 1 570 . + 0 ID=cds.TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1 transdecoder three_prime_UTR 571 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1.utr3p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1 transdecoder gene 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1 transdecoder mRNA 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1 transdecoder exon 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1.exon1;Parent=TRINITY_DN101_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1 transdecoder CDS 3 230 . - 0 ID=cds.TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1
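However, I have not managed to do this with sed, because the pattern varies in length and composition, and I don't know how to delete the characters while accounting for that (I am still very new to bash).
Does anyone know how to do this?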
Solution
If you are happy to write the pattern to delete in regular expression syntax, just issue:
PATTERN='TRINITY_DN[0-9][0-9][0-9]_c0_g1~~'
sed "/$PATTERN/s///g" file.gff3
I assumed the pattern may occur more than once on a line. If it cannot, drop the g flag at the end of the first argument to sed.
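To illustrate what the g flag changes, here is a small sketch that applies both forms to a made-up line in which the pattern appears twice. The input string and the variable names line, first_only and all are assumptions for the demonstration; in the sample data above the pattern appears only once per line.

```shell
PATTERN='TRINITY_DN[0-9][0-9][0-9]_c0_g1~~'

# Hypothetical line containing the pattern twice.
line='TRINITY_DN101_c0_g1~~A;TRINITY_DN101_c0_g1~~B'

# Without g: only the first occurrence on the line is removed.
first_only=$(printf '%s\n' "$line" | sed "s/$PATTERN//")

# With g: every occurrence on the line is removed.
all=$(printf '%s\n' "$line" | sed "s/$PATTERN//g")

printf '%s\n%s\n' "$first_only" "$all"
```

Here s/$PATTERN//g is used instead of /$PATTERN/s///g; the two should behave the same, since the empty regular expression in the second form reuses the address pattern.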
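If you don't know how many digits will follow TRINITY_DN, replace [0-9][0-9][0-9] with [0-9]*. The sample above needs this: TRINITY_DN0 has a single digit, so the three-digit pattern does not match it. Note that [0-9]* also matches zero digits; [0-9][0-9]* requires at least one.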
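As a sketch of the variable-length form, the snippet below runs the substitution on two lines modelled on the sample data. The lines are shortened to the attribute column for brevity, and the variable name result is an assumption.

```shell
# One or more digits after TRINITY_DN.
PATTERN='TRINITY_DN[0-9][0-9]*_c0_g1~~'

# Two shortened lines modelled on the sample: one with DN0, one with DN101.
result=$(printf '%s\n' \
  'ID=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=x' \
  'ID=TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1' |
  sed "s/$PATTERN//g")

printf '%s\n' "$result"
```

To process the real file, the same command can read file.gff3 and redirect its output to a new file, for example sed "s/$PATTERN//g" file.gff3 > cleaned.gff3 (cleaned.gff3 is a hypothetical name). If the numbers after _c and _g also vary in the full file, which the sample does not show, they would need digit classes of their own.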
If you would prefer a different syntax for describing your pattern (for example, # instead of [0-9]), please say so.
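One possible way to support a # placeholder, offered only as a sketch: convert the template into a regular expression before handing it to sed. The names TEMPLATE and cleaned are hypothetical, as is the convention that a single # stands for a run of one or more digits.

```shell
# Hypothetical template in which one # stands for a run of digits.
TEMPLATE='TRINITY_DN#_c0_g1~~'

# Turn every # into "one or more digits".
PATTERN=$(printf '%s' "$TEMPLATE" | sed 's/#/[0-9][0-9]*/g')

# Apply the generated pattern to a shortened sample attribute.
cleaned=$(printf '%s\n' 'Parent=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1' |
  sed "s/$PATTERN//g")

printf '%s\n%s\n' "$PATTERN" "$cleaned"
```

This conversion assumes the template contains no other characters that are special in a regular expression or in a sed s command, such as a slash or a dot.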