How can I split a file that contains file names and data into multiple separate files?

Problem description

I have a file.txt that looks like this (I removed some lines to simplify the example):

PLXNA3                                                                                     ### <- filename1
Missense/nonsense : 13 mutations                                                           # <- header spaces
accession   codon_change    amino_acid_change                                              # <- column names tsv
ID73        CAT-TAT         His66Tyr                                                       # <- line tsv
ID63        GAC-AAC         Asp127Asn                                                      # <- line tsv
ID31        GCC-GTC         Ala307Val                                                      # <- line tsv
NEDD4L                                                                                     ### <- filename2
Splicing : 1 mutation                                                                      # <- header spaces
accession      splicing_mutation                                                           # <- column names tsv
ID51           IVS1 as G-A -16229                                                          # <-  line tsv
Gross deletions : 1 mutation                                                               # <- header spaces
accession   DNA_level   description                 HGVS_(nucleotide)   HGVS_(protein)     # <- column names tsv
ID853       gDNA        4.5 Mb incl. entire gene    Not yet available   Not yet available  # <- line tsv
OPHN1                                                                                      ### <- filename3
Small insertions : 3 mutations                                                             # <- header spaces
accession         insertion                            HGVS_(nucleotide)                   # <- column names tsv
ID96          TTATGTT(^183)TATtCAAATCCAGG c.549dupT    p.(Gln184Serfs*23)                  # <- line tsv
ID25          GTGCT(^310)AAGCAcaG_EI_GTCAGTTCT         c.931_932dupCA                      # <- line tsv
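In this sample, the only thing that separates the file-name lines (PLXNA3, NEDD4L, OPHN1) from the header, column-name and data lines is that each consists of a single whitespace-separated word. The minimal sketch below lists the lines that have exactly one field. It assumes the `# <- ...` annotations above are only explanations and are not present in the real file (with them, no line would have a single field). The helper name `filename_lines` is hypothetical and used only for illustration.

```python
def filename_lines(lines):
    """Return (line_number, name) for every line with exactly one field.

    str.split() with no argument ignores leading/trailing blanks and treats
    runs of spaces/tabs as a single separator, so "  OPHN1  " counts as one field.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) == 1:
            hits.append((lineno, fields[0]))
    return hits


# Assumed shape of the real file: the sample above without the "# <- ..." notes.
sample = [
    "PLXNA3",
    "Missense/nonsense : 13 mutations",
    "accession   codon_change    amino_acid_change",
    "ID73        CAT-TAT         His66Tyr",
    "NEDD4L",
    "Splicing : 1 mutation",
    "ID51           IVS1 as G-A -16229",
    "OPHN1",
]
print(filename_lines(sample))  # [(1, 'PLXNA3'), (5, 'NEDD4L'), (8, 'OPHN1')]
```

To check the real file, `with open("file.txt") as f: print(filename_lines(f))` works as well. Any reported line that is not a gene name (for example a data row that happens to contain only an accession) would be mistaken for a file name by a one-field rule.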

I want to split this file into 3 separate files:

PLXNA3.txt

PLXNA3                                                                                     ### <- filename1
Missense/nonsense : 13 mutations                                                           # <- header spaces
accession   codon_change    amino_acid_change                                              # <- column names tsv
ID73        CAT-TAT         His66Tyr                                                       # <- line tsv
ID63        GAC-AAC         Asp127Asn                                                      # <- line tsv
ID31        GCC-GTC         Ala307Val                                                      # <- line tsv

NEDD4L.txt

NEDD4L                                                                                     ### <- filename2
Splicing : 1 mutation                                                                      # <- header spaces
accession      splicing_mutation                                                           # <- column names tsv
ID51           IVS1 as G-A -16229                                                          # <-  line tsv
Gross deletions : 1 mutation                                                               # <- header spaces
accession   DNA_level   description                 HGVS_(nucleotide)   HGVS_(protein)     # <- column names tsv
ID853       gDNA        4.5 Mb incl. entire gene    Not yet available   Not yet available  # <- line tsv

OPHN1.txt

OPHN1                                                                                      ### <- filename3
Small insertions : 3 mutations                                                             # <- header spaces
accession         insertion                            HGVS_(nucleotide)                   # <- column names tsv
ID96          TTATGTT(^183)TATtCAAATCCAGG c.549dupT    p.(Gln184Serfs*23)                  # <- line tsv
ID25          GTGCT(^310)AAGCAcaG_EI_GTCAGTTCT         c.931_932dupCA                      # <- line tsv

How can I get the desired output using a Linux command (such as awk) or Python?

Notes:

Thanks in advance.

Tags: python, regex, file, awk, split

Solution


awk 'NF==1{filename=$0 ".txt"};{print > filename}' file.txt

An equivalent but more golfed alternative is:

awk 'NF==1{f=$0".txt"}{print>f}' file.txt
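Since the question also mentions Python, here is a minimal sketch of the same rule in Python; it is not part of the original answer, and the function name `split_by_gene` and its `out_dir` parameter are illustrative. Like the awk versions, it assumes the file starts with a file-name line and that the `# <- ...` annotations in the sample are not in the real file. Two deliberate details: it builds the name from the single field, so trailing blanks on a name line are dropped (awk's `$0 ".txt"` keeps the line verbatim), and it closes each output file when the next section starts, so only one file is open at a time.

```python
from pathlib import Path


def split_by_gene(src, out_dir="."):
    """Split `src` into one file per section and return the paths created.

    Mirrors the awk rule NF==1: a line with exactly one whitespace-separated
    field starts a new section written to "<field>.txt"; every line, including
    the name line itself, is written to the current section's file.
    """
    created = []  # distinct output paths, in order of first appearance
    out = None    # handle of the section currently being written
    try:
        with open(src, encoding="utf-8") as fin:
            for line in fin:
                fields = line.split()  # splits on runs of blanks, roughly like awk's default FS
                if len(fields) == 1:
                    if out is not None:
                        out.close()  # keep only one output file open at a time
                    path = str(Path(out_dir) / (fields[0] + ".txt"))
                    # awk's ">" truncates a file only the first time it is opened
                    # in a run, so a repeated name appends instead of overwriting.
                    mode = "a" if path in created else "w"
                    if mode == "w":
                        created.append(path)
                    out = open(path, mode, encoding="utf-8")
                if out is None:
                    raise ValueError("input must start with a file-name line")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return created
```

Calling `split_by_gene("file.txt")` should write PLXNA3.txt, NEDD4L.txt and OPHN1.txt to the current directory and return their paths.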
