awk - Filter a file by the highest number in one column of each line
Problem description
I have the following file:
chr11_pilon3.g3568.t1 transcript:OIT01734 transcript:OIT01734 1.1e-107 389.8 1000 218 992 1 216 130 345 MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDA MDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDA MDALTRHIQGDVPWCMLFADDIILIDETRAGVSERLEIWRQTLESKGFKISRSKTEYLECKFGDEPSGVGREVMLGSQAIAKRDSVRYLGSVIQGDGEIDGDVTHRIGAGWSKWRLASGVLCDKKIPHKLKGKFFRAMVRPAMFYEAECWPVKNSHIQRMKVAEMRMLRWMCGHTRLDKIKNEVIRQKVGVAPVDKKMGEARLRWFGHVRRRGPDAR* MKVWERVVEARVREMTSISVNQFGFMPGRSTTEAIHLVRRLVEHFRDKKKDLHMVFIDLENAYDKVPREVLWRCLEAKSVPEAYIRVIKDMYDGAKTRVRTVGGDSDHFPVVMGLHQGSALSPLLFALVMDALTRHIQGDVPWCMLFADDIVLIDETRVGVNERLEVWRQTLESKGFKLSRSKTEYLECKFSAESSEVGRDVKLGSQVIAKRDSFRYLGSVIQGEGEIDGDVTHRIGAGWSKWRLASGVLCDKKVPQKLKGKFYRAVVRPAMLYGAECWPVKNSHVQRMKVAEMRMLRWMRGLTRLDRIRNEVIREKVGVALVDEKMREARLRWYGHVRRRRPDAPVRIYKSAILGHLNSHGSQNALAGPVEAEENRQKTKKEVMEEIIQKSKFFKAQKAKDREENDELTEQLDKDFTSLVESKALLSLTQPDKINALKALVNKNISVGNVKKDEVADVPRKASIGKEKPDTYEMLVSEMALDMRARPSDRTKTPEEIAQEEKERLELLEQEXXXXXXXXXXXXXXDGNASDDNSKLVKDPRTVSGDDLGDDLEEVPRTKLGWIGEILRRKENELESEDAASSGDSDDGEDEGXXXXXXXXXXXXXXXXXXXXDEEQGKTQTIKDWEQSDDDIIDTELEDDDEGFGDDAKKVVKIKDHKEENLSITVAAENKKKMQVFYGVLLQYFAVLANKKPLNSKLLNLLVKPLMEMSAVSPYFAAICARQRLQRTRAQFCEDLKNTGKSSWPSLKTIFLLRLWSMIFPCSDFRHCVMTPAILLMCEYLMRCTIISGRDIAIASFLCSLLLSVIKQSQKFCPEAIVFIQTLLMAALDRKQRSNSQLDNLMEIKELGPLLCIRSSKVEMDSLDFLTLMDLPEDSQYFHSDNYRTSMLVTVLETLQGFVNVYKELISFPEIFMLISKLLCKMAGENHIPDALREKIKDVSQLIDTKAQEHHMLRQPLKMRKKKPVPIRMLNPKFEENFVKGRDYDPDRERA 389.8 1000 216 85.6 185 31 200 0 0 92.6 0 22IV6AV2SN4IV11IL12GSDA1PS1GE3ED1MK4AV6VF9DE29IV1HQ6FY2MV5FL1EG10IV14CR1HL4KR1KR5QE5PL2KE2GR6FY6GR3 85.6 1.1e-107 99.1
gene.10002.1.1.p1 NisylKD957037g0001.1 NisylKD957037g0001.1 0.0e+00 1218.8 3152 668 780 5 667 122 780 KVIARCRPELAHIPSLEEAPVFHPSEEEFEDTLKYVGSILPHVKHYGICRIVPPSSWKPPSCIEEESTVYGVNTHIQRTSELQNLFFKKRLEGACTRTNNKQQKTLSRKSDFGLDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESGFPHERGVTIHRPQYVESGWNLNNTPKLQDSLLRFGSHESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLFQNMAFQFSPSILTSEGIPVYRCVQNPKEFVLILPGAYHAHVDSGFNCSEAVNFAPFDWLPHGQNAVDLYSEQRRKTSISYDKLLFEAATERIRALAELPLLHKKFFDNLKWRAVCRSNEILTKALKSRFATEVRRRKYMCASLESRKMEDDFCATAKRECSICYYDLYLSAIGCTCSPQKYTCLLHAKQLCSCAWREKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGFPVSDFSKDASKDEMKVKSESGQSLDVEQDRKEASIPSVGPSARTNNLNRVTGSWVEADGLSHQPQPKGIVNDTVEVIFPKISQHATVGKNIMISSNTVLKKHLARESSSTKRTVIILSDDEN KVIARCRPELARIPSLEEAPVFHPNTLKYVASILPHVKHYGICRIVPPSSWKPPSRIEEPSTVYGVNTHIQRTSDLQNLFFKKRLEGACTRTNNKQQKTLSGKSDFGHDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESSFPHEGDVTSRRPPQYVESGWNLNNTPKLQDSLLRFGSRESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLLQNIAFQFSPSVLTSEGIPVYRCVQNPKEFVLLLPGAYHAHADSGFNCSEAVNFAPFDWLPHGQNAVELYSEQGRKTSISYDKLLFEAATEGIRALPELPLLHKNFFDNLKWRAVYRSNEILTKALKSRVSTEVRRRTYLCASLESRKMEDDFCATTKRECPICYYDLYLSAIGCKCSPHKYTCLLHAKQLCPCAWSEKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGLPVSDVFKDASKDGMKVKSESGQSLDIEQDRKEEVSIPSVGPSARTNNVNRVSGSWVEADGSSHRPQSKGIINDKIEVLFPKISQHATVGKNIMTSSNTVLKKHLARESSSTKRSVIILSDDEN 
MFGFKVIARCRPELAHIPSLEEAPVFHPSEEEFEDTLKYVGSILPHVKHYGICRIVPPSSWKPPSCIEEESTVYGVNTHIQRTSELQNLFFKKRLEGACTRTNNKQQKTLSRKSDFGLDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESGFPHERGVTIHRPQYVESGWNLNNTPKLQDSLLRFGSHESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLFQNMAFQFSPSILTSEGIPVYRCVQNPKEFVLILPGAYHAHVDSGFNCSEAVNFAPFDWLPHGQNAVDLYSEQRRKTSISYDKLLFEAATERIRALAELPLLHKKFFDNLKWRAVCRSNEILTKALKSRFATEVRRRKYMCASLESRKMEDDFCATAKRECSICYYDLYLSAIGCTCSPQKYTCLLHAKQLCSCAWREKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGFPVSDFSKDASKDEMKVKSESGQSLDVEQDRKEASIPSVGPSARTNNLNRVTGSWVEADGLSHQPQPKGIVNDTVEVIFPKISQHATVGKNIMISSNTVLKKHLARESSSTKRTVIILSDDEN* MGAKRTRSNSESDDGYKLSVPPGFESLMSFTLKKVKNSEEACNSVALGSGFAQGPSLVAATSTIISTGKLKSSVRHRPWILDDHVDHIEDDSEFEDDKSLSSSAFLPKGVIRGCSSCHNCQKVIARCRPELARIPSLEEAPVFHPNTLKYVASILPHVKHYGICRIVPPSSWKPPSRIEEPSTVYGVNTHIQRTSDLQNLFFKKRLEGACTRTNNKQQKTLSGKSDFGHDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESSFPHEGDVTSRRPPQYVESGWNLNNTPKLQDSLLRFGSRESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLLQNIAFQFSPSVLTSEGIPVYRCVQNPKEFVLLLPGAYHAHADSGFNCSEAVNFAPFDWLPHGQNAVELYSEQGRKTSISYDKLLFEAATEGIRALPELPLLHKNFFDNLKWRAVYRSNEILTKALKSRVSTEVRRRTYLCASLESRKMEDDFCATTKRECPICYYDLYLSAIGCKCSPHKYTCLLHAKQLCPCAWSEKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGLPVSDVFKDASKDGMKVKSESGQSLDIEQDRKEEVSIPSVGPSARTNNVNRVSGSWVEADGSSHRPQSKGIINDKIEVLFPKISQHATVGKNIMTSSNTVLKKHLARESSSTKRSVIILSDDEN 1218.8 3152 665 91.0 605 52 621 3 8 93.4 0 11HR12SNE-E-E-F-E-D-5GA24CR3EP14ED26RG5LH85GS4RGGD2ISHR2-P24HR70FL2MI7IV20IL8VA25DE5RG17RG4AP7KN10CY13FVAS6KT1ML16AT4SP13TK3QH12SP3RS36FL4FVSF6EG12VI6-EAV13LV3TS8LS2QR2PS3VI2TKVI2IL15IT19TS9 91.0 0.0e+00 99.3
gene.10002.1.4.p1 NisylKD957037g0001.1 NisylKD957037g0001.1 0.0e+00 1216.8 3147 671 780 9 670 123 780 VIARCRPELAHIPSLEEAPVFHPSEEEFEDTLKYVGSILPHVKHYGICRIVPPSSWKPPSCIEEESTVYGVNTHIQRTSELQNLFFKKRLEGACTRTNNKQQKTLSRKSDFGLDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESGFPHERGVTIHRPQYVESGWNLNNTPKLQDSLLRFGSHESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLFQNMAFQFSPSILTSEGIPVYRCVQNPKEFVLILPGAYHAHVDSGFNCSEAVNFAPFDWLPHGQNAVDLYSEQRRKTSISYDKLLFEAATERIRALAELPLLHKKFFDNLKWRAVCRSNEILTKALKSRFATEVRRRKYMCASLESRKMEDDFCATAKRECSICYYDLYLSAIGCTCSPQKYTCLLHAKQLCSCAWREKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGFPVSDFSKDASKDEMKVKSESGQSLDVEQDRKEASIPSVGPSARTNNLNRVTGSWVEADGLSHQPQPKGIVNDTVEVIFPKISQHATVGKNIMISSNTVLKKHLARESSSTKRTVIILSDDEN VIARCRPELARIPSLEEAPVFHPNTLKYVASILPHVKHYGICRIVPPSSWKPPSRIEEPSTVYGVNTHIQRTSDLQNLFFKKRLEGACTRTNNKQQKTLSGKSDFGHDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESSFPHEGDVTSRRPPQYVESGWNLNNTPKLQDSLLRFGSRESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLLQNIAFQFSPSVLTSEGIPVYRCVQNPKEFVLLLPGAYHAHADSGFNCSEAVNFAPFDWLPHGQNAVELYSEQGRKTSISYDKLLFEAATEGIRALPELPLLHKNFFDNLKWRAVYRSNEILTKALKSRVSTEVRRRTYLCASLESRKMEDDFCATTKRECPICYYDLYLSAIGCKCSPHKYTCLLHAKQLCPCAWSEKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGLPVSDVFKDASKDGMKVKSESGQSLDIEQDRKEEVSIPSVGPSARTNNVNRVSGSWVEADGSSHRPQSKGIINDKIEVLFPKISQHATVGKNIMTSSNTVLKKHLARESSSTKRSVIILSDDEN 
MFGFKARIVIARCRPELAHIPSLEEAPVFHPSEEEFEDTLKYVGSILPHVKHYGICRIVPPSSWKPPSCIEEESTVYGVNTHIQRTSELQNLFFKKRLEGACTRTNNKQQKTLSRKSDFGLDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESGFPHERGVTIHRPQYVESGWNLNNTPKLQDSLLRFGSHESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLFQNMAFQFSPSILTSEGIPVYRCVQNPKEFVLILPGAYHAHVDSGFNCSEAVNFAPFDWLPHGQNAVDLYSEQRRKTSISYDKLLFEAATERIRALAELPLLHKKFFDNLKWRAVCRSNEILTKALKSRFATEVRRRKYMCASLESRKMEDDFCATAKRECSICYYDLYLSAIGCTCSPQKYTCLLHAKQLCSCAWREKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGFPVSDFSKDASKDEMKVKSESGQSLDVEQDRKEASIPSVGPSARTNNLNRVTGSWVEADGLSHQPQPKGIVNDTVEVIFPKISQHATVGKNIMISSNTVLKKHLARESSSTKRTVIILSDDEN* MGAKRTRSNSESDDGYKLSVPPGFESLMSFTLKKVKNSEEACNSVALGSGFAQGPSLVAATSTIISTGKLKSSVRHRPWILDDHVDHIEDDSEFEDDKSLSSSAFLPKGVIRGCSSCHNCQKVIARCRPELARIPSLEEAPVFHPNTLKYVASILPHVKHYGICRIVPPSSWKPPSRIEEPSTVYGVNTHIQRTSDLQNLFFKKRLEGACTRTNNKQQKTLSGKSDFGHDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESSFPHEGDVTSRRPPQYVESGWNLNNTPKLQDSLLRFGSRESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLLQNIAFQFSPSVLTSEGIPVYRCVQNPKEFVLLLPGAYHAHADSGFNCSEAVNFAPFDWLPHGQNAVELYSEQGRKTSISYDKLLFEAATEGIRALPELPLLHKNFFDNLKWRAVYRSNEILTKALKSRVSTEVRRRTYLCASLESRKMEDDFCATTKRECPICYYDLYLSAIGCKCSPHKYTCLLHAKQLCPCAWSEKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGLPVSDVFKDASKDGMKVKSESGQSLDIEQDRKEEVSIPSVGPSARTNNVNRVSGSWVEADGSSHRPQSKGIINDKIEVLFPKISQHATVGKNIMTSSNTVLKKHLARESSSTKRSVIILSDDEN 1216.8 3147 664 91.0 604 52 620 3 8 93.4 0 10HR12SNE-E-E-F-E-D-5GA24CR3EP14ED26RG5LH85GS4RGGD2ISHR2-P24HR70FL2MI7IV20IL8VA25DE5RG17RG4AP7KN10CY13FVAS6KT1ML16AT4SP13TK3QH12SP3RS36FL4FVSF6EG12VI6-EAV13LV3TS8LS2QR2PS3VI2TKVI2IL15IT19TS9 91.0 0.0e+00 98.7
gene.10002.1.5.p1 NisylKD957037g0001.1 NisylKD957037g0001.1 0.0e+00 1218.8 3152 668 780 5 667 122 780 KVIARCRPELAHIPSLEEAPVFHPSEEEFEDTLKYVGSILPHVKHYGICRIVPPSSWKPPSCIEEESTVYGVNTHIQRTSELQNLFFKKRLEGACTRTNNKQQKTLSRKSDFGLDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESGFPHERGVTIHRPQYVESGWNLNNTPKLQDSLLRFGSHESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLFQNMAFQFSPSILTSEGIPVYRCVQNPKEFVLILPGAYHAHVDSGFNCSEAVNFAPFDWLPHGQNAVDLYSEQRRKTSISYDKLLFEAATERIRALAELPLLHKKFFDNLKWRAVCRSNEILTKALKSRFATEVRRRKYMCASLESRKMEDDFCATAKRECSICYYDLYLSAIGCTCSPQKYTCLLHAKQLCSCAWREKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGFPVSDFSKDASKDEMKVKSESGQSLDVEQDRKEASIPSVGPSARTNNLNRVTGSWVEADGLSHQPQPKGIVNDTVEVIFPKISQHATVGKNIMISSNTVLKKHLARESSSTKRTVIILSDDEN KVIARCRPELARIPSLEEAPVFHPNTLKYVASILPHVKHYGICRIVPPSSWKPPSRIEEPSTVYGVNTHIQRTSDLQNLFFKKRLEGACTRTNNKQQKTLSGKSDFGHDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESSFPHEGDVTSRRPPQYVESGWNLNNTPKLQDSLLRFGSRESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLLQNIAFQFSPSVLTSEGIPVYRCVQNPKEFVLLLPGAYHAHADSGFNCSEAVNFAPFDWLPHGQNAVELYSEQGRKTSISYDKLLFEAATEGIRALPELPLLHKNFFDNLKWRAVYRSNEILTKALKSRVSTEVRRRTYLCASLESRKMEDDFCATTKRECPICYYDLYLSAIGCKCSPHKYTCLLHAKQLCPCAWSEKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGLPVSDVFKDASKDGMKVKSESGQSLDIEQDRKEEVSIPSVGPSARTNNVNRVSGSWVEADGSSHRPQSKGIINDKIEVLFPKISQHATVGKNIMTSSNTVLKKHLARESSSTKRSVIILSDDEN 
MFGFKVIARCRPELAHIPSLEEAPVFHPSEEEFEDTLKYVGSILPHVKHYGICRIVPPSSWKPPSCIEEESTVYGVNTHIQRTSELQNLFFKKRLEGACTRTNNKQQKTLSRKSDFGLDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESGFPHERGVTIHRPQYVESGWNLNNTPKLQDSLLRFGSHESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLFQNMAFQFSPSILTSEGIPVYRCVQNPKEFVLILPGAYHAHVDSGFNCSEAVNFAPFDWLPHGQNAVDLYSEQRRKTSISYDKLLFEAATERIRALAELPLLHKKFFDNLKWRAVCRSNEILTKALKSRFATEVRRRKYMCASLESRKMEDDFCATAKRECSICYYDLYLSAIGCTCSPQKYTCLLHAKQLCSCAWREKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGFPVSDFSKDASKDEMKVKSESGQSLDVEQDRKEASIPSVGPSARTNNLNRVTGSWVEADGLSHQPQPKGIVNDTVEVIFPKISQHATVGKNIMISSNTVLKKHLARESSSTKRTVIILSDDEN* MGAKRTRSNSESDDGYKLSVPPGFESLMSFTLKKVKNSEEACNSVALGSGFAQGPSLVAATSTIISTGKLKSSVRHRPWILDDHVDHIEDDSEFEDDKSLSSSAFLPKGVIRGCSSCHNCQKVIARCRPELARIPSLEEAPVFHPNTLKYVASILPHVKHYGICRIVPPSSWKPPSRIEEPSTVYGVNTHIQRTSDLQNLFFKKRLEGACTRTNNKQQKTLSGKSDFGHDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESSFPHEGDVTSRRPPQYVESGWNLNNTPKLQDSLLRFGSRESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLLQNIAFQFSPSVLTSEGIPVYRCVQNPKEFVLLLPGAYHAHADSGFNCSEAVNFAPFDWLPHGQNAVELYSEQGRKTSISYDKLLFEAATEGIRALPELPLLHKNFFDNLKWRAVYRSNEILTKALKSRVSTEVRRRTYLCASLESRKMEDDFCATTKRECPICYYDLYLSAIGCKCSPHKYTCLLHAKQLCPCAWSEKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGLPVSDVFKDASKDGMKVKSESGQSLDIEQDRKEEVSIPSVGPSARTNNVNRVSGSWVEADGSSHRPQSKGIINDKIEVLFPKISQHATVGKNIMTSSNTVLKKHLARESSSTKRSVIILSDDEN 1218.8 3152 665 91.0 605 52 621 3 8 93.4 0 11HR12SNE-E-E-F-E-D-5GA24CR3EP14ED26RG5LH85GS4RGGD2ISHR2-P24HR70FL2MI7IV20IL8VA25DE5RG17RG4AP7KN10CY13FVAS6KT1ML16AT4SP13TK3QH12SP3RS36FL4FVSF6EG12VI6-EAV13LV3TS8LS2QR2PS3VI2TKVI2IL15IT19TS9 91.0 0.0e+00 99.3
gene.10002.1.6.p1 NisylKD957037g0001.1 NisylKD957037g0001.1 0.0e+00 1440.2 3727 799 780 15 798 1 780 MGAKRTRSNGESDDGYKLSVPPGFESLMSFTLKKVKNSEEACNSVALESEFAQSPSQVAATSTIISIGKLKSSVRHRPWILDDHVDHIEDDSEFEDDKSLSSIAFLPKGVIRGCSSCHNCQKVIARCRPELAHIPSLEEAPVFHPSEEEFEDTLKYVGSILPHVKHYGICRIVPPSSWKPPSCIEEESTVYGVNTHIQRTSELQNLFFKKRLEGACTRTNNKQQKTLSRKSDFGLDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESGFPHERGVTIHRPQYVESGWNLNNTPKLQDSLLRFGSHESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLFQNMAFQFSPSILTSEGIPVYRCVQNPKEFVLILPGAYHAHVDSGFNCSEAVNFAPFDWLPHGQNAVDLYSEQRRKTSISYDKLLFEAATERIRALAELPLLHKKFFDNLKWRAVCRSNEILTKALKSRFATEVRRRKYMCASLESRKMEDDFCATAKRECSICYYDLYLSAIGCTCSPQKYTCLLHAKQLCSCAWREKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGFPVSDFSKDASKDEMKVKSESGQSLDVEQDRKEASIPSVGPSARTNNLNRVTGSWVEADGLSHQPQPKGIVNDTVEVIFPKISQHATVGKNIMISSNTVLKKHLARESSSTKRTVIILSDDEN MGAKRTRSNSESDDGYKLSVPPGFESLMSFTLKKVKNSEEACNSVALGSGFAQGPSLVAATSTIISTGKLKSSVRHRPWILDDHVDHIEDDSEFEDDKSLSSSAFLPKGVIRGCSSCHNCQKVIARCRPELARIPSLEEAPVFHPNTLKYVASILPHVKHYGICRIVPPSSWKPPSRIEEPSTVYGVNTHIQRTSDLQNLFFKKRLEGACTRTNNKQQKTLSGKSDFGHDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESSFPHEGDVTSRRPPQYVESGWNLNNTPKLQDSLLRFGSRESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLLQNIAFQFSPSVLTSEGIPVYRCVQNPKEFVLLLPGAYHAHADSGFNCSEAVNFAPFDWLPHGQNAVELYSEQGRKTSISYDKLLFEAATEGIRALPELPLLHKNFFDNLKWRAVYRSNEILTKALKSRVSTEVRRRTYLCASLESRKMEDDFCATTKRECPICYYDLYLSAIGCKCSPHKYTCLLHAKQLCPCAWSEKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGLPVSDVFKDASKDGMKVKSESGQSLDIEQDRKEEVSIPSVGPSARTNNVNRVSGSWVEADGSSHRPQSKGIINDKIEVLFPKISQHATVGKNIMTSSNTVLKKHLARESSSTKRSVIILSDDEN 
MSDCTWQRYKGEVLMGAKRTRSNGESDDGYKLSVPPGFESLMSFTLKKVKNSEEACNSVALESEFAQSPSQVAATSTIISIGKLKSSVRHRPWILDDHVDHIEDDSEFEDDKSLSSIAFLPKGVIRGCSSCHNCQKVIARCRPELAHIPSLEEAPVFHPSEEEFEDTLKYVGSILPHVKHYGICRIVPPSSWKPPSCIEEESTVYGVNTHIQRTSELQNLFFKKRLEGACTRTNNKQQKTLSRKSDFGLDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESGFPHERGVTIHRPQYVESGWNLNNTPKLQDSLLRFGSHESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLFQNMAFQFSPSILTSEGIPVYRCVQNPKEFVLILPGAYHAHVDSGFNCSEAVNFAPFDWLPHGQNAVDLYSEQRRKTSISYDKLLFEAATERIRALAELPLLHKKFFDNLKWRAVCRSNEILTKALKSRFATEVRRRKYMCASLESRKMEDDFCATAKRECSICYYDLYLSAIGCTCSPQKYTCLLHAKQLCSCAWREKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGFPVSDFSKDASKDEMKVKSESGQSLDVEQDRKEASIPSVGPSARTNNLNRVTGSWVEADGLSHQPQPKGIVNDTVEVIFPKISQHATVGKNIMISSNTVLKKHLARESSSTKRTVIILSDDEN* MGAKRTRSNSESDDGYKLSVPPGFESLMSFTLKKVKNSEEACNSVALGSGFAQGPSLVAATSTIISTGKLKSSVRHRPWILDDHVDHIEDDSEFEDDKSLSSSAFLPKGVIRGCSSCHNCQKVIARCRPELARIPSLEEAPVFHPNTLKYVASILPHVKHYGICRIVPPSSWKPPSRIEEPSTVYGVNTHIQRTSDLQNLFFKKRLEGACTRTNNKQQKTLSGKSDFGHDIERKEFGCCNEHFEFENGPKLMLKYFKHYADHFKKQYFVKEDQITASEPSIQDIEGEYWRIIENPTEEIEVLQGTSAEIKATESSFPHEGDVTSRRPPQYVESGWNLNNTPKLQDSLLRFGSRESSSILLPRLSIGMCFSSNLWRIEEHHLYLLSYIHFGAPKIFYGVPGSYRCKFEEAVKKHLPQLSAHPCLLQNIAFQFSPSVLTSEGIPVYRCVQNPKEFVLLLPGAYHAHADSGFNCSEAVNFAPFDWLPHGQNAVELYSEQGRKTSISYDKLLFEAATEGIRALPELPLLHKNFFDNLKWRAVYRSNEILTKALKSRVSTEVRRRTYLCASLESRKMEDDFCATTKRECPICYYDLYLSAIGCKCSPHKYTCLLHAKQLCPCAWSEKYLLIRYEIDELNIMVEALDGKVSAVHKWAKEKLGLPVSDVFKDASKDGMKVKSESGQSLDIEQDRKEEVSIPSVGPSARTNNVNRVSGSWVEADGSSHRPQSKGIINDKIEVLFPKISQHATVGKNIMTSSNTVLKKHLARESSSTKRSVIILSDDEN 1440.2 3727 786 91.5 719 59 735 3 8 93.5 0 9GS37EG1EG3SG2QL9IT35IS29HR12SNE-E-E-F-E-D-5GA24CR3EP14ED26RG5LH85GS4RGGD2ISHR2-P24HR70FL2MI7IV20IL8VA25DE5RG17RG4AP7KN10CY13FVAS6KT1ML16AT4SP13TK3QH12SP3RS36FL4FVSF6EG12VI6-EAV13LV3TS8LS2QR2PS3VI2TKVI2IL15IT19TS9 91.5 0.0e+00 98.1
The file above contains some similar IDs:
gene.10002.1.1.p1
gene.10002.1.4.p1
gene.10002.1.5.p1
gene.10002.1.6.p1
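If only the gene.10002 prefix is kept, these IDs become identical. I used this awk script (thanks to @anubhava) to keep, among the lines that share an ID, only the line with the minimum value in column 30: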
awk '{
   if (/^gene\./) {
      split($1, a, /\./)
      k = a[1] "." a[2]
   }
   else
      k = $1
}
!(k in min) || $30 <= min[k] {
   if(!(k in min))
      ord[++n] = k
   else if (min[k] == $30) {
      print
      next
   }
   min[k] = $30
   rec[k] = $0
}
END {
   for (i=1; i<=n; ++i)
      print rec[ord[i]]
}' file
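I could not modify the awk script above so that it uses the maximum value in column 31 instead and keeps multiple copies when the column 31 values are equal. This is my attempt: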
awk '{
   if (/^gene\./) {
      split($1, a, /\./)
      k = a[1] "." a[2]
   }
   else
      k = $1
}
!(k in max) || $31 <= max[k] {
   if(!(k in max))
      ord[++n] = k
   else if (max[k] == $31) {
      print
      next
   }
   cov[k] = $31
   rec[k] = $0
}
END {
   for (i=1; i<=n; ++i)
      print rec[ord[i]]
}'
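Solution
To fix the OP's attempt, try the following. Change the condition to compare with >=, i.e. $31 >= max[k], because we are now looking for the maximum value, and store the value in max[k] (the attempt assigned it to cov[k], so max was never populated). A detailed explanation is added later in this post.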
awk '{
   if (/^gene\./) {
      split($1, a, /\./)
      k = a[1] "." a[2]
   }
   else
      k = $1
}
!(k in max) || $31 >= max[k] {
   if(!(k in max))
      ord[++n] = k
   else if (max[k] == $31) {
      print
      next
   }
   max[k] = $31
   rec[k] = $0
}
END {
   for (i=1; i<=n; ++i)
      print rec[ord[i]]
}' Input_file
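Before filtering, it can help to confirm which group key and which column value awk actually sees on each line, since the records are very wide. The following is only a small sketch: the function name show_keys and its optional column argument are hypothetical helpers, not part of the original answer. It reuses the same key derivation (gene.10002.1.1.p1 becomes gene.10002) and prints the key next to the chosen column.

```shell
# Hypothetical helper: print the derived group key and the value of one column.
# $1 = input file, $2 = column number (assumed default: 31)
show_keys() {
    awk -v col="${2:-31}" '{
        k = $1
        if (/^gene\./) {             # collapse gene.10002.1.1.p1 to gene.10002
            split($1, a, /\./)
            k = a[1] "." a[2]
        }
        print k, $col                # group key followed by the compared value
    }' "$1"
}
```

Calling show_keys Input_file 31 should list one key/value pair per input line, which makes it easy to see which lines belong to the same group.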
Explanation: a detailed, commented version of the corrected script.
awk '{                               ##Starting awk program from here.
   if (/^gene\./) {                  ##Checking condition if line starts with gene. then do following.
      split($1, a, /\./)             ##Splitting first field into array a with delimiter dot here.
      k = a[1] "." a[2]              ##Creating variable k with value of a[1] DOT a[2] here.
   }
   else                              ##In case line is NOT starting with gene. then do following.
      k = $1                         ##Setting 1st field value to k here.
}
!(k in max) || $31 >= max[k] {       ##Checking condition if k is NOT in max array OR 31st field is >= max[k].
   if(!(k in max))                   ##If either condition above is true then check if k is NOT present in max.
      ord[++n] = k                   ##Creating ord with index of increasing value of n and its value is k.
   else if (max[k] == $31) {         ##Else if 31st field ties with the current maximum, print the line right away; no need to store it.
      print                          ##Printing it here.
      next                           ##next will skip all further statements from here.
   }
   max[k] = $31                      ##Creating max with index of k and value of 31st field.
   rec[k] = $0                       ##Creating rec with index of k and value of current line.
}
END {                                ##Starting END block of this program from here.
   for (i=1; i<=n; ++i)              ##Starting a for loop from i=1 till value of n here.
      print rec[ord[i]]              ##Printing rec entry whose index is the value stored in ord[i].
}' Input_file                        ##Mentioning Input_file name here.
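One property of the script above is that a line tying with the current maximum is printed immediately. It therefore appears before the END output, and it stays printed even if a larger value for the same key shows up later in the file. If that matters for your data, a possible variant is to buffer the ties per key and print everything in the END block. This is a sketch under assumptions, not the original answer's method: the function name filter_max and the optional column argument are hypothetical, and the value is coerced to a number with + 0.

```shell
# Hypothetical tie-safe variant: keep every line that reaches the per-key maximum.
# $1 = input file, $2 = column number (assumed default: 31)
filter_max() {
    awk -v col="${2:-31}" '
    {
        k = $1
        if (/^gene\./) {                    # collapse gene.10002.1.1.p1 to gene.10002
            split($1, a, /\./)
            k = a[1] "." a[2]
        }
        v = $col + 0                        # force a numeric comparison
    }
    !(k in max) { ord[++n] = k }            # remember keys in first-seen order
    !(k in max) || v > max[k] {             # new key or new maximum: restart the buffer
        max[k] = v
        rec[k] = $0
        next
    }
    v == max[k] { rec[k] = rec[k] ORS $0 }  # tie with the current maximum: append the line
    END {
        for (i = 1; i <= n; ++i)
            print rec[ord[i]]
    }' "$1"
}
```

Used as filter_max Input_file 31, this prints the groups in the order their keys first appear, with all tied maximum lines of a group kept together.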