-
Introduction
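PAML (Phylogenetic Analysis by Maximum Likelihood) is a software package developed by Professor Ziheng Yang at University College London for phylogenetic analysis of protein and nucleotide sequences by maximum likelihood. It is free for academic use. Prof. Yang maintains and distributes ANSI C source code for UNIX/Linux/Mac OS X and executables for MS Windows.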
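PAML can build phylogenetic trees, reconstruct ancestral sequences, simulate sequence evolution, and calculate Ka/Ks. Estimating Ka/Ks for specific branches and sites is the package's signature feature.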
Download the package and install it as follows:
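```shell
cd paml4.5/
rm bin/*.exe        # remove the bundled Windows executables
cd src
make -f Makefile    # compile the ANSI C sources
ls -l               # check that the binaries were built
rm *.o              # clean up object files
mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin
```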
-
Example
1. codeml
Two input files are required:
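1⃣️: a CDS alignment in FASTA format (the length must be a multiple of 3, and stop codons must be removed);
*The file must already be aligned. I aligned with ClustalW and exported FASTA. PAML is said to read PHYLIP format as well, but that did not work for my file, and I am not sure why.
Once you have the FASTA file, you can use EasyCodeML to convert it to PAML format (replacing U with T). Put the FASTA file in inPath and run: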
```shell
java -cp EasyCodeML.jar SeqFormatConvert.seqFactory.SeqConverter -i inPath/ -iF fasta -o outPath/ -oF PAML
```
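If EasyCodeML is not at hand, the same conversion can be sketched in Python. This is a minimal sketch, not EasyCodeML's implementation: it assumes an already-aligned FASTA file with gaps written as '-', writes PAML's sequential layout (a header line with the number of sequences and the alignment length, then each name followed by its sequence), and enforces the two rules above (length a multiple of 3, no stop codons). The script name fasta2paml.py and the helper names are hypothetical.

```python
"""Minimal FASTA -> PAML (sequential) converter with basic CDS checks.

A sketch, not EasyCodeML: it assumes an aligned FASTA file in which
every sequence has the same length and gaps are written as '-'.
"""
import sys

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_paml(records):
    """Return the alignment as PAML sequential text after basic checks."""
    seqs = [(n, s.upper().replace("U", "T")) for n, s in records]
    lengths = {len(s) for _, s in seqs}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is the file aligned?")
    length = lengths.pop()
    if length % 3:
        raise ValueError(f"alignment length {length} is not a multiple of 3")
    # Drop the final codon column if it holds only stop codons or gaps.
    if length and all(s[-3:] in STOP_CODONS or s[-3:] == "---" for _, s in seqs):
        seqs = [(n, s[:-3]) for n, s in seqs]
        length -= 3
    # codeml does not accept in-frame stop codons, so report any that remain.
    for n, s in seqs:
        for i in range(0, length, 3):
            if s[i:i + 3] in STOP_CODONS:
                raise ValueError(f"{n}: stop codon at site {i + 1}")
    lines = [f"{len(seqs)} {length}"]
    for n, s in seqs:
        lines += [n, s]
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python fasta2paml.py aln.fasta > dm_cds.pml
    with open(sys.argv[1]) as fh:
        sys.stdout.write(to_paml(read_fasta(fh.read())))
```

Dropping only a terminal all-stop column keeps every sequence the same length; an internal stop codon is reported rather than silently masked, because it usually points to a frame or annotation problem.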
2⃣️: a tree file
Build the tree with PHYLIP (its programs write the tree to a file named outtree).
Put both files in PAML's bin directory.
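codeml pairs the tips of the tree with the sequences by name, so a tip label that does not exactly match a sequence name is a common cause of failure. A quick check can be sketched in Python, assuming simple Newick labels (no quotes or spaces inside names); newick_tips and check_names are hypothetical helpers.

```python
import re


def newick_tips(newick):
    """Return the tip labels of a Newick string (simple labels only)."""
    # A tip label directly follows '(' or ','; internal-node labels follow ')'
    # and are therefore skipped. The label stops at ':', ',', ')', ';' or space.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def check_names(newick, seq_names):
    """Return (names missing from the tree, tips missing from the alignment)."""
    tips = set(newick_tips(newick))
    names = set(seq_names)
    return sorted(names - tips), sorted(tips - names)
```

Both lists should come back empty before codeml is run.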
Step 1:
Configure the control file
Open codeml.ctl. The top of the file looks like this (## marks my annotations):
```
seqfile = dm_cds.pml * sequence data filename ## alignment file
treefile = outtree * tree structure file name ## tree file
outfile = dm2.mlc * main result file name ## output file

noisy = 9 * 0,1,2,3,9: how much rubbish on the screen
verbose = 1 * 0: concise; 1: detailed, 2: too much
runmode = 0 * 0: user tree; 1: semi-automatic; 2: automatic
* 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise ## 0 = use your own tree

seqtype = 1 * 1:codons; 2:AAs; 3:codons-->AAs ## must be 1 (codons) for dN/dS on a CDS alignment
CodonFreq = 2 * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table

* ndata = 10
clock = 0 * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
aaDist = 0 * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
aaRatefile = dat/jones.dat * only used for aa seqs with model=empirical(_F)
* dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

model = 0
* models for codons:
* 0:one, 1:b, 2:2 or more dN/dS ratios for branches
* models for AAs or codon-translated AAs:
* 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
* 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189) ## model = 0: all branches share one dN/dS; model = 2: two or more dN/dS ratios across branches
```
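Because the hypothesis test in step 2 needs two runs that differ only in model (and output file), it can help to generate both control files from one template. A minimal sketch in Python, assuming the `key = value * comment` layout shown above; set_ctl and the file names null.ctl, alt.ctl, null.mlc and alt.mlc are hypothetical.

```python
import os
import re

# 'key = value rest-of-line'; lines starting with '*' never match.
CTL_LINE = re.compile(r"(\s*)(\w+)(\s*=\s*)(\S+)(.*)")


def set_ctl(template, **settings):
    """Return a copy of codeml.ctl text with the given keys set.

    Only keys already present as 'key = value' lines are changed, and the
    trailing '*' comment is kept. Unknown keys raise KeyError.
    """
    out, seen = [], set()
    for line in template.splitlines():
        m = CTL_LINE.match(line)
        if m and m.group(2) in settings:
            indent, key, eq, _old, rest = m.groups()
            line = f"{indent}{key}{eq}{settings[key]}{rest}"
            seen.add(key)
        out.append(line)
    missing = set(settings) - seen
    if missing:
        raise KeyError(f"keys not found in template: {sorted(missing)}")
    return "\n".join(out) + "\n"


if __name__ == "__main__" and os.path.exists("codeml.ctl"):
    with open("codeml.ctl") as fh:
        template = fh.read()
    # Null model: one dN/dS shared by all branches.
    with open("null.ctl", "w") as fh:
        fh.write(set_ctl(template, seqtype=1, model=0, outfile="null.mlc"))
    # Alternative: a separate dN/dS for the labelled branch(es).
    with open("alt.ctl", "w") as fh:
        fh.write(set_ctl(template, seqtype=1, model=2, outfile="alt.mlc"))
```

codeml accepts the control file name as an argument, so the two runs become `./codeml null.ctl` and `./codeml alt.ctl`.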
The alignment file looks like this:
```
5 4609
RPP4
-----ATGGCTTCTTCTTCTTCTTCTCCTAGTAGCCGGAGATACGACGTTTTCCCAAGCTTCAGTGGGGTAGATGTTCGCAAAACGTTCCTCAGC...
RPP5
-----ATGGCGGCTTCTTCTTCTT---CTGGCAGACGGAGATACGACGTTTTTCCAAGCTTCAGTGGGGTTGATGTTCGCAAGACGTTCCTCAGC...
...
```
The tree file looks like this:
(RPP5:0.05837,(RPP27:0.64437,(RPP8:0.51359,RPP13:0.22395):0.18814):0.42283,RPP4:0.04920);
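Step 2:
Hypothesis testing
Null hypothesis: all branches of the tree share the same dN/dS.
Alternative hypothesis: one branch you choose has a different dN/dS. Mark that branch in the tree file by placing $1 or #1 at the corresponding position (#1 labels a single branch, $1 labels every branch in a clade; see the Bilibili video for details).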
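Labelling can also be scripted. The sketch below (Python, hypothetical helper label_branch) strips branch lengths and appends #1 to one tip, turning the example tree into `(RPP5,(RPP27,(RPP8,RPP13)),RPP4 #1);` when RPP4 is the foreground branch. It assumes simple tip labels and relies on codeml ignoring branch lengths in the tree file under its default setting (fix_blength = 0); labelling an internal branch or a clade with $1 is left to hand editing.

```python
import re


def label_branch(newick, tip, label="#1"):
    """Strip branch lengths and attach a PAML branch label to one tip."""
    # Remove ':<number>' branch lengths (also handles 1e-05 style numbers).
    tree = re.sub(r":\s*[0-9.eE+-]+", "", newick)
    # Put the label right after the exact tip name, e.g. 'RPP4' -> 'RPP4 #1'.
    pattern = rf"(?<=[(,]){re.escape(tip)}(?=[,)])"
    tree, n = re.subn(pattern, f"{tip} {label}", tree)
    if n != 1:
        raise ValueError(f"tip {tip!r} found {n} times")
    return tree
```

The lookarounds make the match exact, so labelling RPP4 cannot accidentally hit a tip such as RPP40.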
Run PAML twice, once with model = 0 and once with model = 2, then compute the p-value from the lnL values reported in the two output files.
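For example: model = 0 gives lnL1;
model = 2 gives lnL2.
The likelihood-ratio statistic is q = 2 × |lnL1 − lnL2|, with degrees of freedom df = np2 − np1, where np is the number of parameters of each model (printed as np on each lnL line).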
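Reading lnL and np out of the two .mlc files by hand is error-prone, so a small Python sketch can do it. It assumes the `lnL(ntime: ... np: ...): value` summary line that codeml writes to its main output file; read_lnl, lrt and the command-line file names are hypothetical. Unlike the abs() shortcut above, it computes 2 × (lnL2 − lnL1) and refuses a clearly negative value: for nested models the alternative should never fit worse, so a negative statistic usually signals a convergence problem.

```python
import re
import sys

# Assumed shape of the summary line in a codeml .mlc file:
# lnL(ntime:  7  np:  9):  <log-likelihood>  <...>
LNL_RE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+(?:\.\d+)?)")


def read_lnl(mlc_text):
    """Return (np, lnL) from the first lnL line of a codeml output file."""
    m = LNL_RE.search(mlc_text)
    if m is None:
        raise ValueError("no 'lnL(ntime: .. np: ..)' line found")
    return int(m.group(2)), float(m.group(3))


def lrt(null_text, alt_text, tol=1e-6):
    """Return (statistic, df) for a nested null/alternative pair of runs."""
    np_null, lnl_null = read_lnl(null_text)
    np_alt, lnl_alt = read_lnl(alt_text)
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat < -tol:
        # The richer model fitting worse usually means a convergence problem.
        raise ValueError(f"alternative lnL is lower than null ({stat:.6f})")
    return max(stat, 0.0), np_alt - np_null


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python lrt.py null.mlc alt.mlc
    with open(sys.argv[1]) as f0, open(sys.argv[2]) as f1:
        stat, df = lrt(f0.read(), f1.read())
    print(f"q = {stat:.4f}, df = {df}")
```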
Then use PAML's chi2 program to get the p-value:
```shell
./chi2 df q
```
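If p < 0.05, reject the null hypothesis in favor of the alternative: the branch you labelled does have a dN/dS that differs from the rest of the tree.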
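To cross-check the chi2 result without leaving Python, the upper-tail chi-square probability for an integer df can be computed with the standard library alone. This sketch starts from the closed forms for df = 1 and df = 2 and climbs two degrees of freedom at a time with the recurrence Q(k + 2) = Q(k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1); chi2_sf and lrt_decision are hypothetical names, not part of PAML.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms: df = 1 -> erfc(sqrt(x/2)); df = 2 -> exp(-x/2).
    if df % 2:
        p, k = math.erfc(math.sqrt(half)), 1
    else:
        p, k = math.exp(-half), 2
    # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1), in log space.
    while k < df:
        p += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return p


def lrt_decision(stat, df, alpha=0.05):
    """Return (p-value, reject_null) for a likelihood-ratio statistic."""
    p = chi2_sf(stat, df)
    return p, p < alpha
```

For example, `lrt_decision(q, 1)` plays the role of `./chi2 1 q` followed by the p < 0.05 rule above.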
References