Converting letter strings in a list to numbers

Problem description

I have read a file into a list that looks like this:

['>chr1_sliding:1-1000\n', 'TCATGGCTATTTTCATAAAAAATGGGGGTTGTGTGGCCATTTATCATCGACTAGAGGCTCATAAACCTCACCCCACATATGTTTCCTTGCCATAGATTACATTCTTGGATTTCTGGTGGAAACCAT\n', '\n', '>chr1_sliding:901-1900\n', 'TCATGGCTATTTTCATAAAAAATGGGGGTTGTGTGGCCATTTAT....]

I want to convert the letters to numbers according to this dictionary:

dict = {"A": 0, "T": 1,"G": 2, "C": 3}

This is what I have tried:

with open("/Users/Downloads/test") as file_in:
    lines = []
    for line in file_in:
        lines.append(line)

for line in lines:
    try:
        print(dict[line])
    except KeyError:
        print("header")

But "header" is printed for every line:

Output

header
header 
header
header

Expected output:

header
13012...
header
13012...

Tags: python

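Solution

The original loop passes the whole line to the dictionary, so dict[line] looks for a key such as 'TCATGG...\n' that does not exist, and every lookup raises KeyError. The conversion has to happen character by character. First, define a transform function that converts a given line according to the mapping: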

def transformData(line):
    transform_dict = {"A": 0, "T": 1, "G": 2, "C": 3}

    for char, val in transform_dict.items():
        line = line.replace(char, str(val))

    return line
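
As an alternative to the chained replace calls, the same mapping can be applied in a single pass with the built-in str.maketrans and str.translate. This is only a sketch: the names BASE_TO_DIGIT and transform_with_translate are hypothetical and do not appear in the original answer.

```python
# Hypothetical alternative to transformData: one pass using a translation table.
BASE_TO_DIGIT = str.maketrans({"A": "0", "T": "1", "G": "2", "C": "3"})

def transform_with_translate(line):
    # Characters missing from the table (for example "N" or "\n") are left unchanged.
    return line.translate(BASE_TO_DIGIT)

print(transform_with_translate("TCATGG"))  # prints 130122
```

A translation table avoids rescanning the string once per letter, which may matter for long sequences.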

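Then iterate over the lines and check whether each one is a sequence line that should be converted. If it is, pass it to the transform function and store the result.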

data = ['>chr1_sliding:1-1000\n', 'TCATGGCTATTTTCATAAAAAATGGGGGTTGTGTGGCCATTTATCATCGACTAGAGGCTCATAAACCTCACCCCACATATGTTTCCTTGCCATAGATTACATTCTTGGATTTCTGGTGGAAACCAT\n', '\n', '>chr1_sliding:901-1900\n', 'TCATGGCTATTTTCATAAAAAATGGGGGTTGTGTGGCCATTTAT....\n']

converted = []    # Stores the converted sequence lines

for line in data:
    if not line.startswith('>') and line.strip():        # Skip header lines and blank lines
        converted.append(transformData(line.strip()))    # Drop the trailing newline, then convert

Finally, print the results in the format you intended:

for line in converted:
    print('header', line, sep='\n')

Output

header
13012...
header
13012...
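
The steps above can also be combined into a single pass that emits "header" for each header line and the digit string for each sequence line, which matches the expected output in the question. This is a minimal sketch, not the answer's own method: convert_lines, BASE_TO_DIGIT, and the shortened sample list are hypothetical names and data, and it assumes header lines start with ">".

```python
BASE_TO_DIGIT = {"A": 0, "T": 1, "G": 2, "C": 3}

def convert_lines(lines):
    """Yield 'header' for each '>' line and a digit string for each sequence line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            yield "header"
        else:
            # Look up each character individually; unknown characters pass through.
            yield "".join(str(BASE_TO_DIGIT.get(ch, ch)) for ch in line)

# Shortened, made-up sample in the same layout as the question's list.
sample = [">chr1_sliding:1-1000\n", "TCATGG\n", "\n", ">chr1_sliding:901-1900\n", "GGCTAT\n"]
for item in convert_lines(sample):
    print(item)
```

Looking up each character with dict.get keeps the approach close to the original attempt while avoiding the KeyError raised by looking up a whole line.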
