Random replacement at selected indices only, at a set frequency

Problem description

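I'm trying to replace data at a set frequency, but only in selected columns. With help from AKX here, I was able to find a solution that randomly replaces values anywhere in a list. I feel bad asking this, since I've already asked a similar question, but I just can't seem to find a solution. What I want is this: given a list of 4 values, I want to choose, by index, which values are eligible for random replacement. For example, if I choose indices 2 and 4, only the values at those indices should be replaced, while the values at indices 1 and 3 stay unchanged.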

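import random

vals = ["*"]


def replace_random(lst, min_n, max_n, replacements):
    # Pick how many values to replace, but never more than the list holds
    # (random.sample raises ValueError if asked for more items than exist).
    n = min(random.randint(min_n, max_n), len(lst))
    if n == 0:
        return lst
    # Choose n distinct positions anywhere in the list.
    indexes = set(random.sample(range(len(lst)), n))
    return [
        random.choice(replacements)
        if index in indexes
        else value
        for index, value
        in enumerate(lst)
    ]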
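The same idea can be limited to chosen positions by sampling from the allowed indices instead of from `range(len(lst))`. This is a minimal sketch, assuming 0-based indices into the list passed in; `replace_random_at`, its `allowed` parameter, and the optional `rng` argument are hypothetical names, not part of the original code:

```python
import random


def replace_random_at(lst, allowed, min_n, max_n, replacements, rng=random):
    """Replace between min_n and max_n values of lst, choosing only from `allowed`.

    `allowed` holds 0-based indices into lst; every other position is kept as-is.
    """
    # Drop indices that fall outside the list so rng.sample cannot pick them.
    allowed = [i for i in allowed if 0 <= i < len(lst)]
    # Clamp n: rng.sample raises ValueError if asked for more items than exist.
    n = min(rng.randint(min_n, max_n), len(allowed))
    chosen = set(rng.sample(allowed, n))
    return [
        rng.choice(replacements) if index in chosen else value
        for index, value in enumerate(lst)
    ]


vals = ["*"]
geno = ["1", "2", "1", "4"]
# Only positions 1 and 3 (0-based) may change; 0 and 2 are never touched.
print(replace_random_at(geno, [1, 3], 0, 1, vals))
```

Passing a seeded `random.Random(...)` instance as `rng` makes runs reproducible; the default uses the module-level functions of `random`.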

Example usage:

with open("test2.txt", "w") as out, open("test.txt", "rt") as f:
    for line in f:
        li = line.strip()
        tabs = li.split("\t")
        geno = tabs[1:]
        new_geno = replace_random(geno, 0, 5, vals)
        print(new_geno)

For example, this is what I've been trying in order to achieve that:

M = [1,3]
with open("test2.txt", "w") as out, open("test.txt", "rt") as f:
    for line in f:
        li = line.strip()
        tabs = li.split("\t")
        geno = tabs[1:]
        new_geno = replace_random(geno[M], 0, 1, vals)
        print(new_geno)

However, when I try this I get the following error:

TypeError: list indices must be integers or slices, not list
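The error comes from `geno[M]`: a plain Python list accepts only an integer or a slice as an index, not a list of indices. Selecting several positions takes a list comprehension or `operator.itemgetter`, as in this short sketch (`geno` and `M` mirror the question's variables):

```python
from operator import itemgetter

geno = ["1", "2", "1", "4"]
M = [1, 3]

# geno[M] raises TypeError; select each index individually instead.
picked = [geno[i] for i in M]
print(picked)  # ['2', '4']

# itemgetter with several indices returns a tuple of the selected values
# (with a single index it returns that one value, not a tuple).
print(itemgetter(*M)(geno))  # ('2', '4')
```

Selecting the values alone does not put them back into the row, so the positions in `M` still need to be kept alongside them.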

Sample data. Input:

123 1   2   1   4
234 -   2   0   4
345 -   2   -   4
456 0   2   1   4
567 1   2   1   4
678 0   2   0   4
789 -   2   1   4
890 0   2   1   4

Desired output (one possible result, since the replacements are random):

123 1   *   1   4
234 -   2   0   4
345 -   2   -   *
456 0   2   1   *
567 1   2   1   4
678 0   2   0   4
789 -   2   1   4
890 0   *   1   4

Edit:

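One thing I forgot to mention: I thought about removing the indices I don't want to edit and then running the replacement on the indices I do want to replace, but I'm not sure how to join the values back together in the same order. Here is what I tried: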

with open("test2.txt", "w") as out, open("start.test.txt", "rt") as f:
    for line in f:
        li = line.strip()
        tabs = li.split("\t")
        geno = tabs[1:]
        geno_alt = [i for j, i in enumerate(geno) if j not in M]
        geno_alt = replace_random(geno_alt,0,1,vals)
        print(geno_alt)
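To keep the original order, one option is to pull out the editable values, transform them, and then write each result back to the index it came from in a copy of the full list. A minimal sketch, assuming `M` holds 0-based indices into `geno` as in the question; `apply_to_indices` is a hypothetical helper name:

```python
def apply_to_indices(lst, indices, func):
    """Run func on the values at `indices` and put the results back in the same slots.

    func must return a list with one value per selected index.
    """
    subset = [lst[i] for i in indices]  # pull out only the editable values
    new_subset = func(subset)
    if len(new_subset) != len(subset):
        raise ValueError("func must return one value per selected index")
    result = list(lst)  # copy, so the input list is left unchanged
    for i, value in zip(indices, new_subset):
        result[i] = value  # write each value back to its original position
    return result


geno = ["1", "2", "1", "4"]
M = [1, 3]
# Replace every selected value, just to show that positions are preserved.
print(apply_to_indices(geno, M, lambda sub: ["*"] * len(sub)))  # ['1', '*', '1', '*']
```

With the question's function, the call would be `apply_to_indices(geno, M, lambda sub: replace_random(sub, 0, 1, vals))`; the helper itself does not care how the sub-list is modified, only that its length stays the same.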

Tags: python

Solution


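If all you're trying to do is replace the values at specific indices on every line of a file (like the sample data you provided), making n replacements per line (with n picked at random from a list of allowed counts) and choosing each replacement at random from a set of values, this will work: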

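from random import sample, choice


def make_replacements(fn_in, fn_out, indices, values, frequency):
    with open(fn_out, "w") as out, open(fn_in, "r") as f:
        for line in f:
            # Pick how many replacements this line gets (one value from
            # `frequency`), then which of the allowed indices to use.
            indices_sample = sample(indices, choice(frequency))
            # Split on whitespace, swap in a random value at the chosen
            # positions, and rejoin the fields with tabs.
            line = '\t'.join(
                choice(values)
                if n in indices_sample
                else v
                for n, v in enumerate(line.strip().split())
            ) + '\n'
            out.write(line)


# Indices count from 0 over the whole row, so index 0 is the ID column.
make_replacements("start.test.txt", "out.txt", [2, 4], ['*'], [0, 1])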
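The per-line logic of this answer can also be pulled out into a function that works on a single string, which makes it easy to try without files and to seed the random generator for reproducible runs. This is a sketch; `replace_in_line` and the `rng` parameter are hypothetical additions. As in the answer, indices count from 0 over the whole row, so index 0 is the ID column:

```python
import random


def replace_in_line(line, indices, values, frequency, rng=random):
    """Apply the answer's replacement rule to one whitespace-separated line.

    `indices` are 0-based positions over the whole row (0 is the ID column);
    `frequency` lists the allowed numbers of replacements per line, and none
    of them may exceed len(indices), or rng.sample raises ValueError.
    """
    chosen = set(rng.sample(indices, rng.choice(frequency)))
    fields = line.strip().split()
    return "\t".join(
        rng.choice(values) if n in chosen else v for n, v in enumerate(fields)
    )


rng = random.Random(42)  # fixed seed so repeated runs give the same output
for row in ["123 1 2 1 4", "234 - 2 0 4", "345 - 2 - 4"]:
    print(replace_in_line(row, [2, 4], ["*"], [0, 1], rng))
```

Because the count is drawn from `frequency` first and the positions second, every line gets exactly that many replacements; with `frequency=[0, 1]` each line has a 50% chance of getting one replacement.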

Example output:

123 1   2   1   *
234 -   2   0   4
345 -   2   -   4
456 0   2   1   4
567 1   2   1   *
678 0   *   0   4
789 -   2   1   *
890 0   2   1   *

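I've updated the code and the example output based on the changes to your question and your comments, and I believe this is what you're after.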

