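python - Python dictionary not copying correctly, causing duplicates; how do I fix this?
Problem description
I am writing a function that compares lists (the significant genes of each test) and reports the common elements (genes) for every possible combination of those lists.
The results will be used for a Venn diagram...
The number of tests and the number of genes are both flexible.
The input JSON file looks like this: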
| test | genes |
|----------------- |--------------------------------------------------- |
| p-7trt_1/0con_1 | [ENSMUSG00000000031, ENSMUSG00000000049, ENSMU... |
| p-7trt_2/0con_1 | [ENSMUSG00000000031, ENSMUSG00000000037, ENSMU... |
| p-7trt_1/0con_2 | [ENSMUSG00000000037, ENSMUSG00000000049, ENSMU... |
| p-7trt_2/0con_2 | [ENSMUSG00000000028, ENSMUSG00000000031, ENSMU... |
| p-7trt_1/0con_3 | [ENSMUSG00000000088, ENSMUSG00000000094, ENSMU... |
| p-7trt_2/0con_3 | [ENSMUSG00000000028, ENSMUSG00000000031, ENSMU... |
The function is as follows:
import pandas as pd

def get_venn_compiled_data(dir_loc):
    """return json of compiled data for the venn thing
    """
    data_frame = pd.read_json(dir_loc + "/venn.json", orient="records")
    number_of_tests = data_frame.shape[0]
    venn_data = []
    venn_data_point = {"tests": [], "genes": []}  # list of genes which are common across listed tests
    binary = lambda x: bin(x)[2:]  # to directly get the binary number
    for dec_number in range(1, 2 ** number_of_tests):
        # resetting
        venn_data_point["tests"] = []
        venn_data_point["genes"] = []
        # using a binary number to get all the cases
        for index, state in enumerate(binary(dec_number)):
            if state == "0":
                continue
            # putting in all the genes from the first test
            if venn_data_point["tests"] == []:
                venn_data_point["genes"] = data_frame["data"][index].copy()
            # removing the ones which are not common in current genes state and this.tests
            else:
                for gene_index, gene in enumerate(venn_data_point["genes"]):
                    if gene not in data_frame["data"][index]:
                        venn_data_point["genes"].pop(gene_index)
            # putting the test in the tests list
            venn_data_point["tests"].append(data_frame["name"][index])
        venn_data.append(venn_data_point.copy())
    return venn_data
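I am basically exploiting the fact that counting in binary produces every possible combination of 1s and 0s. Each position in the binary number corresponds to one test, and for a given binary number, a 0 at a position means the list belonging to that test is left out of the comparison.
I have tried my best to explain; if anything is unclear, please ask in the comments.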
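The idea described above can be sketched in isolation as follows. This is only an illustration, not the code from the question: the helper name `subsets_by_bitmask` and the test names are made up, and it reads each bit with a shift instead of a binary string, so bit `i` (counted from the right) maps to `names[i]`.

```python
def subsets_by_bitmask(names):
    """Yield every non-empty selection of names, one per integer mask."""
    count = len(names)
    for mask in range(1, 2 ** count):
        # Bit i of mask decides whether names[i] is part of this selection.
        yield [names[i] for i in range(count) if (mask >> i) & 1]


# Hypothetical test names, for illustration only.
selections = list(subsets_by_bitmask(["A", "B", "C"]))
for selection in selections:
    print(selection)
```

With three names this yields 2 ** 3 - 1 = 7 selections, one for each non-empty subset.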
After running the function, I get output in which duplicated sets of tests show up at seemingly random positions.
Any help is much appreciated. Thank you.
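Solution
I realized what mistake I had made.
I assumed the binary function would always magically produce a string with as many positions as I needed, but it does not: bin() drops leading zeros.
After updating the binary function to pad the string with those missing leading zeros, everything works.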
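To make the effect of the missing zeros concrete, here is a small sketch. The value of `number_of_tests` and the helper `selected_positions` are assumptions for illustration only. It shows that unpadded strings for different numbers can select the same position, which is where the duplicates come from.

```python
number_of_tests = 4  # assumed value, for illustration only

unpadded = bin(2)[2:]                                # '10'
padded = format(2, "0{}b".format(number_of_tests))   # '0010'


def selected_positions(bits):
    """Return the indexes whose character is '1', as the loop in the question does."""
    return [index for index, state in enumerate(bits) if state == "1"]


print(selected_positions(unpadded))     # [0]
print(selected_positions(bin(4)[2:]))   # [0], the same selection again
print(selected_positions(padded))       # [2]
```

`bin(2)[2:].zfill(number_of_tests)` gives the same padded string as the `format` call.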
import pandas as pd

def get_venn_compiled_data(dir_loc):
    """return json of compiled data for the venn thing
    """
    # internal variables
    data_frame = pd.read_json(dir_loc + "/venn.json", orient="records")
    number_of_tests = data_frame.shape[0]
    venn_data = []

    # defining internal function
    def binary(dec_no, length=number_of_tests):
        """Just to convert decimal number to binary of specified length
        """
        bin_number = bin(dec_no)[2:]
        if len(bin_number) < length:
            bin_number = "0" * (length - len(bin_number)) + bin_number
        return bin_number

    # list of genes which are common across listed tests
    venn_data_point = {
        "tests": [],
        "genes": [],
    }
    for dec_number in range(1, 2 ** number_of_tests):
        # resetting
        venn_data_point["tests"] = []
        venn_data_point["genes"] = []
        # using a binary number to get all the cases
        for index, state in enumerate(binary(dec_number)):
            if state == "0":
                continue
            # putting in all the genes from the first test
            if venn_data_point["tests"] == []:
                venn_data_point["genes"] = data_frame["data"][index].copy()
            # keeping only the genes that are also in this test
            else:
                # build a new list rather than calling pop() while enumerating,
                # because popping during iteration skips the following element
                current_genes = set(data_frame["data"][index])
                venn_data_point["genes"] = [
                    gene for gene in venn_data_point["genes"] if gene in current_genes
                ]
            # putting the test in the tests list
            venn_data_point["tests"].append(data_frame["name"][index])
        venn_data.append(venn_data_point.copy())
    return venn_data
If anyone has a more optimized algorithm for this, it would be much appreciated.
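One possible direction, offered as a rough sketch and not a benchmarked answer, is to let `itertools.combinations` enumerate the selections and use set intersection for the comparison. The function name `common_genes`, the dict input shape, and the gene IDs below are all assumptions for illustration; the real data would first have to be pulled out of the DataFrame.

```python
from itertools import combinations


def common_genes(tests):
    """Map each non-empty combination of test names to the genes they share.

    `tests` is assumed to be a dict of test name -> iterable of gene IDs.
    """
    gene_sets = {name: set(genes) for name, genes in tests.items()}
    results = []
    for size in range(1, len(gene_sets) + 1):
        for combo in combinations(gene_sets, size):
            # Intersect the gene sets of every test in this combination.
            shared = set.intersection(*(gene_sets[name] for name in combo))
            results.append({"tests": list(combo), "genes": sorted(shared)})
    return results


# Made-up gene IDs, for illustration only.
example = {
    "t1": ["g1", "g2", "g3"],
    "t2": ["g2", "g3"],
    "t3": ["g3", "g4"],
}
results = common_genes(example)
for entry in results:
    print(entry)
```

The number of combinations is still 2 ** n - 1, so this does not change how the work grows with the number of tests. The likely gain is that set membership checks avoid scanning a list for every gene, and no binary strings are needed.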