Python nested dictionary overwrites multiple values

Problem description

I have the following nested dictionary:

table_dict = {
    "ints": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 0,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    },
    "strings": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 0,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    },
    "floats": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 0,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    }
  }

When I run the line table_dict["ints"]["range metrics"]["mean"] = 11, it changes every "mean" value, not just the one in the "ints" dictionary. This is what my dictionary looks like after that line:

table_dict = {
    "ints": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 11,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    },
    "strings": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 11,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    },
    "floats": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 11,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    }
  }

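How can I change just one value instead of overwriting all of them? Is there a different way of changing dictionary values that I need to use?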

For those asking how I created table_dict in the first place:

focus_values = [1, "sample_text"]
table_name = ""
setup = True
col_dict = {"domain highlights": {"rows": 0, "nulls": 0},
            "range metrics": {"mean": 0, "maximum": 0, "minimum": 0},
            "focus values": {}}
for i in focus_values:
    col_dict["focus values"][i] = 0

file = "large_sample_file.csv"
file_df = pd.read_csv(file, chunksize=10000)
for chunk in file_df:
    if setup:
        setup = False
        table_name = chunk.iloc[1, 0] # table name is in column 1
        table_dict = {}
        for i in chunk.columns[1:]:
            table_dict[i] = col_dict
    profile_table(chunk, table_dict)

Tags: python, dictionary

Solution


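As I suggested in my comment, it sounds like you are reusing a single reference when building the dict. You need to create a new "range metrics" dictionary for each value you insert. Here is an example of the pitfall: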

initial_data = {
    "a": 0,
    "b": 1,
}

data = {
    "c": initial_data,
    "d": initial_data,
}

print(data)
data["c"]["a"] = 2
print(data)

Output:

{'c': {'a': 0, 'b': 1}, 'd': {'a': 0, 'b': 1}}
{'c': {'a': 2, 'b': 1}, 'd': {'a': 2, 'b': 1}}

Notice that the sub-key a was updated under both keys c and d. That is because c and d both point to the same object, i.e. id(data["c"]) == id(data["d"]).
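One way to confirm this kind of aliasing is Python's `is` operator, which compares object identity rather than contents. The following is a minimal sketch that reuses the names from the example above; the `same_object` variable is introduced here purely for illustration.

```python
initial_data = {"a": 0, "b": 1}

# Both keys are bound to the very same dict object, not to two copies.
data = {"c": initial_data, "d": initial_data}

# `is` checks identity: True means there is only one underlying dict.
same_object = data["c"] is data["d"]
print(same_object)

# A write through one key is therefore visible through the other key,
# and through the original name as well.
data["c"]["a"] = 2
print(data["d"]["a"], initial_data["a"])
```

If the identity check prints True, any assignment made through one key will show up under every other key that shares the object.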

Instead, you need to create a new dictionary for each value. I suggest creating a helper function for this:

def initial_data(): 
    return {
        "a": 0,
        "b": 1,
    }

data = {
    "c": initial_data(),
    "d": initial_data(),
}

print(data)
data["c"]["a"] = 2
print(data)

Now you get the expected result:

{'c': {'a': 0, 'b': 1}, 'd': {'a': 0, 'b': 1}}
{'c': {'a': 2, 'b': 1}, 'd': {'a': 0, 'b': 1}}
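
Applied to the code in the question, the same idea could look roughly like the sketch below. It assumes the template is kept as col_dict and uses copy.deepcopy from the standard library instead of a helper function. The columns list is a hypothetical stand-in for chunk.columns[1:], so the example runs without pandas or the CSV file.

```python
import copy

# Template from the question, with the focus values already filled in.
col_dict = {
    "domain highlights": {"rows": 0, "nulls": 0},
    "range metrics": {"mean": 0, "maximum": 0, "minimum": 0},
    "focus values": {1: 0, "sample_text": 0},
}

# Hypothetical column names standing in for chunk.columns[1:].
columns = ["ints", "strings", "floats"]

# deepcopy duplicates the nested dicts as well. A shallow col_dict.copy()
# would copy only the outer level and leave "range metrics" shared.
table_dict = {name: copy.deepcopy(col_dict) for name in columns}

table_dict["ints"]["range metrics"]["mean"] = 11
print(table_dict["ints"]["range metrics"]["mean"])     # 11
print(table_dict["strings"]["range metrics"]["mean"])  # 0
```

In the question's loop, this would correspond to replacing table_dict[i] = col_dict with table_dict[i] = copy.deepcopy(col_dict).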
