Manipulating a pandas DataFrame into the desired output

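Problem description

I have tried different approaches but have not found a solution yet.

The question is: how can I convert cues, directions, thresholds, and exits into a hierarchical JSON structure for a D3 visualization? The number of levels is unknown, so the solution has to be dynamic.

I have a DataFrame with five columns and eight rows. In my case, each row corresponds to one tree: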

    tree       cues                        directions   thresholds   exits
     1   PLC2hrOGTT;Age;BMI;TimesPregnant   >;>;>;>   126;29;29.7;6  1;0;1;0.5
     2   PLC2hrOGTT;Age;BMI                 >;>;>     126;29;29.7    0;1;0.5
     3   PLC2hrOGTT;Age;BMI;TimesPregnant   >;>;>;>   126;29;29.7;6  1;0;0;0.5
     4   PLC2hrOGTT;Age;BMI;TimesPregnant   >;>;>;>   126;29;29.7;6  1;1;0;0.5
     5   PLC2hrOGTT;Age;BMI;TimesPregnant   >;>;>;>   126;29;29.7;6  0;1;0;0.5
     6   PLC2hrOGTT;Age;BMI                 >;>;>     126;29;29.7    0;0;0.5 
     7   PLC2hrOGTT;Age;BMI;TimesPregnant   >;>;>;>   126;29;29.7;6  1;1;1;0.5
     8   PLC2hrOGTT;Age;BMI;TimesPregnant   >;>;>;>   126;29;29.7;6  0;0;0;0.5

Snapshot of the desired output (for example, the first row):

  "cues": "PLC2hrOGTT",
  "directions": ">",
  "thresholds": "126",
  "exits": "1",
  "children": [
    {
      "cues": "Age",
      "directions": ">",
      "thresholds": "29",
      "exits": "0",
      "children": [
        {
          "cues": "BMI",
          "directions": ">",
          "thresholds": "29.7",
          "exits": "1",
          "children": [
            {
              "cues": "TimesPregnant",
              "directions": ">",
              "thresholds": "6",
              "exits": "0.5",
              "children": [
                {
                  "cues": "True"
                },
                {
                  "cues": "False"
                }
              ]

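For the last node in a tree, True and False are always given as the children (hence the 0.5 in the exits column).

Edit: the desired layout depends on exits. If exits == 1, the children are first 'True' and then the next cue; otherwise they are the next cue and then 'False':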

{
    "cues": "PLC2hrOGTT",
    "directions": ">",
    "thresholds": "126",
    "exits": "1",
    "children": [
      {
        "cues": "True"
      },
      {
        "cues": "Age",
        "directions": ">",
        "thresholds": "29",
        "exits": "0",
        "children": [
          {
            "cues": "BMI",
            "directions": ">",
            "thresholds": "29.7",
            "exits": "1",
            "children": [
              {
                "cues": "True"
              },
              {
                "cues": "TimesPregnant",
                "directions": ">",
                "thresholds": "6",
                "exits": "0.5",
                "children":[
                  {
                    "cues": "True"
                  },
                  {
                    "cues": "False"
                  }
                ]
              }
            ]
          },
          {
            "cues": "False"
          }
        ]
      }
    ]
}

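Tags: python, json, pandas, dataframe

Solution


Given one row of the DataFrame (a Series whose index is your column names), this function extracts a tree like the one in your first snapshot: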

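>>> def row_to_tree(row):
...     out = {}
...     pos = [out]  # list whose first element is the node to fill next
...     # Split each column at ";" and zip the lists so each step yields one tree level.
...     for cues, directions, thresholds, exits in zip(*map(lambda x: x.split(";"), row[["cues", "directions", "thresholds", "exits"]].values)):
...         pos = pos[0]
...         pos["cues"] = cues
...         pos["directions"] = directions
...         pos["thresholds"] = thresholds
...         pos["exits"] = exits
...         # Placeholder child: the next level overwrites it, the last level keeps it as the True leaf.
...         pos["children"] = [{"cues": True}]
...         pos = pos["children"]
...     # After the last level, add the False leaf next to the True leaf.
...     pos.append({"cues": False})
...     return out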

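This works by taking the sequence of strings in the row, row[["cues", "directions", "thresholds", "exits"]].values, and splitting each string at ";". The splitting is done by mapping the function lambda x: x.split(";") over each string. The result is a list in which each element is the list that came from one of your columns (for example, the first list holds the cues for that row). Zipping these lists is a bit like transposing a two-dimensional list: each tuple it yields holds the cue, direction, threshold, and exit for one level. We then iterate over those tuples, add the values to the current dictionary, and finally add a new dictionary for the child element.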
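To make the split-and-zip step concrete, here is a minimal standalone sketch that uses only the standard library. The names row_values, columns, and levels are illustrative and do not appear in the function above; the strings are taken from row 2 of the question's table.

```python
# One row's four columns, as the semicolon-separated strings from row 2.
row_values = ["PLC2hrOGTT;Age;BMI", ">;>;>", "126;29;29.7", "0;1;0.5"]

# Step 1: split each column string into a list of per-level values.
columns = [value.split(";") for value in row_values]

# Step 2: zip(*columns) regroups the values by tree level instead of by column.
levels = list(zip(*columns))

for cue, direction, threshold, exit_ in levels:
    print(cue, direction, threshold, exit_)
```

Each printed line corresponds to one level of the tree, which is exactly what the loop in row_to_tree consumes.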

You can then apply this function to each row to get your trees:

>>> import json
>>> trees = [row_to_tree(row) for i, row in df.iterrows()]
>>> print(json.dumps(trees[0], indent=2))
{
  "cues": "PLC2hrOGTT",
  "directions": ">",
  "thresholds": "126",
  "exits": "1",
  "children": [
    {
      "cues": "Age",
      "directions": ">",
      "thresholds": "29",
      "exits": "0",
      "children": [
        {
          "cues": "BMI",
          "directions": ">",
          "thresholds": "29.7",
          "exits": "1",
          "children": [
            {
              "cues": "TimesPregnant",
              "directions": ">",
              "thresholds": "6",
              "exits": "0.5",
              "children": [
                {
                  "cues": true
                },
                {
                  "cues": false
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
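
The output above matches the first snapshot in the question, not the edited layout in which the position of the leaf depends on exits. One possible way to get that layout is sketched below. It is not part of the original answer: row_to_exit_tree and FIELDS are hypothetical names, the leaves are written as the strings "True" and "False" as in the question, and the last level is assumed to always receive both leaves regardless of its exits value. The function only indexes the row by column name, so a plain dict is used here as a stand-in for a row.

```python
import json

FIELDS = ("cues", "directions", "thresholds", "exits")

def row_to_exit_tree(row):
    """Build the nested dict for one row, placing leaves according to `exits`."""
    # Split every column at ";" and regroup the values by tree level.
    levels = list(zip(*(str(row[name]).strip().split(";") for name in FIELDS)))
    node = None
    # Build from the deepest level upward so each parent can embed its child.
    for values in reversed(levels):
        current = dict(zip(FIELDS, values))
        if node is None:
            # Assumption: the last level always gets both leaves.
            current["children"] = [{"cues": "True"}, {"cues": "False"}]
        elif current["exits"] == "1":
            # exits == 1: the True leaf comes first, then the next cue.
            current["children"] = [{"cues": "True"}, node]
        else:
            # Otherwise: the next cue comes first, then the False leaf.
            current["children"] = [node, {"cues": "False"}]
        node = current
    return node

# Row 1 of the question's table, written as a plain dict.
row = {
    "cues": "PLC2hrOGTT;Age;BMI;TimesPregnant",
    "directions": ">;>;>;>",
    "thresholds": "126;29;29.7;6",
    "exits": "1;0;1;0.5",
}
tree = row_to_exit_tree(row)
print(json.dumps(tree, indent=2))
```

Building the tree bottom-up avoids tracking a "current position" while descending, because every parent is created only after its child already exists.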
