python - Manipulating a pandas DataFrame into the desired output
Question
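I have tried several approaches but have not found a solution yet.
The question is: how can I convert the cues, directions, thresholds, and exits columns into a hierarchical JSON structure for a D3 visualization? The number of levels is not known in advance, so the solution has to be dynamic.
I have a DataFrame with five columns and eight rows; in my case, each row corresponds to one tree: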
tree cues directions thresholds exits
1 PLC2hrOGTT;Age;BMI;TimesPregnant >;>;>;> 126;29;29.7;6 1;0;1;0.5
2 PLC2hrOGTT;Age;BMI >;>;> 126;29;29.7 0;1;0.5
3 PLC2hrOGTT;Age;BMI;TimesPregnant >;>;>;> 126;29;29.7;6 1;0;0;0.5
4 PLC2hrOGTT;Age;BMI;TimesPregnant >;>;>;> 126;29;29.7;6 1;1;0;0.5
5 PLC2hrOGTT;Age;BMI;TimesPregnant >;>;>;> 126;29;29.7;6 0;1;0;0.5
6 PLC2hrOGTT;Age;BMI >;>;> 126;29;29.7 0;0;0.5
7 PLC2hrOGTT;Age;BMI;TimesPregnant >;>;>;> 126;29;29.7;6 1;1;1;0.5
8 PLC2hrOGTT;Age;BMI;TimesPregnant >;>;>;> 126;29;29.7;6 0;0;0;0.5
Snapshot of the desired output (for example, for the first row):
"cues": "PLC2hrOGTT",
"directions": ">",
"thresholds": "126",
"exits": "1",
"children": [
  {
    "cues": "Age",
    "directions": ">",
    "thresholds": "29",
    "exits": "0",
    "children": [
      {
        "cues": "BMI",
        "directions": ">",
        "thresholds": "29.7",
        "exits": "1",
        "children": [
          {
            "cues": "TimesPregnant",
            "directions": ">",
            "thresholds": "6",
            "exits": "0.5",
            "children": [
              {
                "cues": "True"
              },
              {
                "cues": "False"
              }
            ]
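The last node in a tree always gets True and False as its children (hence the 0.5 in the exits column).
Edit: the desired output is: if exits == 1, then first 'True' and then the next cue; otherwise the next cue and then 'False':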
{
  "cues": "PLC2hrOGTT",
  "directions": ">",
  "thresholds": "126",
  "exits": "1",
  "children": [
    {
      "cues": "True"
    },
    {
      "cues": "Age",
      "directions": ">",
      "thresholds": "29",
      "exits": "0",
      "children": [
        {
          "cues": "BMI",
          "directions": ">",
          "thresholds": "29.7",
          "exits": "1",
          "children": [
            {
              "cues": "True"
            },
            {
              "cues": "TimesPregnant",
              "directions": ">",
              "thresholds": "6",
              "exits": "0.5",
              "children": [
                {
                  "cues": "True"
                },
                {
                  "cues": "False"
                }
              ]
            }
          ]
        },
        {
          "cues": "False"
        }
      ]
    }
  ]
}
Solution
Given one row of the DataFrame (a Series whose index holds your column names), this function builds a tree like the one you showed:
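>>> def row_to_tree(row):
...     out = {}
...     pos = [out]  # a list whose first element is the node to fill in next
...     fields = row[["cues", "directions", "thresholds", "exits"]].values
...     # Split each field at ";" and regroup the values level by level.
...     for cues, directions, thresholds, exits in zip(*map(lambda x: x.split(";"), fields)):
...         pos = pos[0]
...         pos["cues"] = cues
...         pos["directions"] = directions
...         pos["thresholds"] = thresholds
...         pos["exits"] = exits
...         # Placeholder child; the next level (if any) overwrites its "cues".
...         pos["children"] = [{"cues": True}]
...         pos = pos["children"]
...     # After the loop, pos is the deepest "children" list: add the False leaf.
...     pos.append({"cues": False})
...     return out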
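This works by taking the sequence of strings in the row, row[["cues", "directions", "thresholds", "exits"]].values, and splitting each string at ";". The split is done by mapping the function lambda x: x.split(";") over each string. That yields a list in which each element is the list of values from one of your columns (for example, the first list holds the row's cues). Zipping these lists is then a bit like transposing a two-dimensional list. Finally, we iterate over the resulting values, add them to a dictionary, and at each step create a new dictionary for the child element.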
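The split-and-zip step can be tried on its own, without pandas. The snippet below is only an illustrative sketch: the variable names (fields, columns, levels) are made up here, and the strings are copied from tree 2 in the question.

```python
# The four ";"-separated fields of one row (tree 2 in the question).
fields = ["PLC2hrOGTT;Age;BMI", ">;>;>", "126;29;29.7", "0;1;0.5"]

# Split each field at ";": this gives one list per column.
columns = [field.split(";") for field in fields]

# zip(*columns) regroups the values by position, so each tuple
# describes one level of the tree (similar to a transpose).
levels = list(zip(*columns))

for cue, direction, threshold, exit_value in levels:
    print(cue, direction, threshold, exit_value)
```

Each tuple in levels carries exactly the four values that the loop in row_to_tree unpacks on one iteration.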
You can then apply this function to every row to get the trees:
>>> import json
>>> trees = [row_to_tree(row) for i, row in df.iterrows()]
>>> print(json.dumps(trees[0], indent=2))
{
  "cues": "PLC2hrOGTT",
  "directions": ">",
  "thresholds": "126",
  "exits": "1",
  "children": [
    {
      "cues": "Age",
      "directions": ">",
      "thresholds": "29",
      "exits": "0",
      "children": [
        {
          "cues": "BMI",
          "directions": ">",
          "thresholds": "29.7",
          "exits": "1",
          "children": [
            {
              "cues": "TimesPregnant",
              "directions": ">",
              "thresholds": "6",
              "exits": "0.5",
              "children": [
                {
                  "cues": true
                },
                {
                  "cues": false
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
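The output above matches the first snapshot in the question, not the layout from the question's edit, where a node with exits == 1 gets a 'True' leaf followed by the next cue and any other non-final node gets the next cue followed by a 'False' leaf. One possible way to produce that layout is sketched below. The function name row_to_tree_with_exits is hypothetical, the treatment of every non-"1" exit value as the "False" case is an assumption based on the question's example, and the leaves are written as the strings "True" and "False" to mirror the question's edit. The row is a plain dict here so the snippet runs without pandas; a DataFrame row should work the same way because it is also indexed by column name.

```python
import json

def row_to_tree_with_exits(row):
    """Build one tree, placing the leaf before or after the next cue based on exits."""
    names = ("cues", "directions", "thresholds", "exits")
    # One tuple per level: (cue, direction, threshold, exit value).
    levels = list(zip(*(row[name].split(";") for name in names)))

    def build(i):
        node = dict(zip(names, levels[i]))
        if i == len(levels) - 1:
            # The last cue always ends in both leaves.
            node["children"] = [{"cues": "True"}, {"cues": "False"}]
        elif node["exits"] == "1":
            # Exit on True: leaf first, then the rest of the tree.
            node["children"] = [{"cues": "True"}, build(i + 1)]
        else:
            # Assumed exit on False: rest of the tree first, then the leaf.
            node["children"] = [build(i + 1), {"cues": "False"}]
        return node

    return build(0)

# First row of the question's data, written as a plain dict.
row = {
    "cues": "PLC2hrOGTT;Age;BMI;TimesPregnant",
    "directions": ">;>;>;>",
    "thresholds": "126;29;29.7;6",
    "exits": "1;0;1;0.5",
}
tree = row_to_tree_with_exits(row)
print(json.dumps(tree, indent=2))
```

The recursion depth equals the number of cues in a row, so it adapts to trees of any length found in the data. It assumes each row has at least one cue.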