Plotting a hierarchy of items with scores

Problem description

I'm new to Python and to programming in general, so any help would be much appreciated. I'm trying to use Python (preferably with Pandas) to do the following:

Data

I have a table that looks like this:

+--------------------+-------+
|    Parent:Child    | Score |
+--------------------+-------+
| Life:Work          |     3 |
| Work:Money         |     2 |
| Work:Hours         |     3 |
| Work:Hours         |     2 |
| Life:Health        |     2 |
| Money:Life savings |     3 |
+--------------------+-------+

Desired output

  1. Table: identify the unique items and compute the average score:

Items with multiple entries have their scores averaged.

+--------------+---------------+
|  Unique item | Average score |
+--------------+---------------+
| Life         | NaN           |
| Work         | 3             |
| Health       | 2             |
| Money        | 2             |
| Hours        | 2.5           |
| Life savings | 3             |
+--------------+---------------+
  2. Tree:

a) Identify the hierarchy of items:

Life > Work > Money > Life savings

Life > Work > Hours

Life > Health

b) Draw the tree with items and average scores:

                 Life (NaN)
              /              \
      Work (3)               Health (2)
       /       \  
 Money (2)  Hours (2.5)
      | 
Life savings (3)

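Some notes:

In the data, the colon (":") denotes the relationship between items. The format is Parent:Child.

"Life" has no score, so it should return NaN.

"Hours" has two entries in the data, so the average is shown: (2+3)/2 = 2.5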
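
The notes above can be sketched in pandas roughly as follows. This is a minimal sketch, not part of the original post: it assumes each score belongs to the child of its pair, and the column names `pair`, `score`, `parent`, `child` and `average_score` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame(
    [
        ("Life:Work", 3),
        ("Work:Money", 2),
        ("Work:Hours", 3),
        ("Work:Hours", 2),
        ("Life:Health", 2),
        ("Money:Life savings", 3),
    ],
    columns=["pair", "score"],
)

# Split "Parent:Child" into two columns.
df[["parent", "child"]] = df["pair"].str.split(":", n=1, expand=True)

# Assumption: a score belongs to the child of each pair, so average per child.
child_means = df.groupby("child")["score"].mean()

# Every item that appears anywhere, in order of first appearance.
items = pd.unique(df[["parent", "child"]].to_numpy().ravel())

# Items that never appear as a child (here "Life") come out as NaN.
averages = child_means.reindex(items).rename("average_score")
print(averages)
```

Reindexing over all items, rather than only the children, is what makes "Life" show up with NaN instead of being dropped.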

Thanks a lot for your help!

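Edit: Thanks to AKX for the helpful reply. One part is still unresolved, so let me clarify it here. For 2) Tree: a) Identify the hierarchy of items:

The raw data does not specify which layer each Parent:Child pair sits in. The problem is to write code that works this out and links the pairs together. From "Life:Work" and "Work:Money", we need to work out that the child of the first entry ("Work") matches the parent of the second entry ("Work:Money"). I.e.:

From:
Life:Work
Work:Money

Combine into:
Life:Work:Money

Ultimately, from the raw data: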

+--------------------+-------+
|    Parent:Child    | Score |
+--------------------+-------+
| Life:Work          |     3 |
| Work:Money         |     2 |
| Work:Hours         |     3 |
| Work:Hours         |     2 |
| Life:Health        |     2 |
| Money:Life savings |     3 |
+--------------------+-------+

Create a table like this:


+--------+--------+--------+--------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| Layer1 | Layer2 | Layer3 |    Layer4    | Avg Score |                                                             #Comments                                                              |
+--------+--------+--------+--------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| Life   | Work   |        |              | 3         | #Directly from "Life:Work" in raw data                                                                                             |
| Life   | Work   | Money  |              | 2         | #Entry Work:Money has score 2. Since there is an entry "Life:Work", we know "Work" isn't an ultimate parent, and sits below "Life" |
| Life   | Work   | Money  | Life savings | 3         | #Entry "Money:Life savings" has score 3. Similarly, we know from other entries that the hierarchy is Life > Work > Money           |
| Life   | Work   | Hours  |              | 2.5       | #There're entries "Work:Money" and another "Work:Hours", so we know both "Money" and "Hours" are direct children of "Work"         |
| Life   | Health |        |              | 2         | #Directly from "Life:Health" which has score 2. And there is no entry above "Life", which makes it the top of the hierarchy        |
| Life   |        |        |              | NaN       | #There is no entry where "Life" is a child, so "Life" is an ultimate parent. Also, no entry tells us the score for "Life"          |
+--------+--------+--------+--------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
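
The layer columns in the table above can be derived by following each item upward through its parents. The following is a minimal sketch, assuming every child has exactly one parent; the names `parent_of` and `path_to` are hypothetical and not from the original post.

```python
# Parent:Child pairs copied from the question's table (duplicates removed).
pairs = ["Life:Work", "Work:Money", "Work:Hours", "Life:Health", "Money:Life savings"]

# Map each child to its parent; assumes every child has exactly one parent.
parent_of = {}
for pair in pairs:
    parent, child = pair.split(":", 1)
    parent_of[child] = parent


def path_to(item):
    """Return the chain of items from the ultimate parent down to `item`."""
    path = [item]
    while path[0] in parent_of:
        if parent_of[path[0]] in path:
            # Guard against malformed data such as A:B together with B:A.
            raise ValueError(f"cycle detected at {path[0]!r}")
        path.insert(0, parent_of[path[0]])
    return path


# Every item mentioned anywhere, in order of first appearance.
items = list(dict.fromkeys(part for pair in pairs for part in pair.split(":", 1)))

for item in items:
    print(" > ".join(path_to(item)))
```

Each returned path corresponds to one row of the table: position 0 is Layer1, position 1 is Layer2, and so on. An item with a one-element path is an ultimate parent.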


Then, from this table, we should be able to create the tree (the format doesn't matter).

                 Life (NaN)
              /              \
      Work (3)               Health (2)
       /       \  
 Money (2)  Hours (2.5)
      | 
Life savings (3)
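
Since the format of the tree doesn't matter, one possible way to print it is as indented text using only the standard library. This is a rough sketch under the assumptions that the data contains no cycles and that scores attach to the child of each pair; `children`, `scores` and `render` are illustrative names.

```python
from collections import defaultdict

# (parent, child, score) rows copied from the question's table.
rows = [
    ("Life", "Work", 3),
    ("Work", "Money", 2),
    ("Work", "Hours", 3),
    ("Work", "Hours", 2),
    ("Life", "Health", 2),
    ("Money", "Life savings", 3),
]

children = defaultdict(list)  # parent -> children, in order of first appearance
scores = defaultdict(list)    # child -> all scores recorded for it
for parent, child, score in rows:
    if child not in children[parent]:
        children[parent].append(child)
    scores[child].append(score)

# Roots are parents that never appear as a child.
roots = [p for p in list(children) if p not in scores]


def render(item, depth=0):
    """Return the lines of an indented tree rooted at `item`."""
    values = scores.get(item)
    average = sum(values) / len(values) if values else float("nan")
    lines = [f"{'    ' * depth}{item} ({average:g})"]
    for child in children.get(item, []):
        lines.extend(render(child, depth + 1))
    return lines


tree_lines = [line for root in roots for line in render(root)]
print("\n".join(tree_lines))
```

Each level of the hierarchy is indented by four spaces, and an item without any recorded score is labelled with nan.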

Thanks again for any help!

Tags: python, pandas, hierarchy

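Solution

Here's something using asciitree, the library I mentioned.

It turns out that printing a custom value for each tree node is fairly easy, which is exactly what we want here.

I've tried to add helpful comments where possible.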

from asciitree import LeftAligned, DictTraversal

import pandas as pd
from collections import defaultdict


class ShowValueTraversal(DictTraversal):
    def __init__(self, values):
        self.values = values

    def get_text(self, node):
        key = node[0]
        if key in self.values:
            return f"{key} ({self.values[key]})"
        return key


def treeify(averages_dict):
    # Make a recursive tree we can just add children to
    make_tree = lambda: defaultdict(make_tree)
    tree = make_tree()

    for tag, value in averages_dict.items():
        parent = tree
        parts = tag.split(":")
        for i in range(len(parts) + 1):
            joined_tag = ":".join(parts[:i])
            parent = parent[joined_tag]
    return tree


def fixup_names(dct):
    # Break down the keys on colons
    dct = {tuple(key.split(":")): value for (key, value) in dct.items()}
    # Get a mapping of the last "atoms" of each known name to their full name
    last_atom_map = {p[-1]: p for p in dct}

    # Walk through the original dictionary, replacing any known first atom with
    # an entry from the last atom map if possible and reconstitute the keys
    new_dct = {}
    for key, value in dct.items():
        key_parts = list(key)
        while key_parts[0] in last_atom_map:
            # Slice in the new prefix
            key_parts[0:1] = last_atom_map[key_parts[0]]
        new_key = ":".join(key_parts)
        new_dct[new_key] = value
    return new_dct


df = pd.DataFrame(
    [
        ("Life:Work", 3),
        ("Work:Money", 2),
        ("Work:Hours", 3),
        ("Work:Hours", 2),
        ("Life:Health", 2),
        ("Money:Life savings", 3),
        ("Money:Something", 2),
        ("Money:Something:Deeper", 1),
    ],
    columns=["tag", "value"],
)

print("# Original data")
print(df)
print()

print("# Averages")
df_averages = df.groupby("tag").mean()
print(df_averages)
print()

# Turn the averages into a dict of tag -> value
averages_dict = dict(df_averages.itertuples())

# Fix up the names (to infer hierarchy)
averages_dict = fixup_names(averages_dict)

# Generate a tree out of the flat data
tree = treeify(averages_dict)
# Instantiate a custom asciitree traversal object that knows how to
# look up the values from the dict
traverse = ShowValueTraversal(values=averages_dict)

# Print it out!
print("# Tree")
print(LeftAligned(traverse=traverse)(tree))

The output is

# Original data
                      tag  value
0               Life:Work      3
1              Work:Money      2
2              Work:Hours      3
3              Work:Hours      2
4             Life:Health      2
5      Money:Life savings      3
6         Money:Something      2
7  Money:Something:Deeper      1

# Averages
                        value
tag
Life:Health               2.0
Life:Work                 3.0
Money:Life savings        3.0
Money:Something           2.0
Money:Something:Deeper    1.0
Work:Hours                2.5
Work:Money                2.0

# Tree

 +-- Life
     +-- Life:Health (2.0)
     +-- Life:Work (3.0)
         +-- Life:Work:Money (2.0)
         |   +-- Life:Work:Money:Life savings (3.0)
         |   +-- Life:Work:Money:Something (2.0)
         |       +-- Life:Work:Money:Something:Deeper (1.0)
         +-- Life:Work:Hours (2.5)
