Recursive loop over a pandas DataFrame

Problem description

Input:

| Company | Employee Number |
|---------|-----------------|
| 1       | 12              |
| 2       | 34, 12          |
| 3       | 56, 34, 78      |
| 4       | 90              |

Goal:

Find all employee numbers that belong to the same employee across all companies.

Final result:

| Company | Employee Number |
|---------|-----------------|
| 1       | 12, 34, 56, 78  |
| 2       | 12, 34, 56, 78  |
| 3       | 12, 34, 56, 78  |
| 4       | 90              |

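As the result above shows, the first three rows are the same employee. We know this because the first employee number, "12", also appears in the second row, and employee number "34" appears in rows 2 and 3. Rows 1, 2, and 3 therefore all belong to the same employee, so we combine the distinct employee numbers and display them as shown above.

Note: a row can have 0 to N employee numbers.

Is there a recursive way to do this? If not, what solution can you think of?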
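
The linking rule described above can be checked with plain Python sets. This is only a minimal sketch of the reasoning; the names `company_numbers` and `same_employee` are made up for illustration and are not part of the original question.

```python
# Employee-number sets per company, copied from the input table
company_numbers = {
    1: {"12"},
    2: {"34", "12"},
    3: {"56", "34", "78"},
    4: {"90"},
}

# Companies 1 and 2 share "12"; companies 2 and 3 share "34"
print(company_numbers[1] & company_numbers[2])  # {'12'}
print(company_numbers[2] & company_numbers[3])  # {'34'}

# So companies 1, 2 and 3 are linked, and their numbers are merged
same_employee = company_numbers[1] | company_numbers[2] | company_numbers[3]
print(sorted(same_employee))  # ['12', '34', '56', '78']

# Company 4 shares no number with that group, so it stays on its own
print(company_numbers[4].isdisjoint(same_employee))  # True
```

Note that companies 1 and 3 share no number directly; they are linked only through company 2, which is why a single pairwise comparison is not enough.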

Tags: python, pandas, dataframe, recursion

Solution


Here is how I would approach this problem (explanations are in the comments):

# Replace NaN in df["Employee Number"] with empty string
df["Employee Number"] = df["Employee Number"].fillna("")

# Add a column with sets that contain the individual employee numbers
df["EN_Sets"] = df["Employee Number"].str.findall(r"\d+").apply(set)

# Build the maximal distinct employee number sets
en_sets = []
for en_set in df.EN_Sets:
    union_sets = []
    keep_sets = []
    for s in en_sets:
        if s.isdisjoint(en_set):
            keep_sets.append(s)
        else:
            union_sets.append(s)
    en_sets = keep_sets + [en_set.union(*union_sets)]

# Build a dictionary with the replacement strings as keys and the distinct
# sets as values
en_sets = {", ".join(sorted(s)): s for s in en_sets}

# Apply-function to replace the original employee number strings
def setting_en_numbers(s):
    for en_set_str, en_set in en_sets.items():
        if not s.isdisjoint(en_set):
            return en_set_str
    # Rows without any employee number match no set: return an empty string
    return ""

# Apply the function to df["Employee Number"]
df["Employee Number"] = df.EN_Sets.apply(setting_en_numbers)
df = df[["Company", "Employee Number"]]

Result (the input df first, then the output):

df:
   Company Employee Number
0        1              12
1        2          34, 12
2        3      56, 34, 78
3        4              90
4        5             NaN

   Company Employee Number
0        1  12, 34, 56, 78
1        2  12, 34, 56, 78
2        3  12, 34, 56, 78
3        4              90
4        5                
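
The question also asks whether a recursive approach exists. The following is a minimal sketch of one, not the method used in the answer above: it treats rows as linked when they share an employee number and collects each linked group with a recursive depth-first walk. The function name `group_numbers_recursive` and its helper `collect` are hypothetical, and the input is assumed to be a list of sets such as the one obtained from `df["EN_Sets"].tolist()`.

```python
from collections import defaultdict


def group_numbers_recursive(number_sets):
    """Return, for each input set, the union of all sets linked to it."""
    # Map each employee number to the indices of the rows that contain it
    rows_by_number = defaultdict(list)
    for idx, numbers in enumerate(number_sets):
        for number in numbers:
            rows_by_number[number].append(idx)

    visited = set()

    def collect(idx, group, members):
        # Recursively visit every row that shares a number with row `idx`
        visited.add(idx)
        members.append(idx)
        group.update(number_sets[idx])
        for number in number_sets[idx]:
            for other in rows_by_number[number]:
                if other not in visited:
                    collect(other, group, members)

    result = [set() for _ in number_sets]
    for idx in range(len(number_sets)):
        if idx in visited:
            continue
        group, members = set(), []
        collect(idx, group, members)
        # Every row in the linked group gets the same merged set
        for member in members:
            result[member] = group
    return result


rows = [{"12"}, {"34", "12"}, {"56", "34", "78"}, {"90"}, set()]
for group in group_numbers_recursive(rows):
    print(", ".join(sorted(group)))
```

For the sample rows this prints "12, 34, 56, 78" three times, then "90", then an empty line for the row without numbers. Because the walk is recursive, a very long chain of linked rows could hit Python's recursion limit; the loop-based merge in the answer does not have that constraint.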
