首页 > 解决方案 > 熊猫唯一值和重组数据框

问题描述

我有这种数据框,我会根据“a”和“b”值将其拆分为多个数据框。

    a       b       c   d
0   red     green   1   2
1   brown   red     4   5
2   black   grey    0   0
3   red     blue    6   1
4   green   blue    0   3
5   black   brown   2   8
6   red     grey    4   6

我可以使用将每个代码用作过滤器的代码来做到这一点,colors = pd.unique(df[['a', 'b']].values.ravel('K')) 我可以执行我想要的操作:

df_list = []
for color in colors:
    current_df = df[(df.a == color) | (df.b == color)].copy()
    current_df["e"] = current_df.apply(
        lambda x: (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].sum()
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].sum()
        )
        / (
            current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].size
            + current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].size
        ),
        axis=1,
    )
    df_list.append(current_df)

操作在于,在数据帧被“a”和“b”中的值过滤后,如果过滤的值在“a”中,sum()则从下一行中“c”中的值,sum()从下一行中“ d" 如果过滤后的值在 "b" 中。结果:

df_list
[       a      b  c  d    e
 0    red  green  1  2  5.0
 1  brown    red  4  5  5.0
 3    red   blue  6  1  4.0
 6    red   grey  4  6  NaN,
        a      b  c  d    e
 1  brown    red  4  5  8.0
 5  black  brown  2  8  NaN,
        a      b  c  d    e
 2  black   grey  0  0  2.0
 5  black  brown  2  8  NaN,
        a      b  c  d    e
 0    red  green  1  2  0.0
 4  green   blue  0  3  NaN,
        a     b  c  d    e
 2  black  grey  0  0  6.0
 6    red  grey  4  6  NaN,
        a     b  c  d    e
 3    red  blue  6  1  3.0
 4  green  blue  0  3  NaN]

这可行,但问题是我何时要重新组装原始数据框。因为这些操作只在一个新列中,而我需要在两列中,所以似乎重复了这些行。

df = pd.concat(df_list)

    a     b     c   d   e
0   red   green 1   2   5.0
1   brown red   4   5   5.0
3   red   blue  6   1   4.0
6   red   grey  4   6   NaN
1   brown red   4   5   8.0
5   black brown 2   8   NaN
2   black grey  0   0   2.0
5   black brown 2   8   NaN
0   red   green 1   2   0.0
4   green blue  0   3   NaN
2   black grey  0   0   6.0
6   red   grey  4   6   NaN
3   red   blue  6   1   3.0
4   green blue  0   3   NaN

预期结果:

    a      b    c   d   e1  e2
0   red   green 1   2   5.0 0.0
1   brown red   4   5   8.0 5.0
2   black grey  0   0   2.0 6.0
3   red   blue  6   1   4.0 3.0
4   green blue  0   3   NaN NaN
5   black brown 2   8   NaN NaN
6   red   grey  4   6   NaN NaN

我如何通过,如果操作是针对“a”中的过滤值将结果放入“e1”中,对于“b”中的过滤值将结果放入“e2”中

标签: pythonpandasdataframe

解决方案


您可以使用groupby然后水平分解数据框:

from numpy import nan
import pandas as pd

d = {'index': {0: 0, 1: 1, 2: 3, 3: 6, 4: 1, 5: 5, 6: 2, 7: 5, 8: 0, 9: 4, 10: 2, 11: 6, 12: 3, 13: 4}, 'a': {0: 'red', 1: 'brown', 2: 'red', 3: 'red', 4: 'brown', 5: 'black', 6: 'black', 7: 'black', 8: 'red', 9: 'green', 10: 'black', 11: 'red', 12: 'red', 13: 'green'}, 'b': {0: 'green', 1: 'red', 2: 'blue', 3: 'grey', 4: 'red', 5: 'brown', 6: 'grey', 7: 'brown', 8: 'green', 9: 'blue', 10: 'grey', 11: 'grey', 12: 'blue', 13: 'blue'}, 'c': {0: 1, 1: 4, 2: 6, 3: 4, 4: 4, 5: 2, 6: 0, 7: 2, 8: 1, 9: 0, 10: 0, 11: 4, 12: 6, 13: 0}, 'd': {0: 2, 1: 5, 2: 1, 3: 6, 4: 5, 5: 8, 6: 0, 7: 8, 8: 2, 9: 3, 10: 0, 11: 6, 12: 1, 13: 3}, 'e': {0: 5.0, 1: 5.0, 2: 4.0, 3: nan, 4: 8.0, 5: nan, 6: 2.0, 7: nan, 8: 0.0, 9: nan, 10: 6.0, 11: nan, 12: 3.0, 13: nan}}

df = pd.DataFrame(d)

df1 = df.groupby(['a', 'b', 'c', 'd']).agg(list)
df1 = (pd.concat(
    [df1[c].apply(pd.Series).add_prefix(c + "_") 
     for c in df1], 
    axis=1)
 ).reset_index()

输出:

       a      b  c  d  e_0  e_1
0  black  brown  2  8  NaN  NaN
1  black   grey  0  0  2.0  6.0
2  brown    red  4  5  5.0  8.0
3  green   blue  0  3  NaN  NaN
4    red   blue  6  1  4.0  3.0
5    red  green  1  2  5.0  0.0
6    red   grey  4  6  NaN  NaN

推荐阅读