python - 熊猫唯一值和重组数据框
问题描述
我有这种数据框,我会根据“a”和“b”值将其拆分为多个数据框。
a b c d
0 red green 1 2
1 brown red 4 5
2 black grey 0 0
3 red blue 6 1
4 green blue 0 3
5 black brown 2 8
6 red grey 4 6
我可以使用将每个代码用作过滤器的代码来做到这一点,colors = pd.unique(df[['a', 'b']].values.ravel('K'))
我可以执行我想要的操作:
df_list = []
for color in colors:
current_df = df[(df.a == color) | (df.b == color)].copy()
current_df["e"] = current_df.apply(
lambda x: (
current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].sum()
+ current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].sum()
)
/ (
current_df[(current_df.a == color)].loc[x.name + 1 :, "c"].size
+ current_df[(current_df.b == color)].loc[x.name + 1 :, "d"].size
),
axis=1,
)
df_list.append(current_df)
操作在于,在数据帧被“a”和“b”中的值过滤后,如果过滤的值在“a”中,sum()
则从下一行中“c”中的值,sum()
从下一行中“ d" 如果过滤后的值在 "b" 中。结果:
df_list
[ a b c d e
0 red green 1 2 5.0
1 brown red 4 5 5.0
3 red blue 6 1 4.0
6 red grey 4 6 NaN,
a b c d e
1 brown red 4 5 8.0
5 black brown 2 8 NaN,
a b c d e
2 black grey 0 0 2.0
5 black brown 2 8 NaN,
a b c d e
0 red green 1 2 0.0
4 green blue 0 3 NaN,
a b c d e
2 black grey 0 0 6.0
6 red grey 4 6 NaN,
a b c d e
3 red blue 6 1 3.0
4 green blue 0 3 NaN]
这可行,但问题是我何时要重新组装原始数据框。因为这些操作只在一个新列中,而我需要在两列中,所以似乎重复了这些行。
df = pd.concat(df_list)
a b c d e
0 red green 1 2 5.0
1 brown red 4 5 5.0
3 red blue 6 1 4.0
6 red grey 4 6 NaN
1 brown red 4 5 8.0
5 black brown 2 8 NaN
2 black grey 0 0 2.0
5 black brown 2 8 NaN
0 red green 1 2 0.0
4 green blue 0 3 NaN
2 black grey 0 0 6.0
6 red grey 4 6 NaN
3 red blue 6 1 3.0
4 green blue 0 3 NaN
预期结果:
a b c d e1 e2
0 red green 1 2 5.0 0.0
1 brown red 4 5 8.0 5.0
2 black grey 0 0 2.0 6.0
3 red blue 6 1 4.0 3.0
4 green blue 0 3 NaN NaN
5 black brown 2 8 NaN NaN
6 red grey 4 6 NaN NaN
我如何通过,如果操作是针对“a”中的过滤值将结果放入“e1”中,对于“b”中的过滤值将结果放入“e2”中
解决方案
您可以使用groupby
然后水平分解数据框:
from numpy import nan
import pandas as pd
d = {'index': {0: 0, 1: 1, 2: 3, 3: 6, 4: 1, 5: 5, 6: 2, 7: 5, 8: 0, 9: 4, 10: 2, 11: 6, 12: 3, 13: 4}, 'a': {0: 'red', 1: 'brown', 2: 'red', 3: 'red', 4: 'brown', 5: 'black', 6: 'black', 7: 'black', 8: 'red', 9: 'green', 10: 'black', 11: 'red', 12: 'red', 13: 'green'}, 'b': {0: 'green', 1: 'red', 2: 'blue', 3: 'grey', 4: 'red', 5: 'brown', 6: 'grey', 7: 'brown', 8: 'green', 9: 'blue', 10: 'grey', 11: 'grey', 12: 'blue', 13: 'blue'}, 'c': {0: 1, 1: 4, 2: 6, 3: 4, 4: 4, 5: 2, 6: 0, 7: 2, 8: 1, 9: 0, 10: 0, 11: 4, 12: 6, 13: 0}, 'd': {0: 2, 1: 5, 2: 1, 3: 6, 4: 5, 5: 8, 6: 0, 7: 8, 8: 2, 9: 3, 10: 0, 11: 6, 12: 1, 13: 3}, 'e': {0: 5.0, 1: 5.0, 2: 4.0, 3: nan, 4: 8.0, 5: nan, 6: 2.0, 7: nan, 8: 0.0, 9: nan, 10: 6.0, 11: nan, 12: 3.0, 13: nan}}
df = pd.DataFrame(d)
df1 = df.groupby(['a', 'b', 'c', 'd']).agg(list)
df1 = (pd.concat(
[df1[c].apply(pd.Series).add_prefix(c + "_")
for c in df1],
axis=1)
).reset_index()
输出:
a b c d e_0 e_1
0 black brown 2 8 NaN NaN
1 black grey 0 0 2.0 6.0
2 brown red 4 5 5.0 8.0
3 green blue 0 3 NaN NaN
4 red blue 6 1 4.0 3.0
5 red green 1 2 5.0 0.0
6 red grey 4 6 NaN NaN
推荐阅读
- python - 在 Windows 10 / Python 3.6 上安装 Sybase 包失败
- python - 执行 Postgres SQL 脚本、接受响应并评估的 Python 脚本
- python - 在 Python Gino 中获取池连接(异步)
- python - Cython 实现不比纯 python 快
- javascript - 更改后,日期选择器验证不起作用-BootStrap
- module - 编程中源代码和代码的不同含义是什么?
- python - Python Pandas to_csv,能不能用.replace() 抢先处理双引号转义问题
- java - 为什么我无法使用 StringTokenizer 访问第二个令牌?
- r - 如何覆盖@运算符以访问 R 中 S4 对象的插槽
- database - mongorestore时MongoDB如何处理Object ID重叠?