How to subdivide a pandas DataFrame by condition and index?

Problem description

I want to subdivide this pandas DataFrame into N sub-DataFrames based on the value of state.

        x      y  state
0   71.27  45.10      2
1   69.95  44.53      2
2   70.63  45.19      2
3   69.67  45.16      1
4   70.64  45.59      1
5   67.85  45.48      1
6   70.10  44.60      1
7   70.52  45.37      1
8   68.89  45.97      1
9   70.35  45.15      1
10  71.01  45.72      1
11  70.89  45.45      1
12  69.93  44.25      1
13  70.94  44.87      0
14  70.36  44.61      0
15  71.98  44.60      0
16  70.10  44.72      1
17  68.92  46.73      1
18  69.92  46.06      1
19  70.61  44.63      1
20  70.19  45.19      1
21  67.44  46.27      1

I can easily group the rows that share the same state value:

df[df['state'] == 0]
        x      y  state
13  70.94  44.87      0
14  70.36  44.61      0
15  71.98  44.60      0
df[df['state'] == 1]
        x      y  state
3   69.67  45.16      1
4   70.64  45.59      1
5   67.85  45.48      1
6   70.10  44.60      1
7   70.52  45.37      1
8   68.89  45.97      1
9   70.35  45.15      1
10  71.01  45.72      1
11  70.89  45.45      1
12  69.93  44.25      1
16  70.10  44.72      1
17  68.92  46.73      1
18  69.92  46.06      1
19  70.61  44.63      1
20  70.19  45.19      1
21  67.44  46.27      1
df[df['state'] == 2]
       x      y  state
0  71.27  45.10      2
1  69.95  44.53      2
2  70.63  45.19      2

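However, I would like to split these sub-DataFrames again based on the index. For example, here I want 2 different sub-DataFrames for state == 1 instead of 1, one for each run of consecutive index values: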

3   69.67  45.16      1
4   70.64  45.59      1
5   67.85  45.48      1
6   70.10  44.60      1
7   70.52  45.37      1
8   68.89  45.97      1
9   70.35  45.15      1
10  71.01  45.72      1
11  70.89  45.45      1
12  69.93  44.25      1

16  70.10  44.72      1
17  68.92  46.73      1
18  69.92  46.06      1
19  70.61  44.63      1
20  70.19  45.19      1
21  67.44  46.27      1

Any ideas?

Tags: python, pandas, dataframe, group-by, split

Solution


Start a new block whenever state changes from one row to the next. Then you can group on the blocks. For example:

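# label consecutive runs of equal `state`; print `blocks` to inspect the labels
blocks = df['state'].diff().ne(0).cumsum()
# if `state` is not a numeric type, compare against the previous row instead
# blocks = df['state'].ne(df['state'].shift()).cumsum()

# one sub-DataFrame per block, in original row order
[d for _, d in df.groupby(blocks)]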
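
To pick out the runs for one particular state afterwards, it may help to key each block by its state and index range. The following is only a minimal sketch on a small made-up frame (not the data from the question). The names `runs` and `state_1_runs` are hypothetical, and it assumes the rows are already in the order in which runs should be detected.

```python
import pandas as pd

# Small illustrative frame (made-up values, not the data from the question).
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "state": [2, 2, 1, 1, 0, 1, 1],
})

# A new block starts wherever `state` differs from the previous row.
# Comparing against shift() also works when `state` is not numeric.
blocks = df["state"].ne(df["state"].shift()).cumsum()

# Key each sub-DataFrame by (state, first index label, last index label).
runs = {
    (int(sub["state"].iloc[0]), sub.index[0], sub.index[-1]): sub
    for _, sub in df.groupby(blocks)
}

# Keep only the runs where state == 1 (two separate sub-DataFrames here).
state_1_runs = [sub for (state, _, _), sub in runs.items() if state == 1]
```

With this layout, `state_1_runs` holds one sub-DataFrame per uninterrupted stretch of state == 1, which is the split asked for in the question.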
