Split a numpy array / pandas DataFrame by a boolean delimiter

Problem description

Assume a numpy array (actually a pandas DataFrame) of the form:

[value, included,
 0.123, False,
 0.127, True,
 0.140, True,
 0.111, False,
 0.159, True,
 0.321, True,
 0.444, True,
 0.323, True,
 0.432, False]

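I want to split the array so that the False elements are excluded and each consecutive run of True elements becomes its own array. For the example above, we would end up with: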

[[0.127, True,
  0.140, True],
 [0.159, True,
  0.321, True,
  0.444, True,
  0.323, True]]

I could of course do this by pushing individual elements onto lists, but surely there is a more numpy-ish way to do it.

Tags: pandas, numpy

Solution


You can build group ids by inverting the mask with ~ and applying Series.cumsum, keep only the ids of the True rows with boolean indexing, and then collect a list of DataFrames with DataFrame.groupby:

dfs = [v for _, v in df.groupby((~df['included']).cumsum()[df['included']])]
print(dfs)
[   value  included
1  0.127      True
2  0.140      True,    value  included
4  0.159      True
5  0.321      True
6  0.444      True
7  0.323      True]
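
To make the one-liner easier to follow, here is a minimal sketch that rebuilds the example data and splits the expression into its intermediate steps. The names `group_id`, `key`, and `runs` are illustrative and not part of the original answer.

```python
import pandas as pd

# Example data reconstructed from the question.
df = pd.DataFrame({
    "value": [0.123, 0.127, 0.140, 0.111, 0.159, 0.321, 0.444, 0.323, 0.432],
    "included": [False, True, True, False, True, True, True, True, False],
})

# Every False row increments the counter, so all True rows that
# follow the same False row share one group id.
group_id = (~df["included"]).cumsum()

# Keep the ids of the True rows only; the False rows drop out of the key.
key = group_id[df["included"]]

# groupby aligns the key on the index; rows without a key are left out.
runs = [g for _, g in df.groupby(key)]
```

Because the False rows have no entry in `key`, they do not appear in any group, which is what removes the delimiters from the result.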

The DataFrames can also be converted to arrays with DataFrame.to_numpy:

dfs = [v.to_numpy() for _, v in df.groupby((~df['included']).cumsum()[df['included']])]
print(dfs)
[array([[0.127, True],
       [0.14, True]], dtype=object), array([[0.159, True],
       [0.321, True],
       [0.444, True],
       [0.32299999999999995, True]], dtype=object)]
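
If the data is already held in plain NumPy arrays, a similar split can be sketched without pandas by locating the positions where the mask changes. This is an alternative technique, not the method from the answer above, and it assumes the values and the mask are stored in two separate arrays, here called `values` and `included`.

```python
import numpy as np

values = np.array([0.123, 0.127, 0.140, 0.111, 0.159, 0.321, 0.444, 0.323, 0.432])
included = np.array([False, True, True, False, True, True, True, True, False])

# Pad with False on both sides so runs touching either end are still detected.
padded = np.concatenate(([False], included, [False]))

# Positions where the mask flips; these alternate between the start of a
# True run and the index one past its end.
edges = np.flatnonzero(padded[1:] != padded[:-1])
starts, stops = edges[::2], edges[1::2]

# Slice out each run of True values.
runs = [values[a:b] for a, b in zip(starts, stops)]
```

The slices in `runs` are views into `values` and keep its float dtype, unlike the object arrays produced by calling to_numpy on a mixed-type DataFrame.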
