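python - Pandas DataFrame: programmatically splitting the rows of a DataFrame on multi-column conditions

Problem description

Context

I am working with a DataFrame df in which many columns are filled with numerical values.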

df
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
0.5          |     -6.2    |  ...  | 79.8
-26.1        |     6200.0  |  ...  | -65.2
150.0        |     3.14    |  ...  | 1.008

In addition, I have a list of columns, list_cols:

list_cols = ['lorem ipsum', 'dolor sic', ... ]  # arbitrary length, of course len(list_cols ) <= len(df.columns), and contains valid columns of my df

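I want to obtain 2 DataFrames:

1 containing all the rows where value < 0 for at least one of the columns in list_cols (i.e. an OR). Let's call it negative_values_matches
1 corresponding to the remainder of the DataFrame. Let's call it positive_values_matches

Expected result example

For list_cols = ['lorem ipsum', 'dolor sic'], I would obtain the following DataFrames, the first holding the rows with at least 1 strictly negative value in list_cols: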

negative_values_matches
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
0.5          |     -6.2    |  ...  | 79.8
-26.1        |     6200.0  |  ...  | -65.2


positive_values_matches
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
150.0        |     3.14    |  ...  | 1.008

I don't want to write this kind of code by hand:

negative_values_matches = df[ (criterion1 | criterion2 | ... | criterionn)]
positive_values_matches = df[~(criterion1 | criterion2 | ... | criterionn)]

(where criterionk is a boolean evaluation for column k, for instance (df[col_k] < 0); the parentheses are intentional, since that is the Pandas syntax)

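The idea is to take a programmatic approach. I am mainly looking for a boolean array, so that I can use boolean indexing (see the Pandas documentation).

As far as I can tell, these posts do not cover what I am asking about:

Filtering a DataFrame on multiple conditions in Pandas
Dropping rows by multiple conditions in a pandas DataFrame
Pandas: np.where with multiple conditions on DataFrames
Pandas DataFrame: how to select rows on multiple conditions? This one is fairly close to what I am looking for. However, it relies on generating a string, which may not work with "exotic" column names (containing spaces), or at least I don't know how to make it work

I can't figure out how to chain the boolean evaluations on my DataFrame together with the OR operator and obtain the correct row split.

What can I do?

Tags: python, pandas, dataframe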

After several attempts, I managed to achieve my goal.

Here is the code:

import numpy
import pandas

# assume the DataFrame df and the list list_cols already exist
df = ...

# initialize a 1-D array of False, with one entry per row of df
resulting_bools = numpy.zeros(len(df.index), dtype=bool)

for col in list_cols:
    # boolean array for this column: True where the value is strictly negative
    # (same condition for each column; different conditions would have been more difficult for me)
    criterion = (df[col] < 0).to_numpy()

    # accumulate the boolean evaluation across columns with OR
    resulting_bools |= criterion

# use the array of booleans to build the required DataFrames
negative_values_matches = df[resulting_bools].copy()  # .copy() avoids possible later warnings from Pandas, depending on what you do with the DataFrame
positive_values_matches = df[~resulting_bools].copy()

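This way, I successfully obtained 2 DataFrames:

1 with all the rows where at least 1 column of list_cols has a value < 0
1 with all the other rows (value >= 0 in every column of list_cols)

(The array is initialized to False because the criteria are combined with OR; combining them with AND would require initializing it to True.)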
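
As a possible alternative to the loop above, the same OR across columns can be sketched as a single vectorized expression: compare the whole sub-DataFrame df[list_cols] against 0, then reduce along the columns with any(axis=1). The sample data below is hypothetical and only mirrors the example from the question; the column "other" stands in for the remaining columns.

```python
import pandas as pd

# Hypothetical sample data mirroring the example in the question
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],  # stands in for the remaining columns
})
list_cols = ["lorem ipsum", "dolor sic"]

# Element-wise comparison on the selected columns, then a row-wise OR
mask = (df[list_cols] < 0).any(axis=1)

negative_values_matches = df[mask].copy()
positive_values_matches = df[~mask].copy()
```

Selecting columns with a list works with names that contain spaces, so no query string has to be generated.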

Note: this approach can probably be combined with multiple different conditions on DataFrames. To be confirmed.
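
One way a per-column variant might look, as a minimal sketch rather than a confirmed solution: keep a mapping from column name to a condition, evaluate each condition on its column, and combine the resulting boolean arrays with numpy.logical_or.reduce. The conditions dictionary, the thresholds, and the sample data are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the example in the question
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other": [79.8, -65.2, 1.008],
})

# Hypothetical mapping: column name -> function returning a boolean Series
conditions = {
    "lorem ipsum": lambda s: s < 0,
    "other": lambda s: s > 50,
}

# One boolean array per column, each with one entry per row
criteria = [check(df[col]).to_numpy() for col, check in conditions.items()]

# OR all the arrays together, row by row
mask = np.logical_or.reduce(criteria)

matches = df[mask].copy()
remainder = df[~mask].copy()
```

Replacing numpy.logical_or.reduce with numpy.logical_and.reduce would require every condition to hold instead of at least one.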
