Creating columns based on many conditions

Problem description

I have a dataframe that looks like this:

Id   Col1    Col2     Col3   Col4
1     1        1       1      0
2     1        0       1      0
3     0        1       1      0
4     1        1       0      1
5     1        1       1      nan
...

Col4 is the only column that contains NaN values.

I have multiple conditions, each of which defines a new column, for example:

New column A1
IF Col1 = 1 AND Col2 = 1 AND Col3 = 1
   AND (Col4 is null OR col4 = 0)
THEN 1
ELSE 0
New column A2
IF Col1 = 1 AND Col2 = 1 AND Col3 = 0
   AND (Col4 > 0)
THEN 1
ELSE 0
New column A3
IF Col1 = 1 AND Col2 = 1 AND Col3 = 0
   AND (Col4 is null OR col4 = 0)
THEN 1
ELSE 0
New column A4
IF Col1 = 1 AND Col2 = 0 AND Col3 = 1
   AND (Col4 > 0)
THEN 1
ELSE 0
New column A5
IF Col1 = 1 AND Col2 =01 AND Col3 = 1
   AND (Col4 > 0)
THEN 1
ELSE 0

I am using the code below to create the new columns:

A1_Conditions = [
    (df['col1'] ==1) & (df['col2'] == 1) & (df['col3'] == 1)
    &((df['col4'] ==0) | (df['col4'].isnull()))
]
First_Count['A1'] = np.select(A1_Conditions, [1], 0)

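It works, but there are many columns to create, and the code is becoming harder and harder to read.

Is there another way to do this?

Tags: python, pandas, numpy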

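Solution

One idea is to define each condition once and reuse it. To assign 0 and 1, you can convert the boolean mask to integers with astype(int), or use numpy.where or Series.view (note that Series.view is deprecated in recent pandas versions, so the first two options are the safer choice):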

# Reusable boolean masks, one per elementary condition
mcol1_1 = df['col1'] == 1
mcol2_1 = df['col2'] == 1
mcol3_1 = df['col3'] == 1
mcol3_0 = df['col3'] == 0
mcol4_0 = df['col4'] == 0
mcol4_na = df['col4'].isnull()
mcol4_more0 = df['col4'] > 0

# Option 1: cast the combined boolean mask to integers
df['A1'] = (mcol1_1 & mcol2_1 & mcol3_1 & (mcol4_0 | mcol4_na)).astype(int)
df['A2'] = (mcol1_1 & mcol2_1 & mcol3_0 & mcol4_more0).astype(int)
...
...
# Option 2: numpy.where (requires import numpy as np)
df['A1'] = np.where(mcol1_1 & mcol2_1 & mcol3_1 & (mcol4_0 | mcol4_na), 1, 0)
df['A2'] = np.where(mcol1_1 & mcol2_1 & mcol3_0 & mcol4_more0, 1, 0)
...
...
# Option 3: reinterpret the boolean values as 8-bit integers ('i1')
df['A1'] = (mcol1_1 & mcol2_1 & mcol3_1 & (mcol4_0 | mcol4_na)).view('i1')
df['A2'] = (mcol1_1 & mcol2_1 & mcol3_0 & mcol4_more0).view('i1')
...
...
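
To make the reuse idea easier to try out, here is a minimal runnable sketch built on the sample rows from the question. It makes a few assumptions that are not in the original post: the column names are lowercased to match the code (the table shows Col1 to Col4), the combined mask col4_zero_or_na and the rules dictionary are names introduced only for illustration, and A5 is left out because its condition on Col2 is not clear in the question.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; lowercase column names are an assumption
# made so that they match the code in the post.
df = pd.DataFrame({
    "Id":   [1, 2, 3, 4, 5],
    "col1": [1, 1, 0, 1, 1],
    "col2": [1, 0, 1, 1, 1],
    "col3": [1, 1, 1, 0, 1],
    "col4": [0, 0, 0, 1, np.nan],
})

# Elementary masks, each defined once and reused below.
col1_1 = df["col1"] == 1
col2_1 = df["col2"] == 1
col2_0 = df["col2"] == 0
col3_1 = df["col3"] == 1
col3_0 = df["col3"] == 0
col4_pos = df["col4"] > 0  # comparisons with NaN evaluate to False
col4_zero_or_na = (df["col4"] == 0) | df["col4"].isna()

# Hypothetical helper structure: new column name -> combined boolean mask.
rules = {
    "A1": col1_1 & col2_1 & col3_1 & col4_zero_or_na,
    "A2": col1_1 & col2_1 & col3_0 & col4_pos,
    "A3": col1_1 & col2_1 & col3_0 & col4_zero_or_na,
    "A4": col1_1 & col2_0 & col3_1 & col4_pos,
}

# Cast each mask to 0/1 and attach it as a new column.
for name, mask in rules.items():
    df[name] = mask.astype(int)

print(df)
```

Keeping the rules in one dictionary means each new column is a single line that reads close to the IF/AND description in the question, and the 0/1 conversion is written only once.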
