How do I use a variable holding a condition in np.where for a pandas column with list values?

Problem description

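I am trying to use np.where to compute a column based on other conditions, and I want to change the fallback (else) condition. I also have to call df1['matches'].fillna('[0]',inplace = True), otherwise I get a different error.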

Code:

df1 = pd.read_csv('one.txt',sep = '\t')
df1['matches'].fillna('[0]',inplace = True) 
df1['scorehigh?']  = df1['league'].apply(lambda a: 'yes' if a == 'Active' or a == 'Super Active' else 'no')
df1['greaterthan10?'] = (['yes' if any(int(a)>10 for a in i) else 'no' 
                                      for i in df1['matches'].str.findall('\d+')])

m=np.where((df1['scorehigh?']=='yes')) & (df1['matches'] != '[0]')                    

df1['Finals?']  = np.where((df1['scorehigh?']=='yes') & (df1['greaterthan10?'] == 'yes'), 'YES', m)
a=df1['Finals?'].value_counts()
print(a)

Error:

setting an array element with a sequence.

Input:

league          matches
Active          [[1, 0, 50,], [2, 0, 14,]]
Active          [[1, 0, 0,], [2, 0, 4,]]
Active          [[1, 0, 50,], [2, 0, 14,]]
Super Active    [[1, 0, 50,], [2, 0, 14,]]
Low             [[1, 0, 50,], [2, 0, 14,]]
Low             [[1, 0, 5,], [2, 0, 5,]]
Low             [[1, 0, 40,], [2, 0, 10,]]
Super Active    
Super Active    
Super Active    
Super   
Low 

Expected output:

league               matches                                   greater_than_10?
Active               [[1, 0, 50,], [2, 0, 14,]]                yes
Active               [[1, 0, 0,], [2, 0, 4,]]                  no
Active               [[1, 0, 50,], [2, 0, 14,]]                yes
Super Active         [[1, 0, 50,], [2, 0, 14,]]                yes
Low                  [[1, 0, 50,], [2, 0, 14,]]                no
Low                 [[1, 0, 5,], [2, 0, 5,]]                   no
Low                 [[1, 0, 40,], [2, 0, 10,]]                 no
Super Active           [0]                                     no
Super Active           [0]                                     no
Super Active           [0]                                     no
Super                  [0]                                     no
Low                    [0]                                     no

Expected result after value_counts:

Yes: 3
No: 4

Tags: python, pandas, numpy, dataframe

Solution


The problem is this line:

m=np.where((df1['scorehigh?']=='yes')) & (df1['matches'] != '[0]')

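The closing parenthesis of np.where comes right after the first mask, so np.where is called with only a condition and no value arguments. In that form it returns a tuple of arrays holding the positions of the matching rows, not one value per row, so m cannot serve as the fallback value in the second np.where.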
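A minimal sketch of the two call forms may make the difference concrete. The array name mask and its values are made up for illustration and are not taken from the question's data:

```python
import numpy as np

# Hypothetical boolean mask, standing in for a condition such as df1['scorehigh?'] == 'yes'
mask = np.array([True, False, True, False])

# Condition only: a tuple of index arrays with the positions of the True values
positions = np.where(mask)

# Condition plus two values: one result per element of the mask
labels = np.where(mask, 'YES', 'NO')

print(type(positions).__name__)  # tuple
print(positions[0].tolist())     # [0, 2]
print(labels.tolist())           # ['YES', 'NO', 'YES', 'NO']
```

Only the second form produces something that lines up row by row with a DataFrame column.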


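# Assign the result back instead of using inplace=True on a selected column
df1['matches'] = df1['matches'].fillna('[0]')

df1['scorehigh?']  = df1['league'].apply(lambda a: 'yes' if a == 'Active' or a == 'Super Active' else 'no')
# Raw string avoids the invalid escape sequence warning for '\d'
df1['greaterthan10?'] = (['yes' if any(int(a)>10 for a in i) else 'no' 
                                      for i in df1['matches'].str.findall(r'\d+')])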

Use a nested numpy.where to assign None where nothing matches, and use only the second mask, df1['matches'] != '[0]', for the inner condition:

df1['Finals?'] = np.where((df1['scorehigh?']=='yes')&(df1['greaterthan10?'] == 'yes'), 'YES',
                 np.where(df1['matches'] != '[0]', 'NO', None))

Or use numpy.select:

df1['Finals?'] = np.select([(df1['scorehigh?']=='yes')&  (df1['greaterthan10?'] == 'yes'), 
                            df1['matches'] != '[0]'], ['YES', 'NO'], default=None)

print (df1)
          league                     matches scorehigh? greaterthan10? Finals?
0         Active  [[1, 0, 50,], [2, 0, 14,]]        yes            yes     YES
1         Active    [[1, 0, 0,], [2, 0, 4,]]        yes             no      NO
2         Active  [[1, 0, 50,], [2, 0, 14,]]        yes            yes     YES
3   Super Active  [[1, 0, 50,], [2, 0, 14,]]        yes            yes     YES
4            Low  [[1, 0, 50,], [2, 0, 14,]]         no            yes      NO
5            Low    [[1, 0, 5,], [2, 0, 5,]]         no             no      NO
6            Low  [[1, 0, 40,], [2, 0, 10,]]         no            yes      NO
7   Super Active                         [0]        yes             no    None
8   Super Active                         [0]        yes             no    None
9   Super Active                         [0]        yes             no    None
10         Super                         [0]         no             no    None
11           Low                         [0]         no             no    None

a=df1['Finals?'].value_counts()
print(a)
NO     4
YES    3
Name: Finals?, dtype: int64

If the second condition combines both masks, the output is different:

df1['Finals?'] = np.select([(df1['scorehigh?']=='yes')&  (df1['greaterthan10?'] == 'yes'), 
                            (df1['scorehigh?']=='yes') & (df1['matches'] != '[0]')], 
                            ['YES', 'NO'], default=None)
print (df1)
          league                     matches scorehigh? greaterthan10? Finals?
0         Active  [[1, 0, 50,], [2, 0, 14,]]        yes            yes     YES
1         Active    [[1, 0, 0,], [2, 0, 4,]]        yes             no      NO
2         Active  [[1, 0, 50,], [2, 0, 14,]]        yes            yes     YES
3   Super Active  [[1, 0, 50,], [2, 0, 14,]]        yes            yes     YES
4            Low  [[1, 0, 50,], [2, 0, 14,]]         no            yes    None
5            Low    [[1, 0, 5,], [2, 0, 5,]]         no             no    None
6            Low  [[1, 0, 40,], [2, 0, 10,]]         no            yes    None
7   Super Active                         [0]        yes             no    None
8   Super Active                         [0]        yes             no    None
9   Super Active                         [0]        yes             no    None
10         Super                         [0]         no             no    None
11           Low                         [0]         no             no    None

a=df1['Finals?'].value_counts()
print(a)
YES    3
NO     1
Name: Finals?, dtype: int64
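For readers without the input file, the first numpy.select variant can be reproduced roughly as follows. This is a sketch under assumptions: the DataFrame is built in memory as a stand-in for one.txt, and the helper names score_high, over_10 and played are hypothetical boolean masks used here in place of the 'yes'/'no' string columns:

```python
import numpy as np
import pandas as pd

# In-memory stand-in for 'one.txt' (assumption); values mirror the question's input
df1 = pd.DataFrame({
    'league': ['Active', 'Active', 'Active', 'Super Active', 'Low', 'Low', 'Low',
               'Super Active', 'Super Active', 'Super Active', 'Super', 'Low'],
    'matches': ['[[1, 0, 50,], [2, 0, 14,]]', '[[1, 0, 0,], [2, 0, 4,]]',
                '[[1, 0, 50,], [2, 0, 14,]]', '[[1, 0, 50,], [2, 0, 14,]]',
                '[[1, 0, 50,], [2, 0, 14,]]', '[[1, 0, 5,], [2, 0, 5,]]',
                '[[1, 0, 40,], [2, 0, 10,]]', None, None, None, None, None],
})
df1['matches'] = df1['matches'].fillna('[0]')

# Boolean masks (hypothetical names) instead of 'yes'/'no' strings
score_high = df1['league'].isin(['Active', 'Super Active'])
over_10 = df1['matches'].str.findall(r'\d+').apply(
    lambda nums: any(int(n) > 10 for n in nums))
played = df1['matches'] != '[0]'

# The first condition that is True wins; rows matching neither get None
df1['Finals?'] = np.select([score_high & over_10, played], ['YES', 'NO'], default=None)

counts = df1['Finals?'].value_counts()
print(counts)
```

Keeping the conditions as boolean Series avoids the repeated comparisons against 'yes', while the counts should match the first result above (3 YES, 4 NO), since value_counts skips the None rows.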
