pandas DataFrame: lambda fails to create a new column that is 'No' if the value is NaN, NA, \n, \t, etc., and 'Yes' otherwise

Problem description

I have a pandas dataframe:

import pandas as pd

data = pd.DataFrame({'myCol': ['NaN', 'NA', 'xsysdf dfsf', 'ertrret ertret', '\n', 'sdfdsfsdfsf', 'erw3242werw']})
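Worth checking before writing any condition: in this frame 'NaN' and 'NA' are ordinary strings, so pandas' missing-value tests do not flag them. A minimal check, rebuilding the same sample data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'myCol': ['NaN', 'NA', 'xsysdf dfsf', 'ertrret ertret', '\n', 'sdfdsfsdfsf', 'erw3242werw']})

# isna() only recognises real missing values (np.nan, None, pd.NA),
# so the strings 'NaN' and 'NA' are not treated as missing.
print(data['myCol'].isna().any())  # → False

# A genuine missing value, by contrast, is detected.
real = pd.Series(['abc', np.nan, None])
print(real.isna().tolist())  # → [False, True, True]
```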

What I want to do:

  1. Create a column myCol1 such that, if myCol is 'NA' or 'NaN' or is effectively empty (for example it holds only whitespace such as \n or \t), myCol1 is No; otherwise it is Yes.

  2. The resulting DataFrame should look like this:

New DataFrame:

 myCol          myCol1
 NaN            No
 NA             No
 xsysdf dfsf    Yes
 ertrret ertret Yes
 \n             No
 sdfdsfsdfsf    Yes
 erw3242werw    Yes

Here is what I have tried:

data['myCol1'] = data['myCol'].apply(lambda x: 'No' if(str(x) == 'nan') else 'Yes')

data['myCol1'] = data['myCol'].apply(lambda x: 'No' if np.isnan else 'Yes')

data['myCol1'] = data['myCol'].apply(lambda x: 'No' if(np.all(pd.notnull(x))) else 'Yes')

But each of the above attempts gives me 'No' for every row:

 data.groupby('myCol1').size()
 myCol1
 No    223567
 dtype: int64
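A small sketch of why the three attempts misbehave, evaluated on single values from the sample frame (this reflects my reading of the attempts, not output from the original 223567-row data):

```python
import numpy as np
import pandas as pd

x = 'NaN'  # the string from the sample frame, not a float NaN

# Attempt 1: str('NaN') is 'NaN', which differs from 'nan' in case,
# so the string never matches; only a real float NaN stringifies to 'nan'.
print(str(x) == 'nan')              # → False
print(str(float('nan')) == 'nan')   # → True

# Attempt 2: `np.isnan` without a call is a function object, which is always truthy,
# so the lambda returns 'No' for every row.
print(bool(np.isnan))               # → True

# Attempt 3: pd.notnull is True for any non-missing value, strings included,
# so the condition holds for every row (and the 'No'/'Yes' branches are reversed).
print(pd.notnull(x), pd.notnull('xsysdf dfsf'))  # → True True
```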

Tags: python, python-3.x, pandas

Solution


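This will work. Compare the lower-cased string form of each value against a list of exclusions, then map the result to 'No'/'Yes':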

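import numpy as np
# Values (compared case-insensitively) that should count as "missing".
exclusions = ['nan', 'na', '\n', '\t']
# str() guards against real float NaNs, which have no .lower() method.
data['myCol1'] = data['myCol'].apply(lambda x: str(x).lower() in exclusions)
# Map the boolean mask to the requested labels.
data['myCol1'] = np.where(data['myCol1'], 'No', 'Yes')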
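If the column may also hold real missing values or cells that are only whitespace (for example ' \t ' rather than exactly '\n'), a vectorised variant using the pandas .str accessor could look like the sketch below. It assumes that "empty" means "blank after stripping whitespace"; the extra sample rows and the name is_missing are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data plus two hypothetical rows: a whitespace-only cell and a real NaN.
data = pd.DataFrame({'myCol': ['NaN', 'NA', 'xsysdf dfsf', 'ertrret ertret', '\n',
                               'sdfdsfsdfsf', 'erw3242werw', ' \t ', np.nan]})

# Normalise: convert to string, drop surrounding whitespace, compare case-insensitively.
normalised = data['myCol'].astype(str).str.strip().str.lower()

# Assumption: real missing values, 'nan', 'na' and blank strings all count as "no value".
is_missing = data['myCol'].isna() | normalised.isin(['nan', 'na', ''])

data['myCol1'] = np.where(is_missing, 'No', 'Yes')
print(data)
```

Working on whole columns avoids a Python-level lambda per row, and the explicit isna() check keeps real missing values covered regardless of how astype(str) renders them.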
