python - pandas dataframe: failed to apply lambda function to create new column based on condition if NaN or NA or \n or \t etc then 'No' else 'Yes'
Problem description
I have a pandas dataframe:
import pandas as pd
data = pd.DataFrame({'myCol': ['NaN', 'NA', 'xsysdf dfsf', 'ertrret ertret', '\n', 'sdfdsfsdfsf', 'erw3242werw']})
What I want to do is create a column myCol1 based on myCol: if myCol contains 'NA' or 'NaN', or is empty (for example because it holds only \n or \t), the value in myCol1 should be No; otherwise it should be Yes.
The new dataframe should look like this:
myCol           myCol1
NaN             No
NA              No
xsysdf dfsf     Yes
ertrret ertret  Yes
\n              No
sdfdsfsdfsf     Yes
erw3242werw     Yes
This is what I have tried:
data['myCol1'] = data['myCol'].apply(lambda x: 'No' if(str(x) == 'nan') else 'Yes')
data['myCol1'] = data['myCol'].apply(lambda x: 'No' if np.isnan else 'Yes')
data['myCol1'] = data['myCol'].apply(lambda x: 'No' if(np.all(pd.notnull(x))) else 'Yes')
But each of the lines above gives me 'No' for every row:
data.groupby('myCol2').size()
myCol2
No 223567
dtype: int64
Solution
This will work:
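import numpy as np
# Lowercased string values that should be treated as missing
exclusions = ['nan', 'na', '\n', '\t']
# True where the lowercased value exactly matches one of the exclusions
data['myCol1'] = data['myCol'].apply(lambda x: x.lower() in exclusions)
# Map the boolean flag to 'No' / 'Yes'
data['myCol1'] = np.where(data['myCol1'], 'No', 'Yes')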
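The snippet above assumes every value in myCol is a string, since x.lower() raises on a float NaN, and it only matches the exact strings '\n' and '\t'. A possible variation, sketched below under the assumption that real NaN/None values and any whitespace-only string should also count as missing, avoids apply and uses vectorized string methods. The names missing_tokens, cleaned and is_missing are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'myCol': ['NaN', 'NA', 'xsysdf dfsf', 'ertrret ertret',
                               '\n', 'sdfdsfsdfsf', 'erw3242werw']})

# Assumed list of placeholder strings (lowercased); '' covers values that
# consist only of whitespace once they have been stripped.
missing_tokens = ['nan', 'na', '']

# Convert to str, strip surrounding whitespace ('\n', '\t', spaces) and lowercase.
cleaned = data['myCol'].astype(str).str.strip().str.lower()

# A row is missing if it is a real NaN/None or matches a placeholder token.
is_missing = data['myCol'].isna() | cleaned.isin(missing_tokens)

data['myCol1'] = np.where(is_missing, 'No', 'Yes')
print(data)
```

On the attempts in the question: in the sample data the values are the strings 'NaN' and 'NA' rather than float NaN, so str(x) == 'nan' never matches them, and 'No' if np.isnan else 'Yes' always yields 'No' because np.isnan is never called there and a function object is always truthy.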