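python - Calculate the percentage of non-null values for several columns in Pandas
Problem description
For the dataframe below, how can I calculate the percentage of not-null values in columns A, C and D with Pandas?
Thank you.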
id A B C D
0 1 1.0 one 4.0 NaN
1 2 NaN one 14.0 NaN
2 3 2.0 two 3.0 -12.0
3 4 55.0 three NaN 12.0
4 5 6.0 two 8.0 12.0
5 6 NaN two 7.0 -12.0
6 7 -17.0 one NaN NaN
7 8 NaN three 11.0 NaN
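To run the solutions below, you can rebuild the question's sample frame as follows. This is a sketch. The values are copied from the table above, and the name `df` is assumed to match the one used in the answer code. The last line is a quick check of the raw percentages per column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question (index 0..7, id 1..8)
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'A': [1.0, np.nan, 2.0, 55.0, 6.0, np.nan, -17.0, np.nan],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [4.0, 14.0, 3.0, np.nan, 8.0, 7.0, np.nan, 11.0],
    'D': [np.nan, np.nan, -12.0, 12.0, 12.0, -12.0, np.nan, np.nan],
})

# notna() gives booleans; their mean is the fraction of non-null values
print(df.notna().mean().mul(100))
```

With this data, A is 62.5, C is 75.0 and D is 50.0. The id and B columns are fully populated (100.0).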
The expected result looks like this:
id A B C D
0 not-nulls_pct 62.5% NaN 75.0% 50.0%
1 1 1 one 4 NaN
2 2 NaN one 14 NaN
3 3 2 two 3 -12
4 4 55 three NaN 12
5 5 6 two 8 12
6 6 NaN two 7 -12
7 7 -17 one NaN NaN
8 8 NaN three 11 NaN
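Solution
To get the share of non-NaN values per column, combine DataFrame.notna with DataFrame.mean. The mean of a boolean column is the fraction of True values.
Next, the 100% values, which here belong to the fully populated columns id and B, have to be replaced by missing values. One possible solution is Series.mask, which returns NaNs by default. After that, build a one-row DataFrame with Series.to_frame and a transpose, and prepend it with concat. Finally, set the first value of id to a custom label: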
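import pandas as pd

s = df.notna().mean()  # fraction of non-null values per column
# format as '62.5%' etc.; mask sets fully populated columns (id, B) to NaN
df1 = s.mul(100).astype(str).add('%').mask(s == 1).to_frame().T
df = pd.concat([df1, df], ignore_index=True)  # prepend the one-row frame
df.loc[0, 'id'] = 'not-nulls_pct'
print(df)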
id A B C D
0 not-nulls_pct 62.5% NaN 75.0% 50.0%
1 1 1 one 4 NaN
2 2 NaN one 14 NaN
3 3 2 two 3 -12
4 4 55 three NaN 12
5 5 6 two 8 12
6 6 NaN two 7 -12
7 7 -17 one NaN NaN
8 8 NaN three 11 NaN
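The mask(s == 1) step hides every fully populated column. In this data, those are exactly the columns outside A, C and D, but a requested column that happened to be 100% non-null would be hidden too. If you prefer to name the columns explicitly, you can wrap the same concat idea in a small helper. This is a sketch only. `prepend_notnull_pct` and its parameters are hypothetical names. The '{:.1f}%' format (one decimal place) is an assumption that stops fractions like 1/3 from printing with many decimals.

```python
import numpy as np
import pandas as pd

def prepend_notnull_pct(df, cols, label='not-nulls_pct', label_col='id'):
    """Hypothetical helper: prepend a row with the non-null percentage of
    `cols`; every other column in that row is left as NaN."""
    pct = df[cols].notna().mean().mul(100)
    pct = pct.map('{:.1f}%'.format)              # e.g. 62.5 -> '62.5%'
    # one-row frame; reindex adds the remaining columns filled with NaN
    row = pct.to_frame().T.reindex(columns=df.columns)
    row[label_col] = label                       # label in the id column
    return pd.concat([row, df], ignore_index=True)

# Sample data from the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'A': [1.0, np.nan, 2.0, 55.0, 6.0, np.nan, -17.0, np.nan],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [4.0, 14.0, 3.0, np.nan, 8.0, 7.0, np.nan, 11.0],
    'D': [np.nan, np.nan, -12.0, 12.0, 12.0, -12.0, np.nan, np.nan],
})

out = prepend_notnull_pct(df, ['A', 'C', 'D'])
print(out)
```

The first row of `out` holds '62.5%', '75.0%' and '50.0%' under A, C and D, with NaN under B. The original rows follow.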
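Alternatively, starting from the original df, use setting with enlargement with loc. Because the new row is appended at the end, you then have to sort the index so that it becomes the first row of the final DataFrame: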
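import numpy as np

s = df.notna().mean()
# new row labeled -1: percentage strings, NaN where a column is 100% non-null
df.loc[-1] = np.where(s != 1, s.mul(100).astype(str).add('%'), np.nan)
# the new row is appended last; sort_index moves label -1 to the top
df = df.sort_index().reset_index(drop=True)
df.loc[0, 'id'] = 'not-nulls_pct'
print(df)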
id A B C D
0 not-nulls_pct 62.5% NaN 75.0% 50.0%
1 1 1 one 4 NaN
2 2 NaN one 14 NaN
3 3 2 two 3 -12
4 4 55 three NaN 12
5 5 6 two 8 12
6 6 NaN two 7 -12
7 7 -17 one NaN NaN
8 8 NaN three 11 NaN
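Below is a variant of the enlargement approach that targets the requested columns explicitly. It casts the frame to object dtype before adding the row, because writing strings such as 'not-nulls_pct' into numeric columns can trigger dtype-upcasting warnings in recent pandas versions. This is a hedged sketch. The object cast and the one-decimal format are assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'A': [1.0, np.nan, 2.0, 55.0, 6.0, np.nan, -17.0, np.nan],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [4.0, 14.0, 3.0, np.nan, 8.0, 7.0, np.nan, 11.0],
    'D': [np.nan, np.nan, -12.0, 12.0, 12.0, -12.0, np.nan, np.nan],
})

cols = ['A', 'C', 'D']  # columns the question asks about
pct = df[cols].notna().mean().mul(100).map('{:.1f}%'.format)

out = df.astype(object)             # object columns accept the string row
row = pct.reindex(out.columns)      # columns not in `cols` become NaN
row['id'] = 'not-nulls_pct'
out.loc[-1] = row                   # label -1 is new, so a row is appended
out = out.sort_index().reset_index(drop=True)  # move label -1 to the top
print(out)
```

Unlike mask(s == 1), this keeps a requested column in the summary row even if it has no missing values.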