How to get column counts together with their names in a pandas DataFrame

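Problem description

I am preprocessing data and handling missing values. I want to apply a threshold to the columns: if a column has fewer than 50 non-missing values, drop that column.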

import numpy as np
import pandas as pd
from pandas import DataFrame

df = pd.read_csv('cbc_updated_1.csv')

Then I get the count of values in each column.

a = df.count(axis = 0)
print(a)

This gives the column names along with their counts.

IP ABN(RBC)RET Abn Scattergram       46
IP ABN(RBC)Reticulocytosis           23
IP ABN(PLT)Thrombocytosis            47
IP ABN(PLT)PLT Abn Scattergram        0
IP SUS(WBC)Blasts?                   57
IP SUS(WBC)Abn Lympho?               10
IP SUS(WBC)Left Shift?              190
IP SUS(WBC)Atypical Lympho?         126
IP SUS(RBC)RBC Agglutination?         0
IP SUS(RBC)Turbidity/HGB Interf?      9
IP SUS(RBC)Iron Deficiency?          27
IP SUS(RBC)HGB Defect?                3
IP SUS(RBC)Fragments?               168
IP SUS(PLT)PLT Clumps?               73
dtype: int64

Next I want to loop over the result above to check my threshold condition, but I could not get it to work. I tried the following code:

for i in a:
    if i < 50:
        print(i)

As a result I only get the values, not the column names. I need both.

46
23
47
0
10
0
9
27
3

How can I get both?

Tags: python, pandas, dataframe, csv

Solution


Try this:

>>> a[a < 50]
IP ABN(RBC)RET Abn Scattergram      46
IP ABN(RBC)Reticulocytosis          23
IP ABN(PLT)Thrombocytosis           47
IP ABN(PLT)PLT Abn Scattergram       0
IP SUS(WBC)Abn Lympho?              10
IP SUS(RBC)RBC Agglutination?        0
IP SUS(RBC)Turbidity/HGB Interf?     9
IP SUS(RBC)Iron Deficiency?         27
IP SUS(RBC)HGB Defect?               3
dtype: int64

If you want a loop:

for name, count in a[a < 50].items():
    print(name, count)

IP ABN(RBC)RET Abn Scattergram 46
IP ABN(RBC)Reticulocytosis 23
IP ABN(PLT)Thrombocytosis 47
IP ABN(PLT)PLT Abn Scattergram 0
IP SUS(WBC)Abn Lympho? 10
IP SUS(RBC)RBC Agglutination? 0
IP SUS(RBC)Turbidity/HGB Interf? 9
IP SUS(RBC)Iron Deficiency? 27
IP SUS(RBC)HGB Defect? 3
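
Since the stated goal is to drop the columns that fall below the threshold, the filtered Series can be used for that directly. The following is a minimal sketch rather than a definitive implementation: the small DataFrame and its column names (`mostly_filled`, `sparse`, `empty`) are made up to stand in for `cbc_updated_1.csv`, and the threshold of 50 is taken from the question. It shows two ways that should give the same result: dropping by the names in `a[a < 50].index`, or letting `dropna` count for you through its `thresh` parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cbc_updated_1.csv:
# three columns with 60, 10 and 0 non-missing values out of 100 rows.
n_rows = 100
df = pd.DataFrame({
    "mostly_filled": [1.0] * 60 + [np.nan] * 40,
    "sparse": [1.0] * 10 + [np.nan] * 90,
    "empty": [np.nan] * n_rows,
})

threshold = 50  # minimum number of non-missing values a column must have

# Non-missing values per column, indexed by column name.
counts = df.count(axis=0)

# Names of the columns whose count is below the threshold.
to_drop = counts[counts < threshold].index

# Option 1: drop those columns by name.
trimmed = df.drop(columns=to_drop)

# Option 2: dropna keeps only columns with at least `threshold` non-missing values.
trimmed_alt = df.dropna(axis=1, thresh=threshold)

print(list(to_drop))
print(list(trimmed.columns))
```

Option 1 is handy when you also want to log or inspect which columns were removed. Option 2 is shorter when only the trimmed DataFrame matters.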
