Why does the groupby method give NA values?

Problem description

Python 3.9.6, pandas 1.2.4

I want to find the number of patients admitted to each hospital on each date. I wrote this code:

import pandas as pd

# Keep only the two relevant columns, drop missing rows, and order by date.
data_admission = data[['name_of_hospital', 'date_admission']].copy(deep=True)
data_admission = data_admission.dropna()
data_admission = data_admission.sort_values(by=['date_admission'])

# Running count of admissions per hospital (rows are already in date order).
data_admission['quantity_admission'] = 1
grouped = data_admission['quantity_admission'].groupby(data_admission['name_of_hospital'])
data_admission['quantity_admission'] = grouped.cumsum()

# Running count reached by each hospital at the end of each date.
grouped = data_admission['quantity_admission'].groupby([data_admission['name_of_hospital'], data_admission['date_admission']])
result = grouped.max()

In the result variable, the quantity_admission field contains NaN values, even though none of the columns has missing values. Why?

A simple example (this one works fine):

import numpy as np
import pandas as pd

hosp1 = 'Name_1'
hosp2 = 'Name_2'

date1 = np.datetime64('2020-04-02', 'D')
date2 = np.datetime64('2020-04-01', 'D')
date3 = np.datetime64('2020-04-04', 'D')
date4 = np.datetime64('2020-04-03', 'D')

data_hosp = []
data_date = []
for date in [date2, date2, date3, date4]:
    data_hosp.append(hosp1)
    data_date.append(date)
    
    data_hosp.append(hosp2)
    if date==date2:
        data_date.append(date1)
    else:
        data_date.append(date3)
        
    
df = pd.DataFrame({'hospital':data_hosp, 'date':data_date})
df = df.sort_values(by=['date'])

df['count'] = 1
grouped = df['count'].groupby(df['hospital'])
df['count'] = grouped.cumsum()

grouped = df['count'].groupby([df['hospital'], df['date']])
df = grouped.max()

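Update: I found that the NaN values appear for combinations of name_of_hospital and date_admission that do not occur in the original data. It is not clear why pandas adds these nonexistent combinations with NaN values.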
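The behaviour described in the update can be reproduced if a grouping column has categorical dtype. The question does not show the dtypes, so this is an assumption, and the toy data below is hypothetical; only the column names come from the question. With a categorical grouper and observed=False (the default in pandas 1.2.4), groupby returns every combination of group keys, and combinations that never occur come back as NaN:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
# Assumption: name_of_hospital is stored as a categorical column.
df = pd.DataFrame({
    'name_of_hospital': pd.Categorical(['A', 'A', 'B']),
    'date_admission': pd.to_datetime(['2020-04-01', '2020-04-02', '2020-04-01']),
    'quantity_admission': [1, 2, 1],
})
keys = [df['name_of_hospital'], df['date_admission']]

# observed=False also emits ('B', 2020-04-02), which is not in the data.
full = df['quantity_admission'].groupby(keys, observed=False).max()

# observed=True keeps only the combinations that actually occur.
seen = df['quantity_admission'].groupby(keys, observed=True).max()

print(full)
print(seen)
```

Checking data.dtypes on the real data would confirm or rule out this explanation.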

Tags: python, pandas-groupby

Solution
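A possible fix, sketched under the assumption that name_of_hospital (or date_admission) is a categorical column, which the question does not confirm: pass observed=True so that only combinations present in the data are returned. The sketch also counts admissions with size() rather than a helper column of ones. The sample frame is hypothetical and stands in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the real `data` frame from the question.
data = pd.DataFrame({
    'name_of_hospital': pd.Categorical(['A', 'A', 'A', 'B']),
    'date_admission': pd.to_datetime(
        ['2020-04-01', '2020-04-01', '2020-04-02', '2020-04-03']),
})

# Admissions per hospital per date; observed=True skips unseen combinations.
daily = (
    data.dropna(subset=['name_of_hospital', 'date_admission'])
        .groupby(['name_of_hospital', 'date_admission'], observed=True)
        .size()
        .rename('quantity_admission')
)

# Running total per hospital; groupby sorts the keys, so dates are in order.
cumulative = daily.groupby(level='name_of_hospital', observed=True).cumsum()

print(daily)
print(cumulative)
```

If the categorical dtype is not needed, converting the column with astype(str) before grouping should also avoid the extra rows.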

