df.isna().sum() does not work on the Titanic dataset

Problem description

I am working through the Titanic model from Kaggle. The output of isna().sum() looks wrong.

import os
import pandas as pd 
import numpy as np
import statsmodels.api as sm

from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

worksheet = gc.open('titanic_train').sheet1

titanic = worksheet.get_all_records()
titanic = pd.DataFrame(titanic)
titanic
titanic.info()
titanic.isna().sum()

The output is as follows.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          891 non-null    object 
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        891 non-null    object 
 11  Embarked     891 non-null    object 
dtypes: float64(1), int64(5), object(6)
memory usage: 83.7+ KB
PassengerId    0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       0
dtype: int64

It reports 0 NaN values, but Age and Embarked contain several missing entries. Why are the NaN values not detected? Is it because of the Dtype?

Tags: python, pandas, dataframe

Solution


This is because your pandas version is 1.2.4. When I downgraded to 0.24 or another earlier version, the NaN values were reported.
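
Another possible explanation, judging from the info() output in the question: Age has dtype object with 891 non-null entries, which suggests the blank spreadsheet cells were loaded as empty strings rather than NaN. isna() does not count an empty string as missing. The following is a minimal sketch under that assumption. The `records` list is a hypothetical stand-in for what `worksheet.get_all_records()` returns, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for worksheet.get_all_records():
# blank cells are assumed to arrive as empty strings, not NaN.
records = [
    {"PassengerId": 1, "Age": 22, "Embarked": "S"},
    {"PassengerId": 2, "Age": "", "Embarked": "C"},
    {"PassengerId": 3, "Age": 26, "Embarked": ""},
]
titanic = pd.DataFrame(records)

# Empty strings are ordinary values, so nothing is counted as missing here.
before = titanic.isna().sum()

# Turn empty or whitespace-only strings into real missing values.
cleaned = titanic.replace(r"^\s*$", np.nan, regex=True)

# Age was read as object because of the mixed values; make it numeric again.
cleaned["Age"] = pd.to_numeric(cleaned["Age"], errors="coerce")

after = cleaned.isna().sum()

print(before)
print(after)
```

If this is the cause, the same two steps applied to the real `titanic` frame should make isna().sum() report the missing values in Age, Cabin and Embarked without changing the pandas version.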

