Pandas: why does a column with missing values in the Titanic spreadsheet get dtype object?

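Question

I downloaded the Titanic data from Kaggle, uploaded it to Google Sheets, and read it from Colab. The Age column comes back with dtype object, apparently because of missing values (or for some other reason). How can I change the dtype of Age to float64?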

from google.colab import auth
import pandas as pd
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

worksheet = gc.open('titanic_train').sheet1

# get_all_records returns a list of dicts, one per row, keyed by the header row.
datas = worksheet.get_all_records()
print(datas)

pd.DataFrame(datas).info()

I got the following output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          891 non-null    object 
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        891 non-null    object 
 11  Embarked     891 non-null    object 
dtypes: float64(1), int64(5), object(6)
memory usage: 83.7+ KB

Tags: python, pandas, kaggle

Solution


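You need to convert the Age column to a numeric dtype. Age is reported as 891 non-null, so the missing ages are most likely arriving from the sheet as empty strings rather than as NaN. A column that mixes numbers and strings is stored as object. Once converted, the column will be float64 rather than an integer type, because NaN is a floating-point value. The conversion can be done as follows: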

df = pd.DataFrame(datas)

df['Age'] = pd.to_numeric(df['Age'])
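To make the cause easier to see without a Google Sheet, here is a minimal sketch that reproduces the situation with a small hand-written list. The `records` list is a hypothetical stand-in for the output of `worksheet.get_all_records()`, and it assumes that blank cells come back as empty strings. Passing `errors="coerce"` is an addition to the answer above: it asks `pd.to_numeric` to turn any value it cannot parse into NaN instead of raising.

```python
import pandas as pd

# Hypothetical stand-in for worksheet.get_all_records():
# numeric cells arrive as numbers, blank cells as empty strings (assumed).
records = [
    {"PassengerId": 1, "Age": 22},
    {"PassengerId": 2, "Age": ""},
    {"PassengerId": 3, "Age": 0.42},
]

df = pd.DataFrame(records)
print(df["Age"].dtype)         # object, because numbers and strings are mixed

# Values that cannot be parsed as numbers (here the empty string) become NaN.
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
print(df["Age"].dtype)         # float64
print(df["Age"].isna().sum())  # count of ages that are now missing
```

After the conversion, `df.info()` should report fewer non-null entries for Age than for the other columns, which is how the missing values become visible.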
