python - Missing values when using df.loc after changing the index in python-pandas
Problem description
I changed the index to "PassengerId" and then tried to use df.loc to retrieve rows by the new index, but the result contains missing values.
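I am exploring the Titanic dataset.
- Appended a new_row with some values.
- Changed the index to PassengerId.
- Tried to look rows up with df.loc.
- Got a result in which the values of the existing rows have disappeared (NaN), while the values of the newly appended row are shown.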
import pandas as pd

# Loading the dataset into a DataFrame
dataset = pd.read_csv('Titanic_train.csv')
# Add a new row at the bottom of the dataset
new_row = pd.Series(data=['892','0','1','NA','NA','NA'], index=['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age'])
dataset = dataset.append(new_row, ignore_index=True)
# Setting PassengerId as Index
dataset = dataset.set_index(dataset['PassengerId'])
dataset.loc[['892','891','890']]
I get the following result, with NaN for every row except the new row (892), plus this warning:
FutureWarning: Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative
See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
PassengerId PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
892 892 0 1 NA NA NA NaN NaN NaN NaN NaN NaN
891 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
890 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
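The warning points to `.reindex()`. As a minimal sketch on a toy three-row frame (hypothetical data, not the Titanic CSV), this compares how `.reindex` and `.loc` treat a label that is absent from the index. `.reindex` fills such a row with NaN, which is the pattern seen above; in pandas 1.0 and later, `.loc` with a list containing a missing label raises `KeyError` instead.

```python
import pandas as pd

# Toy frame with integer labels, standing in for a PassengerId index
df = pd.DataFrame({"Survived": [0, 1, 1]}, index=[889, 890, 891])

# .reindex returns a row of NaN for any label that is not in the index
print(df.reindex([891, 892]))

# .loc with a list containing a missing label raises KeyError (pandas >= 1.0)
try:
    df.loc[[891, 892]]
except KeyError as exc:
    print("KeyError:", exc)
```

So rows full of NaN in a `.loc` result usually mean the requested labels were not found in the index, rather than that the data was lost.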
Expected result:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
PassengerId
890 890 1 1 Behr, Mr. Karl Howell male 26 0.0 0.0 111369 30.00 C148 C
891 891 0 3 Dooley, Mr. Patrick male 32 0.0 0.0 370376 7.75 NaN Q
892 892 0 1 NA NA NA NaN NaN NaN NaN NaN NaN
Solution
Partial answer:
Running a test...
import pandas as pd
import numpy as np

columns = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
           "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]
# Two test rows with integer PassengerIds; the remaining 10 columns are empty
dataset = pd.DataFrame(columns=columns,
                       data=[[891, 1] + [np.nan] * 10,
                             [892, 2] + [np.nan] * 10])
print(dataset)
# Add rows (PassengerId as an int so it matches the existing integer labels)
new_row = pd.Series(data=[890, '0', '1', 'NA', 'NA', 'NA'],
                    index=['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
dataset = pd.concat([dataset, new_row.to_frame().T], ignore_index=True)
# Setting PassengerId as Index (the PassengerId column is kept as well)
dataset = dataset.set_index(dataset['PassengerId'])
dataset.loc[[892, 891, 890]]  # all three labels are ints, as in the index
print(dataset)
It produces the following output:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare \
0 891 1 NaN NaN NaN NaN NaN NaN NaN NaN
1 892 2 NaN NaN NaN NaN NaN NaN NaN NaN
Cabin Embarked
0 NaN NaN
1 NaN NaN
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket \
PassengerId
891 891 1 NaN NaN NaN NaN NaN NaN NaN
892 892 2 NaN NaN NaN NaN NaN NaN NaN
890 890 0 1 NA NA NA NaN NaN NaN
Fare Cabin Embarked
PassengerId
891 NaN NaN NaN
892 NaN NaN NaN
890 NaN NaN NaN
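This seems to be exactly what you are looking for. The key point is the type of the labels: read_csv parses PassengerId as integers, but in the question both new_row and the .loc call use strings ('892', '891', '890'). Only the appended row carries the string label '892'; the existing rows are labelled with the integers 891 and 890, so the string labels '891' and '890' are treated as missing and come back as NaN (pandas 1.0 and later raise a KeyError instead, as the FutureWarning announced). The labels passed to .loc must therefore be integers, and the appended row's PassengerId should be an integer too.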
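A minimal sketch of the root cause and one possible fix, using a small hypothetical frame in place of Titanic_train.csv. Appending a row whose PassengerId is the string "892" to an integer column leaves a column (and later an index) that mixes ints and a string. Converting the column to a single numeric type with `pd.to_numeric` before `set_index` lets `.loc` find all three integer labels. Since `DataFrame.append` was removed in pandas 2.0, the sketch uses `pd.concat` instead.

```python
import pandas as pd

# Toy stand-in for the CSV: read_csv would parse PassengerId as int64
dataset = pd.DataFrame({"PassengerId": [890, 891], "Survived": [1, 0]})

# New row built from strings, as in the question
new_row = pd.Series(data=["892", "0"], index=["PassengerId", "Survived"])
dataset = pd.concat([dataset, new_row.to_frame().T], ignore_index=True)

# The column now mixes int values (890, 891) with the str value "892"
print([type(v).__name__ for v in dataset["PassengerId"]])

# Fix: give every PassengerId the same numeric type before indexing on it
dataset["PassengerId"] = pd.to_numeric(dataset["PassengerId"])
dataset = dataset.set_index("PassengerId")

# All three integer labels now exist, so .loc returns real rows
print(dataset.loc[[892, 891, 890]])
```

An equally valid choice is to build new_row with integer values in the first place; converting after the fact is useful when the mixed types have already crept in.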