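python-3.x - Some methods do not work when I try to handle missing values in pandas
Problem description
I am trying to handle some missing values in a dataset, following a tutorial. Below is the code I use to read the data.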
import pandas as pd
import numpy as np
questions = pd.read_csv("./archive/questions.csv")
print(questions.head())
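This is what my data looks like (the values from questions.head() are included in the edit below).
These are the methods I used to handle the missing values. None of them worked.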
questions.replace(to_replace = np.nan, value = -99)
questions = questions.fillna(method ='pad')
questions.interpolate(method ='linear', limit_direction = 'forward')
Then I tried dropping the rows that have missing values. None of these worked either. They all return an empty DataFrame.
questions.dropna()
questions.dropna(how = "all")
questions.dropna(axis = 1)
What am I doing wrong?
Edit:
Values from questions.head():
[[1 '2008-07-31T21:26:37Z' nan '2011-03-28T00:53:47Z' 1 nan 0.0]
[4 '2008-07-31T21:42:52Z' nan nan 458 8.0 13.0]
[6 '2008-07-31T22:08:08Z' nan nan 207 9.0 5.0]
[8 '2008-07-31T23:33:19Z' '2013-06-03T04:00:25Z' '2015-02-11T08:26:40Z'
42 nan 8.0]
[9 '2008-07-31T23:40:59Z' nan nan 1410 1.0 58.0]]
Values from questions.head() in dictionary form:
{'Id': {0: 1, 1: 4, 2: 6, 3: 8, 4: 9}, 'CreationDate': {0: '2008-07-31T21:26:37Z', 1: '2008-07-31T21:42:52Z', 2: '2008-07-31T22:08:08Z', 3: '2008-07-31T23:33:19Z', 4: '2008-07-31T23:40:59Z'}, 'ClosedDate': {0: nan, 1: nan, 2: nan, 3: '2013-06-03T04:00:25Z', 4: nan}, 'DeletionDate': {0: '2011-03-28T00:53:47Z', 1: nan, 2: nan, 3: '2015-02-11T08:26:40Z', 4: nan}, 'Score': {0: 1, 1: 458, 2: 207, 3: 42, 4: 1410}, 'OwnerUserId': {0: nan, 1: 8.0, 2: 9.0, 3: nan, 4: 1.0}, 'AnswerCount': {0: 0.0, 1: 13.0, 2: 5.0, 3: 8.0, 4: 58.0}}
Information about the dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17203824 entries, 0 to 17203823
Data columns (total 7 columns):
# Column Dtype
--- ------ -----
0 Id int64
1 CreationDate object
2 ClosedDate object
3 DeletionDate object
4 Score int64
5 OwnerUserId float64
6 AnswerCount float64
dtypes: float64(2), int64(2), object(3)
memory usage: 918.8+ MB
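Solution
Could you try specifying axis explicitly and see whether it works? A plain fillna() should still work without an axis, but for pad you need it so that pandas knows in which direction to fill the missing values.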
>>> questions.fillna(method='pad', axis=1)
Id CreationDate ClosedDate DeletionDate Score OwnerUserId AnswerCount
0 1 2008-07-31T21:26:37Z 2008-07-31T21:26:37Z 2011-03-28T00:53:47Z 1 1 0
1 4 2008-07-31T21:42:52Z 2008-07-31T21:42:52Z 2008-07-31T21:42:52Z 458 8 13
2 6 2008-07-31T22:08:08Z 2008-07-31T22:08:08Z 2008-07-31T22:08:08Z 207 9 5
3 8 2008-07-31T23:33:19Z 2013-06-03T04:00:25Z 2015-02-11T08:26:40Z 42 42 8
4 9 2008-07-31T23:40:59Z 2008-07-31T23:40:59Z 2008-07-31T23:40:59Z 1410 1 58
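To make the effect of axis easier to see, here is a minimal sketch on a small made-up frame (not the questions.csv data). It uses DataFrame.ffill(), which does the same forward fill as fillna(method='pad'); newer pandas releases deprecate the method= argument of fillna, so ffill() is swapped in here. The names df, down and across are only illustrative.

```python
import numpy as np
import pandas as pd

# Small made-up frame for illustration; not the questions.csv data.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# axis=0 (the default): each NaN takes the last valid value above it
# in the same column. A NaN in the first row has nothing above it and stays.
down = df.ffill(axis=0)

# axis=1: each NaN takes the last valid value to its left in the same row.
# A NaN in the first column has nothing to its left and stays.
across = df.ffill(axis=1)

print(down)
print(across)
```

With axis=1 a value from one column is copied into its neighbour, which is why the output above shows CreationDate repeated in ClosedDate and Score repeated in OwnerUserId. That is rarely what you want when the columns mean different things.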
A plain fillna() applied to the whole DataFrame works as expected.
>>> questions.fillna('-')
Id CreationDate ClosedDate DeletionDate Score OwnerUserId AnswerCount
0 1 2008-07-31T21:26:37Z - 2011-03-28T00:53:47Z 1 - 0.0
1 4 2008-07-31T21:42:52Z - - 458 8 13.0
2 6 2008-07-31T22:08:08Z - - 207 9 5.0
3 8 2008-07-31T23:33:19Z 2013-06-03T04:00:25Z 2015-02-11T08:26:40Z 42 - 8.0
4 9 2008-07-31T23:40:59Z - - 1410 1 58.0
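One possible explanation for the symptoms in the question, offered as a guess since the full script is not shown: replace(), fillna(), interpolate() and dropna() return a new DataFrame by default and leave the original untouched, so the result has to be assigned. Also, every row in questions.head() has a NaN in at least one column, so dropna() with its default how='any' removes all of them. The sketch below rebuilds three columns from the head() values given in the question; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Three columns copied from the questions.head() values in the question.
df = pd.DataFrame({
    "Id": [1, 4, 6, 8, 9],
    "ClosedDate": [np.nan, np.nan, np.nan, "2013-06-03T04:00:25Z", np.nan],
    "OwnerUserId": [np.nan, 8.0, 9.0, np.nan, 1.0],
})

# Without assignment the result is discarded and df still has its NaNs.
df.fillna(-99)
still_missing = int(df.isna().sum().sum())

# Assigning the result keeps the filled copy.
filled = df.fillna(-99)
remaining = int(filled.isna().sum().sum())

# Every row has a NaN somewhere, so the default dropna() leaves nothing.
complete_rows = df.dropna()

# Restricting the check to one column keeps rows where that column is present.
with_owner = df.dropna(subset=["OwnerUserId"])

print(still_missing, remaining, len(complete_rows), len(with_owner))
```

If only some columns matter for the analysis, dropna(subset=[...]) or a per-column fill such as fillna with a dict of column names to fill values is usually a better fit than treating the whole frame at once.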