python-3.x - How to fill the rows between equal values after resampling in Python
Problem description
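I have a dataset whose input includes a date and a time. The times are not at fixed intervals, so I resampled the data to 5 minutes.
That produced empty rows containing NaN. I then tried to replace the NaN with a single value, but my column contains different values.
The data in my csv file: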
date time x
8/6/2018 6:15:00 1.1
8/6/2018 6:45:00 1.1
8/6/2018 7:45:00 1.2
8/6/2018 9:00:00 1.2
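To experiment without the original data.csv, the four sample rows above can be loaded from an in-memory string, and the separate date and time columns combined into one timestamp (the same pd.to_datetime step the answer below uses). This is a minimal sketch; it assumes the file is comma-separated and that 8/6/2018 means month/day/year, which is how pandas reads it by default.

```python
import io

import pandas as pd

# The four sample rows from the question, inlined so the sketch runs without data.csv.
# Assumption: the real file is comma-separated.
raw = """date,time,x
8/6/2018,6:15:00,1.1
8/6/2018,6:45:00,1.1
8/6/2018,7:45:00,1.2
8/6/2018,9:00:00,1.2
"""

data = pd.read_csv(io.StringIO(raw))

# Combine the separate date and time columns into one datetime column.
# Assumption: 8/6/2018 is month/day/year (pandas' default); pass dayfirst=True
# to pd.to_datetime if the file is actually day/month/year.
data["date"] = pd.to_datetime(data["date"] + " " + data["time"])
print(data)
```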
As you can see, my timestamps are not evenly spaced, so the first thing I did was resample the data to 5-minute intervals.
Here is my code:
def f(a):
    b = a[['date', 'time', 'x']]
    b.index = a['date']
    # keep the first reading in each 5-minute bin, None for empty bins
    c = b.resample('5min').apply(lambda x: x.iloc[0] if x.count() > 0 else None)
    return c

data['day'] = data['date'].dt.date
data = data.groupby('day').apply(f)
This gave me the following output:
date time x
day date
2018-06-08 2018-06-08 06:15:00 2018-06-08 06:15:00 6:15:00 1.1
2018-06-08 06:20:00 NaT None nan
2018-06-08 06:25:00 NaT None nan
2018-06-08 06:30:00 NaT None nan
2018-06-08 06:35:00 NaT None nan
2018-06-08 06:40:00 NaT None nan
2018-06-08 06:45:00 2018-06-08 06:45:00 6:45:00 1.1
2018-06-08 06:50:00 NaT None nan
2018-06-08 06:55:00 NaT None nan
2018-06-08 07:00:00 NaT None nan
2018-06-08 07:05:00 NaT None nan
2018-06-08 07:10:00 NaT None nan
2018-06-08 07:15:00 NaT None nan
2018-06-08 07:20:00 NaT None nan
2018-06-08 07:25:00 NaT None nan
2018-06-08 07:30:00 NaT None nan
2018-06-08 07:35:00 NaT None nan
2018-06-08 07:40:00 NaT None nan
2018-06-08 07:45:00 2018-06-08 07:45:00 7:45:00 1.2
2018-06-08 07:50:00 NaT None nan
2018-06-08 07:55:00 NaT None nan
2018-06-08 08:00:00 NaT None nan
2018-06-08 08:05:00 NaT None nan
2018-06-08 08:10:00 NaT None nan
2018-06-08 08:15:00 NaT None nan
2018-06-08 08:20:00 NaT None nan
2018-06-08 08:25:00 NaT None nan
2018-06-08 08:30:00 NaT None nan
2018-06-08 08:35:00 NaT None nan
2018-06-08 08:40:00 NaT None nan
:
:
:
:
:
2018-06-08 09:00:00 2018-06-08 09:00:00 9:00:00 1.2
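The NaN rows are produced by the resampling itself: every 5-minute bin that contains no reading comes out empty. A self-contained sketch on the four sample rows, applying Series.resample(...).first() (the aggregation the answer below suggests) to x only, shows how many such rows appear. "5min" is used here as an equivalent of "5T", an alias that recent pandas versions deprecate.

```python
import io

import pandas as pd

raw = """date,time,x
8/6/2018,6:15:00,1.1
8/6/2018,6:45:00,1.1
8/6/2018,7:45:00,1.2
8/6/2018,9:00:00,1.2
"""
data = pd.read_csv(io.StringIO(raw))
data["date"] = pd.to_datetime(data["date"] + " " + data["time"])

# Put x on a regular 5-minute grid; .first() keeps the first reading per bin
# and leaves bins without any reading as NaN.
x5 = data.set_index("date")["x"].resample("5min").first()

print(len(x5))          # 34 bins from 06:15 to 09:00 inclusive
print(x5.isna().sum())  # 30 of them have no reading
```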
Then I tried to replace the NaN values with the x input value, using this code:
data['x'] = data['x'].replace(np.nan, 1.1)
This filled everything with 1.1. But according to my csv, the value between 7:45:00 and 9:00:00 should be 1.2.
So my expected output is:
date time x expected x
day date
2018-06-08 2018-06-08 06:15:00 2018-06-08 06:15:00 6:15:00 1.1 1.1
2018-06-08 06:20:00 NaT None nan 1.1
2018-06-08 06:25:00 NaT None nan 1.1
2018-06-08 06:30:00 NaT None nan 1.1
2018-06-08 06:35:00 NaT None nan 1.1
2018-06-08 06:40:00 NaT None nan 1.1
2018-06-08 06:45:00 2018-06-08 06:45:00 6:45:00 1.1 1.1
2018-06-08 06:50:00 NaT None nan 1.1
2018-06-08 06:55:00 NaT None nan 1.1
2018-06-08 07:00:00 NaT None nan 1.1
2018-06-08 07:05:00 NaT None nan 1.1
2018-06-08 07:10:00 NaT None nan 1.1
2018-06-08 07:15:00 NaT None nan 1.1
2018-06-08 07:20:00 NaT None nan 1.1
2018-06-08 07:25:00 NaT None nan 1.1
2018-06-08 07:30:00 NaT None nan 1.1
2018-06-08 07:35:00 NaT None nan 1.1
2018-06-08 07:40:00 NaT None nan 1.1
2018-06-08 07:45:00 2018-06-08 07:45:00 7:45:00 1.2 1.2
2018-06-08 07:50:00 NaT None nan 1.2
2018-06-08 07:55:00 NaT None nan 1.2
2018-06-08 08:00:00 NaT None nan 1.2
2018-06-08 08:05:00 NaT None nan 1.2
2018-06-08 08:10:00 NaT None nan 1.2
2018-06-08 08:15:00 NaT None nan 1.2
2018-06-08 08:20:00 NaT None nan 1.2
2018-06-08 08:25:00 NaT None nan 1.2
2018-06-08 08:30:00 NaT None nan 1.2
2018-06-08 08:35:00 NaT None nan 1.2
2018-06-08 08:40:00 NaT None nan 1.2
: :
:
:
: :
: :
2018-06-08 09:00:00 2018-06-08 09:00:00 9:00:00 1.2 1.2
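As you can see in my expected output, the rows between the two 1.2 readings need to be filled with 1.2 (and likewise with 1.1 between the two 1.1 readings).
My code does not produce this output. Can anyone help me solve this?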
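The expected output describes a forward fill: each empty slot takes the most recent observed value, whereas replace(np.nan, 1.1) writes the same constant into every gap. A toy five-slot series (hypothetical values, not the real data) makes the difference visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: a 1.1 reading, a gap, a 1.2 reading, then two more gaps
idx = pd.date_range("2018-08-06 07:35", periods=5, freq="5min")
x = pd.Series([1.1, np.nan, 1.2, np.nan, np.nan], index=idx)

constant = x.replace(np.nan, 1.1)  # the question's approach: every gap becomes 1.1
filled = x.ffill()                 # forward fill: each gap takes the last observed value

print(pd.DataFrame({"x": x, "replace": constant, "ffill": filled}))
```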
Here is my csv: my csv
When I read the csv, every x value is displayed as 1.
Code:
data = pd.read_csv('data.csv')
Output:
date time x
0 8/6/2018 6:15:00 1
1 8/6/2018 6:45:00 1
2 8/6/2018 7:45:00 1
3 8/6/2018 9:00:00 1
4 8/6/2018 9:25:00 1
5 8/6/2018 9:30:00 1
6 8/6/2018 11:00:00 1
7 8/6/2018 11:30:00 1
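The file itself isn't shown, so the cause of the 1 values can't be confirmed from here. A hedged first check is to read x as raw text, which shows exactly what the file stores before any numeric conversion: if the strings are already "1", the values were rounded before pandas ever saw the file; if they are "1.1", a default read should give floats such as 1.1. The in-memory content below is a hypothetical stand-in for data.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv; pass the real path instead of io.StringIO(raw)
raw = """date,time,x
8/6/2018,6:15:00,1.1
8/6/2018,7:45:00,1.2
"""

# Read x as plain text so nothing is converted or rounded on the way in
as_text = pd.read_csv(io.StringIO(raw), dtype={"x": str})
print(as_text["x"].tolist())  # the exact strings stored in the file

# Default read: pandas infers a numeric dtype for x
as_number = pd.read_csv(io.StringIO(raw))
print(as_number["x"].dtype, as_number["x"].tolist())
```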
Solution
Forward filling the missing values works fine for me, and your function can also be simplified by using first():
import pandas as pd

data['date'] = pd.to_datetime(data['date'] + ' ' + data['time'])

def f(a):
    b = a[['date', 'time', 'x']]
    b.index = a['date']
    c = b.resample('5min').first()
    return c

data['day'] = data['date'].dt.date
data = data.groupby('day').apply(f)
data['x'] = data['x'].ffill()
print(data)
date time x
day date
2018-08-06 2018-08-06 06:15:00 2018-08-06 06:15:00 6:15:00 1.1
2018-08-06 06:20:00 NaT NaN 1.1
2018-08-06 06:25:00 NaT NaN 1.1
2018-08-06 06:30:00 NaT NaN 1.1
2018-08-06 06:35:00 NaT NaN 1.1
2018-08-06 06:40:00 NaT NaN 1.1
2018-08-06 06:45:00 2018-08-06 06:45:00 6:45:00 1.1
2018-08-06 06:50:00 NaT NaN 1.1
2018-08-06 06:55:00 NaT NaN 1.1
2018-08-06 07:00:00 NaT NaN 1.1
2018-08-06 07:05:00 NaT NaN 1.1
2018-08-06 07:10:00 NaT NaN 1.1
2018-08-06 07:15:00 NaT NaN 1.1
2018-08-06 07:20:00 NaT NaN 1.1
2018-08-06 07:25:00 NaT NaN 1.1
2018-08-06 07:30:00 NaT NaN 1.1
2018-08-06 07:35:00 NaT NaN 1.1
2018-08-06 07:40:00 NaT NaN 1.1
2018-08-06 07:45:00 2018-08-06 07:45:00 7:45:00 1.2
2018-08-06 07:50:00 NaT NaN 1.2
2018-08-06 07:55:00 NaT NaN 1.2
2018-08-06 08:00:00 NaT NaN 1.2
2018-08-06 08:05:00 NaT NaN 1.2
2018-08-06 08:10:00 NaT NaN 1.2
2018-08-06 08:15:00 NaT NaN 1.2
2018-08-06 08:20:00 NaT NaN 1.2
2018-08-06 08:25:00 NaT NaN 1.2
2018-08-06 08:30:00 NaT NaN 1.2
2018-08-06 08:35:00 NaT NaN 1.2
2018-08-06 08:40:00 NaT NaN 1.2
2018-08-06 08:45:00 NaT NaN 1.2
2018-08-06 08:50:00 NaT NaN 1.2
2018-08-06 08:55:00 NaT NaN 1.2
2018-08-06 09:00:00 2018-08-06 09:00:00 9:00:00 1.2
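For reference, a self-contained sketch of the same approach on the question's four sample rows. It differs from the answer in two small, deliberate ways: it forward-fills within each day (groupby(level="day")) so one day's last value can never carry over into the next day, and it selects the columns before apply so the grouping column isn't passed to the helper. For this single-day sample the result matches the answer's plain ffill(). The helper name resample_day is hypothetical and plays the role of the answer's f.

```python
import io

import pandas as pd

raw = """date,time,x
8/6/2018,6:15:00,1.1
8/6/2018,6:45:00,1.1
8/6/2018,7:45:00,1.2
8/6/2018,9:00:00,1.2
"""
data = pd.read_csv(io.StringIO(raw))
data["date"] = pd.to_datetime(data["date"] + " " + data["time"])
data["day"] = data["date"].dt.date


def resample_day(a):
    # Hypothetical helper: put one day's rows on a 5-minute grid, keeping the
    # first reading in each bin (bins without a reading become NaN)
    return a.set_index("date").resample("5min").first()


# Result has a (day, date) MultiIndex, like the answer's output
out = data.groupby("day")[["date", "time", "x"]].apply(resample_day)

# Forward-fill x within each day only
out["x"] = out.groupby(level="day")["x"].ffill()
print(out)
```

Using first() instead of the question's lambda gives the same "first reading or missing" result per bin, while relying on a built-in resampler method.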