How to fill the gaps between identical values after resampling in Python

Problem description

I have a dataset with one input column plus a date and a time. The times are not at fixed intervals, so I resampled the data to 5 minutes.

That gave me empty rows containing NaN. I then tried to replace the NaN with a single value, but my column contains different values.

The data in my CSV file:

date	time                   x
8/6/2018	6:15:00           1.1
8/6/2018	6:45:00           1.1
8/6/2018	7:45:00           1.2
8/6/2018	9:00:00           1.2
                             

As you can see, the times in my data are not at fixed intervals. So first I resampled the data to every 5 minutes.

Here is my code:

def f(a):
  b = a[['date', 'time', 'x']]
  b.index = a['date']
  # Take the first value in each 5-minute bin, or None if the bin is empty
  c = b.resample('5T').apply(lambda x: x[0] if x.count() > 0 else None)
  return c

data['day'] = data['date'].dt.date
data = data.groupby('day').apply(lambda x: f(x))

Then I got this output:

                                              date      time      x
day        date                                                         
2018-06-08 2018-06-08 06:15:00  2018-06-08 06:15:00    6:15:00     1.1
           2018-06-08 06:20:00                 NaT      None      nan   
           2018-06-08 06:25:00                 NaT      None      nan   
           2018-06-08 06:30:00                 NaT      None      nan   
           2018-06-08 06:35:00                 NaT      None      nan   
           2018-06-08 06:40:00                 NaT      None      nan   
           2018-06-08 06:45:00 2018-06-08 06:45:00    6:45:00     1.1
           2018-06-08 06:50:00                 NaT      None      nan   
           2018-06-08 06:55:00                 NaT      None      nan   
           2018-06-08 07:00:00                 NaT      None      nan   
           2018-06-08 07:05:00                 NaT      None      nan   
           2018-06-08 07:10:00                 NaT      None      nan   
           2018-06-08 07:15:00                 NaT      None      nan   
           2018-06-08 07:20:00                 NaT      None      nan   
           2018-06-08 07:25:00                 NaT      None      nan   
           2018-06-08 07:30:00                 NaT      None      nan   
           2018-06-08 07:35:00                 NaT      None      nan   
           2018-06-08 07:40:00                 NaT      None      nan   
           2018-06-08 07:45:00 2018-06-08 07:45:00   7:45:00      1.2               
           2018-06-08 07:50:00                 NaT      None      nan   
           2018-06-08 07:55:00                 NaT      None      nan   
           2018-06-08 08:00:00                 NaT      None      nan   
           2018-06-08 08:05:00                 NaT      None      nan   
           2018-06-08 08:10:00                 NaT      None      nan   
           2018-06-08 08:15:00                 NaT      None      nan   
           2018-06-08 08:20:00                 NaT      None      nan   
           2018-06-08 08:25:00                 NaT      None      nan   
           2018-06-08 08:30:00                 NaT      None      nan   
           2018-06-08 08:35:00                 NaT      None      nan   
           2018-06-08 08:40:00                 NaT      None      nan
                                      :
                                      :
                                      :
                                      :
                                      :
          2018-06-08 09:00:00  2018-06-08 09:00:00    9:00:00      1.2   

Then I tried to replace the NaN with the x input value. I tried this code:

data['x'] = data['x'].replace(np.nan, 1.1)

Then every NaN is filled with 1.1. But according to my CSV, the value between 7:45:00 and 9:00:00 is 1.2.
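The difference between filling with a constant and carrying the last known value forward can be seen on a tiny series. This is only a sketch: the series `x` below is a hypothetical stand-in for the resampled column, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the resampled x column: known values with gaps between them.
x = pd.Series([1.1, np.nan, np.nan, 1.2, np.nan, 1.2])

# Replacing NaN with a constant puts 1.1 into every gap, including the one after 1.2.
constant = x.replace(np.nan, 1.1)

# Forward fill copies the last known value above each gap instead.
forward = x.ffill()

print(constant.tolist())
print(forward.tolist())
```

In this sketch, only the forward-filled series keeps 1.2 in the gap that follows the first 1.2.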

So my expected output is:

                                               date      time     x    expected x
day        date                                                         
2018-06-08 2018-06-08 06:15:00  2018-06-08 06:15:00    6:15:00    1.1      1.1
           2018-06-08 06:20:00                 NaT      None      nan      1.1 
           2018-06-08 06:25:00                 NaT      None      nan      1.1
           2018-06-08 06:30:00                 NaT      None      nan      1.1
           2018-06-08 06:35:00                 NaT      None      nan      1.1
           2018-06-08 06:40:00                 NaT      None      nan      1.1
           2018-06-08 06:45:00 2018-06-08 06:45:00    6:45:00     1.1      1.1                 
           2018-06-08 06:50:00                 NaT      None      nan      1.1
           2018-06-08 06:55:00                 NaT      None      nan      1.1
           2018-06-08 07:00:00                 NaT      None      nan      1.1
           2018-06-08 07:05:00                 NaT      None      nan      1.1
           2018-06-08 07:10:00                 NaT      None      nan      1.1
           2018-06-08 07:15:00                 NaT      None      nan      1.1
           2018-06-08 07:20:00                 NaT      None      nan      1.1
           2018-06-08 07:25:00                 NaT      None      nan      1.1
           2018-06-08 07:30:00                 NaT      None      nan      1.1 
           2018-06-08 07:35:00                 NaT      None      nan      1.1
           2018-06-08 07:40:00                 NaT      None      nan      1.1
           2018-06-08 07:45:00 2018-06-08 07:45:00   7:45:00      1.2      1.2            
           2018-06-08 07:50:00                 NaT      None      nan      1.2
           2018-06-08 07:55:00                 NaT      None      nan      1.2 
           2018-06-08 08:00:00                 NaT      None      nan      1.2
           2018-06-08 08:05:00                 NaT      None      nan      1.2
           2018-06-08 08:10:00                 NaT      None      nan      1.2
           2018-06-08 08:15:00                 NaT      None      nan      1.2
           2018-06-08 08:20:00                 NaT      None      nan      1.2
           2018-06-08 08:25:00                 NaT      None      nan      1.2
           2018-06-08 08:30:00                 NaT      None      nan      1.2
           2018-06-08 08:35:00                 NaT      None      nan      1.2
           2018-06-08 08:40:00                 NaT      None      nan      1.2
                                      :                                     :
                                      :
                                      :
                                      :                                     :
                                      :                                     :
          2018-06-08 09:00:00  2018-06-08 09:00:00    9:00:00      1.2     1.2

As you can see in my expected output, the rows between two 1.2 values need to be filled with 1.2.

My code does not give me that output. Can anyone help me solve this problem?

Here is my CSV file: my csv

When I read the CSV, the x column shows only the value 1.

Code:

data = pd.read_csv('data.csv')

Output:

      date      time            x
0     8/6/2018   6:15:00        1      
1     8/6/2018   6:45:00        1    
2     8/6/2018   7:45:00        1    
3     8/6/2018   9:00:00        1      
4     8/6/2018   9:25:00        1     
5     8/6/2018   9:30:00        1     
6     8/6/2018  11:00:00        1     
7     8/6/2018  11:30:00        1    

Tags: python-3.x, pandas, time

Solution


For me, forward filling the missing values works well. Your function can also be simplified by using first():

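import pandas as pd

data = pd.read_csv('data.csv')
# Combine the separate date and time columns into one timestamp
data['date'] = pd.to_datetime(data['date'] + ' ' + data['time'])

def f(a):
  b = a[['date', 'time', 'x']].copy()
  b.index = a['date']
  # first() returns the first value in each 5-minute bin and NaN for empty bins
  # ('5min' is the current spelling of the older '5T' alias)
  c = b.resample('5min').first()
  return c

data['day'] = data['date'].dt.date
data = data.groupby('day').apply(lambda x: f(x))
# Forward fill: each gap takes the last known x value above it
data['x'] = data['x'].ffill()

print(data)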
                                              date     time    x
day        date                                                 
2018-08-06 2018-08-06 06:15:00 2018-08-06 06:15:00  6:15:00  1.1
           2018-08-06 06:20:00                 NaT      NaN  1.1
           2018-08-06 06:25:00                 NaT      NaN  1.1
           2018-08-06 06:30:00                 NaT      NaN  1.1
           2018-08-06 06:35:00                 NaT      NaN  1.1
           2018-08-06 06:40:00                 NaT      NaN  1.1
           2018-08-06 06:45:00 2018-08-06 06:45:00  6:45:00  1.1
           2018-08-06 06:50:00                 NaT      NaN  1.1
           2018-08-06 06:55:00                 NaT      NaN  1.1
           2018-08-06 07:00:00                 NaT      NaN  1.1
           2018-08-06 07:05:00                 NaT      NaN  1.1
           2018-08-06 07:10:00                 NaT      NaN  1.1
           2018-08-06 07:15:00                 NaT      NaN  1.1
           2018-08-06 07:20:00                 NaT      NaN  1.1
           2018-08-06 07:25:00                 NaT      NaN  1.1
           2018-08-06 07:30:00                 NaT      NaN  1.1
           2018-08-06 07:35:00                 NaT      NaN  1.1
           2018-08-06 07:40:00                 NaT      NaN  1.1
           2018-08-06 07:45:00 2018-08-06 07:45:00  7:45:00  1.2
           2018-08-06 07:50:00                 NaT      NaN  1.2
           2018-08-06 07:55:00                 NaT      NaN  1.2
           2018-08-06 08:00:00                 NaT      NaN  1.2
           2018-08-06 08:05:00                 NaT      NaN  1.2
           2018-08-06 08:10:00                 NaT      NaN  1.2
           2018-08-06 08:15:00                 NaT      NaN  1.2
           2018-08-06 08:20:00                 NaT      NaN  1.2
           2018-08-06 08:25:00                 NaT      NaN  1.2
           2018-08-06 08:30:00                 NaT      NaN  1.2
           2018-08-06 08:35:00                 NaT      NaN  1.2
           2018-08-06 08:40:00                 NaT      NaN  1.2
           2018-08-06 08:45:00                 NaT      NaN  1.2
           2018-08-06 08:50:00                 NaT      NaN  1.2
           2018-08-06 08:55:00                 NaT      NaN  1.2
           2018-08-06 09:00:00 2018-08-06 09:00:00  9:00:00  1.2
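For a quick check without the CSV file, the same idea can be reproduced on the four sample rows from the question. This is a minimal sketch, not the answer's exact code: the names `raw`, `stamp`, `series` and `filled` are illustrative, the date is assumed to be month-first (as in the output above), and the per-day `groupby` is left out.

```python
import pandas as pd

# Sample rows copied from the question; the date is assumed to be month-first.
raw = pd.DataFrame({
    "date": ["8/6/2018"] * 4,
    "time": ["6:15:00", "6:45:00", "7:45:00", "9:00:00"],
    "x": [1.1, 1.1, 1.2, 1.2],
})

# Combine the two text columns into one timestamp and use it as the index.
stamp = pd.to_datetime(raw["date"] + " " + raw["time"], format="%m/%d/%Y %H:%M:%S")
series = raw.set_index(stamp)["x"]

# Resample to a 5-minute grid (empty bins become NaN), then forward fill the gaps.
filled = series.resample("5min").first().ffill()

# Show the rows around the point where x changes from 1.1 to 1.2.
print(filled.loc["2018-08-06 07:40":"2018-08-06 07:50"])
```

Because this sketch resamples the whole series at once, multi-day data would also get its overnight gaps filled. The per-day `groupby` version above avoids that, since each day's grid starts at that day's first observation.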
