Algorithm to interpolate data to reconcile readings from different sides of a link?

Problem description

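I have transmission rate readings from both sides (the A side and the Z side) of a unidirectional network link. The readings are reported as timestamp/value pairs, aggregated and pulled at 1-minute intervals. Ideally, if we ignore transmission delay, the readings on both sides of the link should be identical (output rate on the A side == input rate on the Z side), and I would like to use them to detect data loss in transit. The problem is that the readings arrive at different points in time, so the readings from the Z side lag by N seconds. This makes the data almost useless: even when there is no loss on the link, I get the Z-side reading at a different point in time, when the rate on the A side has already changed.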

[Figure: initial plot of the readings from both sides]

Is there an interpolation algorithm that could help reconcile these signals in time?

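I tried creating a common index for the two dataframes and using linear interpolation to add data points to each frame. This shows the alignment between the graphs better, but during periods of rapid growth or decline the data points at the same point in time are still far apart, for example:

[Figure: plot of the interpolated readings]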
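
The common-index attempt described above can be sketched roughly as follows. This is only an illustration on made-up toy values, not the asker's actual code: the helper name `align_on_common_index` and the `diff` column are hypothetical, and `limit_area="inside"` is an assumption chosen so that values are only interpolated between real readings, never extrapolated past the first or last one.

```python
import pandas as pd

def align_on_common_index(a: pd.Series, z: pd.Series) -> pd.DataFrame:
    """Place both series on the union of their timestamps and fill the gaps
    with time-weighted linear interpolation (hypothetical helper)."""
    common = a.index.union(z.index)
    out = pd.DataFrame({
        # limit_area="inside" only fills gaps between two real readings
        "output_bps": a.reindex(common).interpolate(method="time", limit_area="inside"),
        "input_bps": z.reindex(common).interpolate(method="time", limit_area="inside"),
    })
    out = out.dropna()  # keep only timestamps covered by both sides
    out["diff"] = out["output_bps"] - out["input_bps"]
    return out

# Toy readings (made-up values), with the Z side offset by 30 seconds
a = pd.Series([10.0, 20.0],
              index=pd.to_datetime(["2019-04-17 09:00:00", "2019-04-17 09:01:00"]))
z = pd.Series([12.0, 18.0],
              index=pd.to_datetime(["2019-04-17 09:00:30", "2019-04-17 09:01:30"]))
aligned = align_on_common_index(a, z)
print(aligned)
```

Because each side only reports one aggregated value per minute, the interpolated points are estimates; large gaps during fast ramps are to be expected with this approach.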

Source data for the charts, in dictionary form:

df_a_side_out = {'output_bps': {Timestamp('2019-04-17 09:29:40-0700', tz='US/Pacific'): 35382522872.0, Timestamp('2019-04-17 09:30:41-0700', tz='US/Pacific'): 21079385419.6, Timestamp('2019-04-17 09:31:40-0700', tz='US/Pacific'): 31227610322.8, Timestamp('2019-04-17 09:32:40-0700', tz='US/Pacific'): 27822829221.333332, Timestamp('2019-04-17 09:33:40-0700', tz='US/Pacific'): 32904048834.8, Timestamp('2019-04-17 09:34:40-0700', tz='US/Pacific'): 25492801008.933334, Timestamp('2019-04-17 09:35:41-0700', tz='US/Pacific'): 35440406212.13333, Timestamp('2019-04-17 09:36:40-0700', tz='US/Pacific'): 25233478935.466667, Timestamp('2019-04-17 09:37:41-0700', tz='US/Pacific'): 40124788802.53333, Timestamp('2019-04-17 09:38:40-0700', tz='US/Pacific'): 22751043828.666668, Timestamp('2019-04-17 09:39:40-0700', tz='US/Pacific'): 34929660187.2, Timestamp('2019-04-17 09:40:41-0700', tz='US/Pacific'): 28188317863.733334, Timestamp('2019-04-17 09:41:41-0700', tz='US/Pacific'): 21337236735.866665, Timestamp('2019-04-17 09:42:40-0700', tz='US/Pacific'): 20949231319.333332, Timestamp('2019-04-17 09:43:41-0700', tz='US/Pacific'): 37289827508.933334, Timestamp('2019-04-17 09:44:40-0700', tz='US/Pacific'): 43531218338.53333, Timestamp('2019-04-17 09:45:41-0700', tz='US/Pacific'): 31844675965.333332, Timestamp('2019-04-17 09:46:40-0700', tz='US/Pacific'): 2393.3333333333335, Timestamp('2019-04-17 09:47:40-0700', tz='US/Pacific'): 6485669413.066667, Timestamp('2019-04-17 09:48:40-0700', tz='US/Pacific'): 27114641050.266666, Timestamp('2019-04-17 09:49:41-0700', tz='US/Pacific'): 30240896003.409836, Timestamp('2019-04-17 09:50:40-0700', tz='US/Pacific'): 47081233669.830505, Timestamp('2019-04-17 09:51:40-0700', tz='US/Pacific'): 45941505223.6, Timestamp('2019-04-17 09:52:40-0700', tz='US/Pacific'): 32794663316.133335, Timestamp('2019-04-17 09:53:41-0700', tz='US/Pacific'): 26202902204.666668, Timestamp('2019-04-17 09:54:40-0700', tz='US/Pacific'): 42744363073.46667, Timestamp('2019-04-17 
09:55:40-0700', tz='US/Pacific'): 37591667043.6, Timestamp('2019-04-17 09:56:40-0700', tz='US/Pacific'): 11035404304.8, Timestamp('2019-04-17 09:57:40-0700', tz='US/Pacific'): 7707897097.466666, Timestamp('2019-04-17 09:58:40-0700', tz='US/Pacific'): 25327914733.066666, Timestamp('2019-04-17 09:59:40-0700', tz='US/Pacific'): 15763228742.8, Timestamp('2019-04-17 10:00:41-0700', tz='US/Pacific'): 30068024369.2, Timestamp('2019-04-17 10:01:40-0700', tz='US/Pacific'): 58940292672.26667, Timestamp('2019-04-17 10:02:41-0700', tz='US/Pacific'): 43484764068.26667, Timestamp('2019-04-17 10:03:41-0700', tz='US/Pacific'): 12948002074.266666, Timestamp('2019-04-17 10:04:41-0700', tz='US/Pacific'): 7776379160.655738, Timestamp('2019-04-17 10:05:40-0700', tz='US/Pacific'): 34174506576.81356, Timestamp('2019-04-17 10:06:40-0700', tz='US/Pacific'): 34642321006.933334, Timestamp('2019-04-17 10:07:40-0700', tz='US/Pacific'): 44025919118.13333, Timestamp('2019-04-17 10:08:41-0700', tz='US/Pacific'): 51441310396.8, Timestamp('2019-04-17 10:09:41-0700', tz='US/Pacific'): 49744733006.666664, Timestamp('2019-04-17 10:10:40-0700', tz='US/Pacific'): 39372041772.53333, Timestamp('2019-04-17 10:11:40-0700', tz='US/Pacific'): 37212362739.73333, Timestamp('2019-04-17 10:12:41-0700', tz='US/Pacific'): 29888187478.133335, Timestamp('2019-04-17 10:13:41-0700', tz='US/Pacific'): 23647225076.8, Timestamp('2019-04-17 10:14:41-0700', tz='US/Pacific'): 44232721589.333336, Timestamp('2019-04-17 10:15:40-0700', tz='US/Pacific'): 31619739302.8, Timestamp('2019-04-17 10:16:41-0700', tz='US/Pacific'): 34270903419.866665, Timestamp('2019-04-17 10:17:41-0700', tz='US/Pacific'): 37255143804.26667, Timestamp('2019-04-17 10:18:40-0700', tz='US/Pacific'): 29626685689.333332, Timestamp('2019-04-17 10:19:41-0700', tz='US/Pacific'): 37738576156.8, Timestamp('2019-04-17 10:20:41-0700', tz='US/Pacific'): 32520425703.733334, Timestamp('2019-04-17 10:21:40-0700', tz='US/Pacific'): 50682096771.066666, 
Timestamp('2019-04-17 10:22:40-0700', tz='US/Pacific'): 53442027636.0, Timestamp('2019-04-17 10:23:40-0700', tz='US/Pacific'): 48346635537.066666, Timestamp('2019-04-17 10:24:41-0700', tz='US/Pacific'): 28192208534.0, Timestamp('2019-04-17 10:25:41-0700', tz='US/Pacific'): 30508158848.533333, Timestamp('2019-04-17 10:26:40-0700', tz='US/Pacific'): 38669708961.73333, Timestamp('2019-04-17 10:27:41-0700', tz='US/Pacific'): 41905851091.333336, Timestamp('2019-04-17 10:28:40-0700', tz='US/Pacific'): 37885503188.4}}

df_z_side_in = {'input_bps': {Timestamp('2019-04-17 09:29:21-0700', tz='US/Pacific'): 32479665734.933334, Timestamp('2019-04-17 09:30:21-0700', tz='US/Pacific'): 28762393063.213116, Timestamp('2019-04-17 09:31:21-0700', tz='US/Pacific'): 24012409059.66102, Timestamp('2019-04-17 09:32:20-0700', tz='US/Pacific'): 30912397690.8, Timestamp('2019-04-17 09:33:21-0700', tz='US/Pacific'): 30150484213.508198, Timestamp('2019-04-17 09:34:21-0700', tz='US/Pacific'): 26572558234.666668, Timestamp('2019-04-17 09:35:20-0700', tz='US/Pacific'): 38830624164.47458, Timestamp('2019-04-17 09:36:20-0700', tz='US/Pacific'): 26512584207.866665, Timestamp('2019-04-17 09:37:20-0700', tz='US/Pacific'): 32343571104.133335, Timestamp('2019-04-17 09:38:21-0700', tz='US/Pacific'): 28372191073.704918, Timestamp('2019-04-17 09:39:20-0700', tz='US/Pacific'): 30009804008.677967, Timestamp('2019-04-17 09:40:20-0700', tz='US/Pacific'): 30764259885.2, Timestamp('2019-04-17 09:41:20-0700', tz='US/Pacific'): 27229582440.533333, Timestamp('2019-04-17 09:42:21-0700', tz='US/Pacific'): 12670550319.868853, Timestamp('2019-04-17 09:43:21-0700', tz='US/Pacific'): 38891533755.333336, Timestamp('2019-04-17 09:44:21-0700', tz='US/Pacific'): 46374133014.644066, Timestamp('2019-04-17 09:45:20-0700', tz='US/Pacific'): 40275148155.46667, Timestamp('2019-04-17 09:46:21-0700', tz='US/Pacific'): 2374.032786885246, Timestamp('2019-04-17 09:47:20-0700', tz='US/Pacific'): 3260927513.220339, Timestamp('2019-04-17 09:48:21-0700', tz='US/Pacific'): 19319788768.666668, Timestamp('2019-04-17 09:49:21-0700', tz='US/Pacific'): 29479921822.133335, Timestamp('2019-04-17 09:50:21-0700', tz='US/Pacific'): 42536464523.27869, Timestamp('2019-04-17 09:51:21-0700', tz='US/Pacific'): 48253007455.32204, Timestamp('2019-04-17 09:52:20-0700', tz='US/Pacific'): 28098055972.266666, Timestamp('2019-04-17 09:53:20-0700', tz='US/Pacific'): 34696013048.8, Timestamp('2019-04-17 09:54:21-0700', tz='US/Pacific'): 41089541187.540985, 
Timestamp('2019-04-17 09:55:20-0700', tz='US/Pacific'): 35818326833.355934, Timestamp('2019-04-17 09:56:21-0700', tz='US/Pacific'): 24461996828.0, Timestamp('2019-04-17 09:57:21-0700', tz='US/Pacific'): 2534090684.266667, Timestamp('2019-04-17 09:58:21-0700', tz='US/Pacific'): 22127687010.229507, Timestamp('2019-04-17 09:59:21-0700', tz='US/Pacific'): 23025967406.915253, Timestamp('2019-04-17 10:00:20-0700', tz='US/Pacific'): 10059074966.266666, Timestamp('2019-04-17 10:01:21-0700', tz='US/Pacific'): 67497142954.0, Timestamp('2019-04-17 10:02:21-0700', tz='US/Pacific'): 46389235268.0, Timestamp('2019-04-17 10:03:20-0700', tz='US/Pacific'): 21655645611.2, Timestamp('2019-04-17 10:04:21-0700', tz='US/Pacific'): 966253748.4, Timestamp('2019-04-17 10:05:20-0700', tz='US/Pacific'): 27733135839.866665, Timestamp('2019-04-17 10:06:21-0700', tz='US/Pacific'): 38420361510.55738, Timestamp('2019-04-17 10:07:20-0700', tz='US/Pacific'): 38791963200.27119, Timestamp('2019-04-17 10:08:21-0700', tz='US/Pacific'): 49337311755.333336, Timestamp('2019-04-17 10:09:21-0700', tz='US/Pacific'): 49036736751.2, Timestamp('2019-04-17 10:10:21-0700', tz='US/Pacific'): 40189220408.0, Timestamp('2019-04-17 10:11:20-0700', tz='US/Pacific'): 47269187739.333336, Timestamp('2019-04-17 10:12:21-0700', tz='US/Pacific'): 22747569814.666668, Timestamp('2019-04-17 10:13:20-0700', tz='US/Pacific'): 29592627519.066666, Timestamp('2019-04-17 10:14:21-0700', tz='US/Pacific'): 39522624640.78689, Timestamp('2019-04-17 10:15:20-0700', tz='US/Pacific'): 33426815865.627117, Timestamp('2019-04-17 10:16:20-0700', tz='US/Pacific'): 36818438483.86667, Timestamp('2019-04-17 10:17:21-0700', tz='US/Pacific'): 36014942532.327866, Timestamp('2019-04-17 10:18:21-0700', tz='US/Pacific'): 32190457857.333332, Timestamp('2019-04-17 10:19:20-0700', tz='US/Pacific'): 33696489212.067795, Timestamp('2019-04-17 10:20:20-0700', tz='US/Pacific'): 33386886955.333332, Timestamp('2019-04-17 10:21:20-0700', tz='US/Pacific'): 
47954604950.13333, Timestamp('2019-04-17 10:22:21-0700', tz='US/Pacific'): 54281759713.57377, Timestamp('2019-04-17 10:23:20-0700', tz='US/Pacific'): 43724407654.37288, Timestamp('2019-04-17 10:24:20-0700', tz='US/Pacific'): 36995567964.666664, Timestamp('2019-04-17 10:25:21-0700', tz='US/Pacific'): 25491555548.590164, Timestamp('2019-04-17 10:26:21-0700', tz='US/Pacific'): 38326723270.26667, Timestamp('2019-04-17 10:27:20-0700', tz='US/Pacific'): 43034165564.61017, Timestamp('2019-04-17 10:28:20-0700', tz='US/Pacific'): 37405127893.6}}

Tags: python, pandas, algorithm, resampling

Solution


Method 1: Aligning the plots

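We can do the following to make the data align perfectly, but I'm not sure what kind of data this is, or whether solving it this way really makes sense. But maybe it helps.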


  1. First, we concat the dataframes side by side.

  2. Then we fillna the missing rows in each column with the data from the other dataframe.

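import pandas as pd
# df_a and df_b are assumed to be DataFrames built from the question's dictionaries
df_a, df_b = pd.DataFrame(df_a_side_out), pd.DataFrame(df_z_side_in)
df = pd.concat([df_a, df_b], axis=1)
df['output_bps'] = df['output_bps'].fillna(df['input_bps'])
df['input_bps'] = df['input_bps'].fillna(df['output_bps'])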
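
To see why the two curves end up coinciding exactly, here is a small sketch of the same concat and fillna steps on made-up toy values (the numbers and timestamps are illustrative assumptions, not the question's data). Each column's gaps are filled with the other column's readings, so both columns become the same series:

```python
import pandas as pd

# Toy frames with interleaved timestamps (made-up values)
df_a = pd.DataFrame({"output_bps": [10.0, 20.0]},
                    index=pd.to_datetime(["2019-04-17 09:00:00", "2019-04-17 09:01:00"]))
df_b = pd.DataFrame({"input_bps": [12.0, 18.0]},
                    index=pd.to_datetime(["2019-04-17 09:00:30", "2019-04-17 09:01:30"]))

# Side-by-side concat leaves NaN wherever the other side has no reading
df = pd.concat([df_a, df_b], axis=1).sort_index()

# Fill each column's gaps with the other column's values
df["output_bps"] = df["output_bps"].fillna(df["input_bps"])
df["input_bps"] = df["input_bps"].fillna(df["output_bps"])

# Both columns now hold the same values at every timestamp
identical = bool((df["output_bps"] == df["input_bps"]).all())
print(df)
print(identical)
```

This is why the approach aligns the plot visually but cannot be used to measure loss: the difference between the two filled columns is zero by construction.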

Then we plot again and see that the data aligns perfectly. As the legend shows, there are actually two lines.

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16,10))

plt.plot(df['output_bps'], label='Side A out')
plt.plot(df['input_bps'], label='Side Z in')
plt.legend(loc='upper left')
plt.show()

[Figure: plot of the aligned data]


Method 2: Finding a more accurate difference

So it seems I understood correctly. Because the sensors record data at different timestamps, it is hard to find the exact difference (loss).

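We can process the data to make this more accurate. Rather than only interpolating, we first resample the data onto a 1-second index and then interpolate for higher accuracy. After that, we take the difference at identical timestamps to find the loss.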

This is the closest I could get:

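# reindex to make a new dataframe on a 1-second grid
# (df is assumed to be the concatenated frame from Method 1 without the fillna step;
#  after fillna both columns are identical and the loss would be zero everywhere)
df_z = pd.DataFrame(index=pd.date_range(start=df.index.min(), 
                                        end=df.index.max(), 
                                        freq='1s'), columns=df.columns)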

# merge the values of the original dataframe and remove columns we don't need
df_z = df_z.merge(df, 
                  left_index=True, 
                  right_index=True, 
                  how='left', 
                  suffixes=['_1', '']).filter(regex='(^[^0-9]+$)')

# fill NaN by linear interpolation
for col in df_z.columns:
    df_z[col] = df_z[col].interpolate(method='linear', limit_direction='both', )

# Calculate the loss on each second
df_z['loss'] = df_z['output_bps'] - df_z['input_bps']
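
The resampling steps above can also be written more compactly. The following is a sketch on made-up toy values, not a replacement for the code above: the function name `per_second_loss` is hypothetical, and unlike the code above it uses `limit_area='inside'` (an assumption) so that the edges are dropped rather than filled with the nearest reading.

```python
import pandas as pd

def per_second_loss(df: pd.DataFrame) -> pd.DataFrame:
    """Resample to a 1-second grid, interpolate linearly between real
    readings, and compute output minus input (hypothetical helper)."""
    grid = df.resample("1s").mean()  # empty seconds become NaN
    grid = grid.interpolate(method="linear", limit_area="inside")
    grid = grid.dropna()             # keep seconds covered by both sides
    grid["loss"] = grid["output_bps"] - grid["input_bps"]
    return grid

# Toy input (made-up values): concatenated frame without any fillna step
a = pd.DataFrame({"output_bps": [10.0, 20.0]},
                 index=pd.to_datetime(["2019-04-17 09:00:00", "2019-04-17 09:00:10"]))
z = pd.DataFrame({"input_bps": [12.0, 18.0]},
                 index=pd.to_datetime(["2019-04-17 09:00:05", "2019-04-17 09:00:15"]))
toy = pd.concat([a, z], axis=1).sort_index()

result = per_second_loss(toy)
print(result)
```

Dropping the edges avoids comparing a real reading on one side with a held-over value on the other, at the cost of losing a few seconds at the start and end of the window.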

Now we can plot our data again, including the loss:

fig = plt.figure(figsize=(16,10))

plt.plot(df_z['output_bps'], label='Side A out')
plt.plot(df_z['input_bps'], label='Side Z in')
plt.plot(df_z['loss'], label='Loss')
plt.legend(loc='upper left')

plt.show()

[Figure: plot of both sides and the loss]

