python - How to efficiently map data between time series in Python
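Problem description
I am trying to write an efficient function for resampling time series data.
Assumption: both time series have the same start and end time. (I handle this in a separate step.)
Resampling function (inefficient):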
import numpy as np

def resample(desired_time_sequence, data_sequence):
    # Evenly spaced positions across data_sequence, rounded to integer indices
    downsampling_indices = np.linspace(0, len(data_sequence)-1, len(desired_time_sequence)).round().astype(int)
    # Pick the samples one at a time in a Python-level loop
    downsampled_array = [data_sequence[ind] for ind in downsampling_indices]
    return downsampled_array
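To make the index computation in resample() concrete, here is a small sketch that applies it step by step to the 12-element list from the speed test. It only illustrates the existing logic (the variable names are chosen for this example). Two details are worth noticing: the function uses only len(desired_time_sequence), not the time values themselves, and NumPy's round() rounds halves to the nearest even number, so a position of 2.5 maps to index 2 rather than 3.

```python
import numpy as np

# Inputs taken from the speed test: 12 data samples, 3 desired output points
data_sequence = [.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6]
desired_len = 3  # resample() only uses len(desired_time_sequence)

# Evenly spaced float positions from the first to the last sample: 0.0, 5.5, 11.0
positions = np.linspace(0, len(data_sequence) - 1, desired_len)

# round() rounds halves to even, so 5.5 becomes 6 (and 2.5 would become 2)
indices = positions.round().astype(int)

picked = [data_sequence[i] for i in indices]
print(indices.tolist(), picked)  # [0, 6, 11] [0.5, 3.5, 6]
```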
Speed test
import timeit
def test_speed(): resample([1,2,3], [.5,1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6])
print(timeit.timeit(test_speed, number=100000))
# 1.5003695999998854
I'd be interested to hear any suggestions.
Solution
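Replacing
downsampled_array = [data_sequence[ind] for ind in downsampling_indices]
with
downsampled_array = data_sequence[downsampling_indices]
gave a 7x speedup on my test data. This requires data_sequence to be a NumPy array (as in the benchmark below); a plain Python list cannot be indexed with an integer array and raises a TypeError.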
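Because the vectorized indexing needs a NumPy array, a minimal sketch of a variant that also accepts a plain list (like the one in the question's speed test) converts the input with np.asarray first. Apart from that conversion it uses the same index mapping as resample_fast in the benchmark; this exact function is an illustration, not code from the original answer.

```python
import numpy as np

def resample_fast(output_len, data_sequence):
    # Convert so fancy indexing also works when a plain list is passed in
    data = np.asarray(data_sequence)
    # Same index mapping as before: evenly spaced positions rounded to ints
    indices = np.linspace(0, len(data) - 1, output_len).round().astype(int)
    # One vectorized gather instead of a Python-level loop
    return data[indices]

result = resample_fast(3, [.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6])
print(result.tolist())  # [0.5, 3.5, 6.0]
```

np.asarray copies a list on every call (it returns an existing array unchanged), so for repeated calls it is cheaper to convert the data once up front. On very small inputs such as this 12-element example, fixed per-call overhead may dominate, so the gain may be smaller than the 7x measured on 10,000 samples.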
Code used to measure the speedup:
import timeit
f1 = """
def resample(output_len, data_sequence):
    downsampling_indices = np.linspace(0, len(data_sequence)-1, output_len).round().astype(int)
    downsampled_array = [data_sequence[ind] for ind in downsampling_indices]
    return downsampled_array
resample(output_len, data_sequence)
"""
f2 = """
def resample_fast(output_len, data_sequence):
    downsampling_indices = np.linspace(0, len(data_sequence)-1, output_len).round().astype(int)
    downsampled_array = data_sequence[downsampling_indices]
    return downsampled_array
resample_fast(output_len, data_sequence)
"""
setup="""
import numpy as np
data_sequence = np.random.randn(10000)
output_len = 752
"""
print(timeit.timeit(f1, setup, number=1000))
print(timeit.timeit(f2, setup, number=1000))
# prints:
# 0.30194038699846715
# 0.041797632933594286
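Both versions above assume the data are uniformly sampled over the shared time range, and they use only the number of desired output points, not their time values. If the actual timestamps of both series are available, a nearest-neighbour lookup with np.searchsorted is one alternative. This sketch is not from the original answer: resample_nearest, data_times, data_values and target_times are hypothetical names, and it assumes data_times is sorted ascending and has at least two entries.

```python
import numpy as np

def resample_nearest(data_times, data_values, target_times):
    """For each target time, return the value of the closest data sample.

    Assumes data_times is sorted ascending and has at least two entries.
    """
    data_times = np.asarray(data_times, dtype=float)
    data_values = np.asarray(data_values)
    target_times = np.asarray(target_times, dtype=float)

    # Index of the first data time >= each target time
    right = np.searchsorted(data_times, target_times)
    # Keep both neighbours in range so targets outside the data span use the end samples
    right = np.clip(right, 1, len(data_times) - 1)
    left = right - 1

    # Choose whichever neighbour is closer in time (ties go to the earlier sample)
    use_left = (target_times - data_times[left]) <= (data_times[right] - target_times)
    return data_values[np.where(use_left, left, right)]


# Data sampled at t = 0, 1, 2, 3; targets include points outside that range
print(resample_nearest([0, 1, 2, 3], [10, 20, 30, 40], [0.4, 1.6, 3.0, -1, 5]).tolist())
# [10, 30, 40, 10, 40]
```

If linear interpolation between samples is acceptable instead of picking the nearest one, np.interp(target_times, data_times, data_values) does that in a single call.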