Python: Assigning labels to values in an array

Problem description

I have an array representing some time series data:

array([[[-0.59776013],
    [-0.59776013],
    [-0.59776013],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [ 0.31863936],
    [ 0.31863936],
    [ 0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [ 0.59776013],
    [ 0.59776013],
    [ 0.59776013],
    [ 0.93458929],
    [ 0.93458929],
    [ 0.93458929],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.06270678],
    [-0.06270678],
    [-0.06270678],
    [-0.06270678],
    [-0.06270678],
    [-0.06270678],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [ 0.75541503],
    [ 0.75541503],
    [ 0.75541503],
    [ 0.93458929],
    [ 0.93458929],
    [ 0.93458929],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [ 0.75541503],
    [ 0.75541503],
    [ 0.75541503],
    [-0.31863936],
    [-0.31863936],
    [-0.31863936],
    [ 0.31863936],
    [ 0.31863936],
    [ 0.31863936],
    [ 0.        ],
    [ 0.        ],
    [ 0.        ],
    [ 0.        ],
    [ 0.        ],
    [ 0.        ],
    [ 0.        ],
    [ 0.        ]]])

The unique values in this array are:

np.unique(sax_dataset_inv)
array([-0.59776013, -0.31863936, -0.06270678,  0.        ,  0.31863936,
    0.59776013,  0.75541503,  0.93458929])

My task

Assign a label to each value in the array: "F" for fast, "S" for slow, or "M" for medium.

My attempt

I can do the assignment with 2 labels, "F" or "S":

sax_list = ['F' if element < 0 else 'S' for element in list(sax_dataset_inv.flatten())]

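However, I can't figure out how to write the same expression for 3 different labels.

Desired output

Take the array [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6] as an example.

Values from -3 to -1 (inclusive) should be labeled "F". Values from 0 to 3 (inclusive) should be labeled "M". Values greater than 3 should be labeled "S".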

Tags: python, pandas, numpy, time-series

Solution


Use numpy.select for a vectorized solution:

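import numpy as np
arr = np.array([-3, -2, -1, 0, 1, 2, 3, 4, 5, 6])
# Each condition is paired with the label at the same position; unmatched values get ''.
new_arr = np.select([arr > 3, (arr >= -3) & (arr <= -1), (arr >= 0) & (arr <= 3)],
                    ['S', 'F', 'M'],
                    default='')
print(new_arr)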

['F' 'F' 'F' 'M' 'M' 'M' 'M' 'S' 'S' 'S']
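
When the ranges are contiguous, as in this example, the same labels can also be produced with nested np.where calls. The following is a minimal sketch and not part of the original answer. It assumes that every value below 0 counts as "F" (the question only specifies -3 to -1), and the names labels, arr_3d and labels_list are introduced here for illustration.

```python
import numpy as np

arr = np.array([-3, -2, -1, 0, 1, 2, 3, 4, 5, 6])

# Assumption: ranges are contiguous, so anything below 0 is treated as 'F'.
# Conditions are checked from the outside in: > 3 first, then >= 0, else 'F'.
labels = np.where(arr > 3, 'S', np.where(arr >= 0, 'M', 'F'))
print(labels)

# The result keeps the input's shape, so a 3-D array like the one in the
# question can be flattened afterwards to get a plain list of labels.
arr_3d = arr.reshape(1, -1, 1)
labels_list = np.where(arr_3d > 3, 'S',
                       np.where(arr_3d >= 0, 'M', 'F')).flatten().tolist()
print(labels_list)
```

Unlike np.select with default='', this version never leaves a value unlabeled, which may or may not be what is wanted when the data contains values outside the listed ranges.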

Performance

arr = np.array([-3,-2,-1,0,1,2,3,4,5,6] * 1000)

my_list = [-3,-2,-1,0,1,2,3,4,5,6] * 1000

In [276]: %timeit my_list_mapping = ['F' if ((i >= -3) & (i <= -1)) else 'M' if ((i >= 0) & (i <= 3)) else 'S' for i in my_list]
1.14 ms ± 67.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [277]: %timeit np.select([arr>3, (arr>=-3) & (arr<=-1), (arr>=0)&(arr<=3)],['S','F','M'],  default='')
172 µs ± 7.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
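
The timings above come from IPython's %timeit. Outside IPython, a comparable measurement could be sketched with the standard timeit module as below. The function names and the repeat count are assumptions made for this sketch, and the absolute numbers will vary by machine.

```python
import timeit
import numpy as np

values = [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6] * 1000
arr = np.array(values)

def label_with_list(data):
    # Chained conditional expression, evaluated element by element in Python.
    return ['F' if -3 <= i <= -1 else 'M' if 0 <= i <= 3 else 'S' for i in data]

def label_with_select(a):
    # Vectorized: each condition is evaluated over the whole array at once.
    return np.select([a > 3, (a >= -3) & (a <= -1), (a >= 0) & (a <= 3)],
                     ['S', 'F', 'M'], default='')

# Both approaches should agree before their timings are compared.
assert label_with_list(values) == label_with_select(arr).tolist()

n_runs = 100  # arbitrary repeat count chosen for this sketch
t_list = timeit.timeit(lambda: label_with_list(values), number=n_runs) / n_runs
t_select = timeit.timeit(lambda: label_with_select(arr), number=n_runs) / n_runs
print(f"list comprehension: {t_list * 1e6:.0f} µs per call")
print(f"np.select:          {t_select * 1e6:.0f} µs per call")
```

Note that the two functions only agree on inputs within the listed ranges: the list comprehension labels everything else "S", while np.select falls back to the empty string.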
