Remove close values

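Problem description

I have a DataFrame with float values (latitude/longitude; see image), and I want to remove values that lie close to each other, within a precision of 0.02. For example:

[0.03, 0.05, 0.04, 0.06] -> [0.04]

How can I do this with pandas methods?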

[Image: the data frame]

Tags: python, pandas

Solution


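Although the asker did not want to provide more detail, the question is an interesting one. I assume that the given coordinate points are to be merged into groups in which the latitude values and the longitude values each differ by no more than twice the precision (that is, each group fits inside a square cluster 0.04 wide), and that each cluster is then reduced to a single point near its center. This can be solved by ordering the points (for example with the help of the scikit-learn OPTICS implementation), splitting the ordered points into groups that satisfy the cluster condition, and selecting a near-center point from each group.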

import pandas as pd
df = pd.DataFrame({'lon': (20.489192, 20.47559,  20.481381, 20.4422,   20.474462),
                   'lat': (54.719898, 54.720311, 54.731917, 54.710419, 54.72706 )},
                  index=[3, 4, 20, 21, 24])

def group(x, minmax_group): # this function clusters points within +/- 0.02
    if not hasattr(x, "__len__"): x = (x, ) # if it has to work for the one-dimensional case
    # in two-dimensional case, x is coordinate pair (longitude, latitude)
    # minmax_group[0][min] is the minimum coordinate pair (lower left) of a cluster
    # minmax_group[0][max] is the maximum coordinate pair (upper right) of a cluster
    # minmax_group[1] is the sequential index of the cluster
    if minmax_group[0] is None: minmax_group[:] = {min:x, max:x}, 0 # first cluster
    # check if longitude or latitude outside of current cluster
    elif any(x[l] < minmax_group[0][max][l]-.04
                 or minmax_group[0][min][l]+.04 < x[l] for l in range(len(x))):
        minmax_group[0] = {min:x, max:x}
        minmax_group[1] += 1                # new cluster
    else:
        for m in minmax_group[0]:           # store current minimum/maximum coordinates
            minmax_group[0][m] = tuple(m(minmax_group[0][m][l], x[l]) for l in range(len(x)))
    return minmax_group[1]

from sklearn.cluster import OPTICS
opt = OPTICS().fit(df)  # order the points
# group the points; set index because only index is passed to groupby function
dt = df.reset_index().set_index(['lon', 'lat']).iloc[opt.ordering_].groupby(
    lambda x, minmax_group=[None]: group(x, minmax_group)).apply(
    # choose point at the center of the group; set index back to original
    lambda g: g.reset_index().iloc[[(len(g)-1)//2]]).set_index('index').rename_axis(None)
print(dt)

Output for this example:

         lon        lat
4   20.47559  54.720311
21  20.44220  54.710419
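
As a check against the one-dimensional example from the question, the same idea can be sketched without pandas or OPTICS, since plain sorting is enough to order points on a line. This is only an illustrative sketch: the function name `thin_close_values` and its `precision` parameter are made up here, and it assumes the same rule as above, namely that a group may span at most twice the precision and is represented by its lower-middle element.

```python
def thin_close_values(values, precision=0.02):
    """Collapse runs of close values into one representative each.

    Hypothetical helper: a group may span at most 2 * precision,
    mirroring the cluster condition assumed in the answer.
    """
    width = 2 * precision
    groups = []
    for v in sorted(values):
        # Start a new group when v lies more than `width` above the
        # minimum (first element) of the current group.
        if not groups or v - groups[-1][0] > width:
            groups.append([v])
        else:
            groups[-1].append(v)
    # Keep the lower-middle element of each group, as in (len(g)-1)//2 above.
    return [g[(len(g) - 1) // 2] for g in groups]


result = thin_close_values([0.03, 0.05, 0.04, 0.06])
print(result)  # [0.04]
```

All four values fit inside one group of width 0.04, so a single representative remains, which matches the result the question asks for. Values that are far apart, such as 0.0 and 0.1, end up in separate groups and are both kept.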
