Using a threshold to classify whether points are inside a designated area - python

Problem description

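I have a df containing xy points. I want to remove a point's frames only when the point stays inside a polygon, represented by area below. The points move in and out of this area, so I only want to remove them once they have settled there. Otherwise they should stay in the df.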

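The central dilemma is that I don't want to apply a strict rule here. Because the points are fluid, I want some flexibility. For example, some points may pass through the area only briefly and should not be removed, while others stay inside the area long enough that they should be removed.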

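The obvious approach is to apply some kind of threshold. For example, if A were inside the area for 3 frames and B for 7 frames, then with a threshold of > 5 frames, B's frames inside the area should be removed while A should be left untouched.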

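The catch is that the frames must be consecutive. Points come and go, so I only want to remove a point's frames once it has stayed inside for more than 5 consecutive frames.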
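To make the difference between total and consecutive frames concrete, here is a minimal sketch in plain Python using itertools.groupby. The flags list is a hypothetical per-frame sequence for a single point (True = inside the area), not data from the question. Counted in total, this point spends 7 frames inside, but its longest uninterrupted stay is only 3 frames, so a consecutive-frame rule with a threshold of 5 would keep it.

```python
from itertools import groupby

# Hypothetical per-frame flags for one point: True = inside the area
flags = [True, True, True, False, True, True, True, False, False, True]

# Total number of frames spent inside, ignoring any gaps
total_inside = sum(flags)

# Length of each uninterrupted run of inside frames
inside_runs = [len(list(group)) for key, group in groupby(flags) if key]

threshold = 5
print(total_inside)                  # 7
print(inside_runs)                   # [3, 3, 1]
print(max(inside_runs) > threshold)  # False
```

A total-count rule would remove this point (7 > 5), while the consecutive-run rule keeps it, which is the behaviour asked for here.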

import pandas as pd
import matplotlib.path as mpltPath  # provides Path, used for the area polygon below

df = pd.DataFrame({
    'X' : [-5,10,-5,-5,-5,-5,-5,-5,-5,30,20,10,0,-5,-5,-5,-5,-5,-5,-5,5],  
    'Y' : [50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50],                  
    'Label' : ['A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B'], 
    'Time' : [501,502,503,504,505,506,507,508,509,510,501,502,503,504,505,506,507,508,509,510,511],                         
    })

# designated area
x = ([1.5,-0.5,-1.25,-0.5,1.5,-11,-11,1.5]) 
y = ([75,62.5,50,37.5,25,25,75,75])

area = mpltPath.Path(list(zip(x, y)))
df['is_inside'] = area.contains_points(df[['X','Y']])

Output:

     X   Y Label  Time  is_inside
0   -5  50     A   501       True # inside, but only 1 frame (keep)
1   10  50     A   502      False # keep
2   -5  50     A   503       True # inside for 7 consecutive frames (remove)
3   -5  50     A   504       True # inside for 7 consecutive frames (remove)
4   -5  50     A   505       True # inside for 7 consecutive frames (remove)
5   -5  50     A   506       True # inside for 7 consecutive frames (remove)
6   -5  50     A   507       True # inside for 7 consecutive frames (remove)
7   -5  50     A   508       True # inside for 7 consecutive frames (remove)
8   -5  50     A   509       True # inside for 7 consecutive frames (remove)
9   30  50     A   510      False # keep
10  20  50     B   501      False # keep
11  10  50     B   502      False # keep
12   0  50     B   503      False # keep
13  -5  50     B   504       True # inside for 7 consecutive frames (remove)
14  -5  50     B   505       True # inside for 7 consecutive frames (remove)
15  -5  50     B   506       True # inside for 7 consecutive frames (remove)
16  -5  50     B   507       True # inside for 7 consecutive frames (remove)
17  -5  50     B   508       True # inside for 7 consecutive frames (remove)
18  -5  50     B   509       True # inside for 7 consecutive frames (remove)
19  -5  50     B   510       True # inside for 7 consecutive frames (remove)
20   5  50     B   511      False # keep

Expected output:

     X   Y Label  Time 
0   -5  50     A   501     
1   10  50     A   502         
9   30  50     A   510     
10  20  50     B   501     
11  10  50     B   502     
12   0  50     B   503     
20   5  50     B   511     

Tags: python, pandas, classification

Solution


I first reproduce your data:

import pandas as pd 
import matplotlib as mpl
import matplotlib.path  # import the submodule explicitly so mpl.path is available

x = [1.5, -0.5, -1.25, -0.5, 1.5, -11, -11, 1.5]
y = [75, 62.5, 50, 37.5, 25, 25, 75, 75]
vertices = list(zip(x, y))
polygon = mpl.path.Path(vertices, closed=True)

df = pd.DataFrame({
    'X' : [-5, 10, -5, -5, -5, -5, -5, -5, -5, 30, 
           20, 10, 0, -5, -5, -5, -5, -5, -5, -5, 5],  
    'Y' : [50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 
           50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50],                  
    'Label' : list('A'*10 + 'B'*11), 
    'Time' : 2*list(range(501, 511)) + [511]
    })
df = df.sort_values(['Label', 'Time'])
df['is_inside'] = polygon.contains_points(df[['X','Y']])

This is what the original DataFrame looks like:

In [91]: df
Out[91]: 
     X   Y Label  Time  is_inside
0   -5  50     A   501       True
1   10  50     A   502      False
2   -5  50     A   503       True
3   -5  50     A   504       True
4   -5  50     A   505       True
5   -5  50     A   506       True
6   -5  50     A   507       True
7   -5  50     A   508       True
8   -5  50     A   509       True
9   30  50     A   510      False
10  20  50     B   501      False
11  10  50     B   502      False
12   0  50     B   503      False
13  -5  50     B   504       True
14  -5  50     B   505       True
15  -5  50     B   506       True
16  -5  50     B   507       True
17  -5  50     B   508       True
18  -5  50     B   509       True
19  -5  50     B   510       True
20   5  50     B   511      False
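The X = 0 row for B is False while every X = -5 row is True because, at Y = 50, the polygon only spans from X = -11 to X = -1.25 (its right edge passes through the vertex (-1.25, 50)). A quick check with a few probe points, which are hypothetical and chosen only for illustration:

```python
import numpy as np
from matplotlib.path import Path

# Same polygon as in the question
x = [1.5, -0.5, -1.25, -0.5, 1.5, -11, -11, 1.5]
y = [75, 62.5, 50, 37.5, 25, 25, 75, 75]
polygon = Path(list(zip(x, y)), closed=True)

# Hypothetical probe points along Y = 50, where the right edge sits at X = -1.25
probes = np.array([[-5, 50], [-1.0, 50], [0, 50], [10, 50]])
inside = polygon.contains_points(probes)
print(inside)  # [ True False False False]
```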

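You can use itertools.groupby to remove the unwanted points, processing each Label separately so that a run of frames never spans two labels: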

import numpy as np
from itertools import groupby

threshold = 5

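indexer = []  # one flag per row of df: True means "drop this row"

# np.unique returns the labels sorted, and df is sorted by Label and Time,
# so the flags are appended in the same order as df's rows
for label in np.unique(df['Label']):
    # groupby yields each run of consecutive equal is_inside values for this label
    for key, group in groupby(df.loc[df['Label'] == label]['is_inside']):
        runlength = len(list(group))
        # remove the run only if it is inside the area and longer than the threshold
        remove = key and (runlength > threshold)
        indexer.extend([remove]*runlength)

df.drop(df[indexer].index, inplace=True)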

Output:

In [92]: df
Out[92]: 
     X   Y Label  Time  is_inside
0   -5  50     A   501       True
1   10  50     A   502      False
9   30  50     A   510      False
10  20  50     B   501      False
11  10  50     B   502      False
12   0  50     B   503      False
20   5  50     B   511      False
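A vectorized alternative, sketched here as an assumption rather than as part of the answer above, avoids building the positional indexer by hand. It labels each run of consecutive equal is_inside values with the common pandas shift/cumsum idiom (starting a new run whenever is_inside flips or the Label changes), then broadcasts each run's length back onto its rows with groupby(...).transform('size'). Like the loop, it assumes df is sorted by Label and Time.

```python
import pandas as pd
from matplotlib.path import Path

# Polygon and data as in the answer above
x = [1.5, -0.5, -1.25, -0.5, 1.5, -11, -11, 1.5]
y = [75, 62.5, 50, 37.5, 25, 25, 75, 75]
polygon = Path(list(zip(x, y)), closed=True)

df = pd.DataFrame({
    'X': [-5, 10, -5, -5, -5, -5, -5, -5, -5, 30,
          20, 10, 0, -5, -5, -5, -5, -5, -5, -5, 5],
    'Y': [50] * 21,
    'Label': list('A' * 10 + 'B' * 11),
    'Time': 2 * list(range(501, 511)) + [511],
})
df = df.sort_values(['Label', 'Time'])
df['is_inside'] = polygon.contains_points(df[['X', 'Y']].to_numpy())

threshold = 5

# A new run starts whenever is_inside flips or we move on to a new Label
new_run = (df['is_inside'] != df['is_inside'].shift()) | (df['Label'] != df['Label'].shift())
run_id = new_run.cumsum()

# Broadcast each run's length back onto every row of that run
run_length = df.groupby(run_id)['is_inside'].transform('size')

# Drop only rows that are inside the area as part of a run longer than threshold
result = df[~(df['is_inside'] & (run_length > threshold))]
print(result)
```

On this data it keeps the same rows as the loop (index 0, 1, 9, 10, 11, 12 and 20). Because the mask is an index-aligned Series, it does not rely on the flag order matching the row order. The strict > mirrors the answer's threshold; use >= if a stay of exactly 5 frames should already be removed.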
