首页 > 解决方案 > 在间隔类型的列上过滤数据帧[float64]

问题描述

我有一个数据框,其中一列是类型interval[float64],我希望过滤该数据框以检索垃圾箱:

distance_bin  object:s in bin  percentage        Date
0   (-0.001, 1.0]          1092       46.00  2021-06-20
1      (1.0, 2.0]           533       22.45  2021-06-20
2      (2.0, 3.0]           257       10.83  2021-06-20
3      (3.0, 4.0]            46        1.94  2021-06-20
4      (4.0, 5.0]           144        6.07  2021-06-20
5      (5.0, 6.0]           117        4.93  2021-06-20
6      (6.0, 7.0]            58        2.44  2021-06-20
7      (7.0, 8.0]            46        1.94  2021-06-20
8      (8.0, 9.0]            48        2.02  2021-06-20
9     (9.0, 10.0]            16        0.67  2021-06-20
10   (10.0, 11.0]             6        0.25  2021-06-20
11   (11.0, 12.0]             8        0.34  2021-06-20
12   (12.0, 13.0]             1        0.04  2021-06-20
13   (15.0, 16.0]             1        0.04  2021-06-20
14   (21.0, 22.0]             1        0.04  2021-06-20
15  (-0.001, 1.0]          1080       45.55  2021-06-21
16     (1.0, 2.0]           546       23.03  2021-06-21
17     (2.0, 3.0]           245       10.33  2021-06-21
18     (3.0, 4.0]            54        2.28  2021-06-21
19     (4.0, 5.0]           150        6.33  2021-06-21
20     (5.0, 6.0]           104        4.39  2021-06-21
21     (6.0, 7.0]            44        1.86  2021-06-21
22     (7.0, 8.0]            51        2.15  2021-06-21
23     (8.0, 9.0]            38        1.60  2021-06-21
24    (9.0, 10.0]            36        1.52  2021-06-21
25   (10.0, 11.0]             7        0.30  2021-06-21
26   (11.0, 12.0]            15        0.63  2021-06-21
27   (12.0, 13.0]             1        0.04  2021-06-21
28  (-0.001, 1.0]          1094       46.24  2021-06-22
29     (1.0, 2.0]           517       21.85  2021-06-22
30     (2.0, 3.0]           289       12.21  2021-06-22
31     (3.0, 4.0]            42        1.78  2021-06-22
32     (4.0, 5.0]           139        5.87  2021-06-22
33     (5.0, 6.0]            98        4.14  2021-06-22
34     (6.0, 7.0]            43        1.82  2021-06-22
35     (7.0, 8.0]            47        1.99  2021-06-22
36     (8.0, 9.0]            46        1.94  2021-06-22
37    (9.0, 10.0]            30        1.27  2021-06-22
38   (10.0, 11.0]             6        0.25  2021-06-22
39   (11.0, 12.0]            15        0.63  2021-06-22

为了给您一种方法以正确的数据类型转换区间,我为您提供了以下代码:

def interval_type(s):
    table = str.maketrans({'[': '(', ']': ')'})
    left_closed = s.startswith('[')
    right_closed = s.endswith(']')

    left, right = ast.literal_eval(s.translate(table))

    t = 'neither'
    if left_closed and right_closed:
        t = 'both'
    elif left_closed:
        t = 'left'
    elif right_closed:
        t = 'right'

    return pd.Interval(left, right, closed=t)

df['distance_bin'] = df['distance_bin'].apply(interval_type)

现在,我认为过滤 df 以仅保持间隔(5.0, 6.0]就足以执行以下操作:

df_A = df_smh[df['distance_bin']=='(5.0, 6.0]']

但它返回一个空数据框。我尝试了许多其他的东西,比如:

df_A = df_smh[df.distance_bin.left>5 & df.distance_bin.right<=6]

但这会返回:

AttributeError: 'Series' object has no attribute 'left'

此处的任何指导将不胜感激。

标签: python-3.xpandas

解决方案


对我来说,通过以下方式进行比较Interval

df = pd.DataFrame({'distance_bin':[pd.Interval(-0.001, 1.0, closed='right'), 
                         pd.Interval(1.0, 2.0, closed='right'), 
                         pd.Interval(2.0, 3.0, closed='right'), 
                         pd.Interval(3.0, 4.0, closed='right'), 
                         pd.Interval(4.0, 5.0, closed='right')]})

df = df[df['distance_bin']==pd.Interval(1.0, 2.0, closed='right')]
print (df)
  distance_bin
1   (1.0, 2.0]

推荐阅读