首页 > 解决方案 > How to detect records that appear on a larger interval than the specified one?

问题描述

I have the following pandas DataFrame:

Col1 Date        Time
A    29/03/2019  00:00
A    29/03/2019  04:00
A    29/03/2019  08:00
A    29/03/2019  12:00
A    29/03/2019  16:00
A    29/03/2019  20:00
A    30/03/2019  00:00
A    30/03/2019  08:00
A    30/03/2019  12:00
A    30/03/2019  16:00
A    30/03/2019  20:00
A    31/03/2019  04:00

Basically, the data comes at the interval of 4 hours. I need to return rows that do not fit this criteria.

The expected output:

A    30/03/2019  08:00
A    31/03/2019  04:00

These are two rows that miss an intermediate record.

How can I do it?

标签: pythonpandas

解决方案


Try creating a boolean index to identify where the difference between rows is more than 4 hours.

Create a datetime column by adding Date and Time together, diff to get the difference between rows, then compare to a TimeDelta.

Modify the TimeDelta (pd.to_timedelta("4:00:00")) to change the desired interval.

import pandas as pd

df = pd.DataFrame({'Col1': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A',
                            6: 'A', 7: 'A', 8: 'A', 9: 'A', 10: 'A', 11: 'A'},
                   'Date': {0: '29/03/2019', 1: '29/03/2019', 2: '29/03/2019',
                            3: '29/03/2019', 4: '29/03/2019', 5: '29/03/2019',
                            6: '30/03/2019', 7: '30/03/2019', 8: '30/03/2019',
                            9: '30/03/2019', 10: '30/03/2019',
                            11: '31/03/2019'},
                   'Time': {0: '00:00', 1: '04:00', 2: '08:00', 3: '12:00',
                            4: '16:00', 5: '20:00', 6: '00:00', 7: '08:00',
                            8: '12:00', 9: '16:00', 10: '20:00', 11: '04:00'}})

# Create Index
m = pd.to_datetime(
    df['Date'] + df['Time'], format='%d/%m/%Y%H:%M'
).diff() > pd.to_timedelta("4:00:00")

# Use Index to Filter DF
out = df[m]

# For Display
print(out)

out:

   Col1        Date   Time
7     A  30/03/2019  08:00
11    A  31/03/2019  04:00

推荐阅读