Dataframe drop between_time multiple rows by shifting timedelta

Problem description

I would like to drop multiple groups of rows based on a time criterion; the date can be ignored. I have a dataframe with 100 million rows, sampled at roughly 0.001 s intervals, although the sampling frequency varies between columns. The goal is to drop rows in a repeating, "shifting" pattern: keep the rows for a leave duration, then drop the rows for a drop duration, and repeat. The leave duration might be 0.01 seconds and the drop duration might be 0.1 seconds, as shown in the figure:

[Figure: The idea of dropping rows]

I am having trouble with Timestamp-to-time conversions and with writing a one-liner that drops multiple groups of rows. I tried the following code:

import pandas as pd
from datetime import timedelta#, timestamp
from datetime import datetime
import numpy as np

# leave_duration=0.01 seconds
# drop_duration=0.1 seconds

i = pd.date_range('2018-01-01 00:01:15.004', periods=1000, freq='2ms')
i=i.append(pd.date_range('2018-01-01 00:01:15.004', periods=1000, freq='3ms'))
i=i.append(pd.date_range('2018-01-01 00:01:15.004', periods=1000, freq='0.5ms'))
df = pd.DataFrame({'A': range(len(i))}, index=i)
df=df.sort_index()

minimum_time=df.index.min()
print("Minimum time:",minimum_time)
maximum_time=df.index.max()
print("Maximum time:",maximum_time)

# futuredate = minimum_time + timedelta(microseconds=100)

print("Dataframe before dropping:\n",df)
df.drop(df.between_time(*pd.to_datetime([minimum_time, maximum_time]).time).index, inplace=True)
print("Dataframe after dropping:\n",df)

# minimum_time=str(minimum_time).split()
# minimum_time=minimum_time[1]
# print(minimum_time)
# maximum_time=str(maximum_time).split()
# maximum_time=maximum_time[1]
# print(maximum_time)

How can I drop rows by time criterion, with shifting?

Tags: python, pandas, dataframe

Solution


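This works for me. It takes each row's offset from the first timestamp modulo the 110 ms cycle (100 ms drop plus 10 ms leave) and keeps only the rows that fall after the first 100 ms of each cycle: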

df = df.loc[(df.index - df.index[0]) % pd.to_timedelta('110ms') > pd.to_timedelta('100ms')]
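The same modulo idea can be wrapped in a small helper so that the two durations are parameters. This is only a sketch: the function name keep_periodic_windows and the leave_first flag are hypothetical names introduced here, not part of pandas or of the original answer. It also assumes the cycle is anchored at the earliest timestamp in the index, and it uses >= at the window boundary where the one-liner above uses a strict >.

```python
import pandas as pd


def keep_periodic_windows(df, leave="10ms", drop="100ms", leave_first=True):
    """Keep rows inside repeating 'leave' windows, drop rows in 'drop' windows.

    Hypothetical helper for illustration; the cycle starts at the earliest
    timestamp found in df.index (an assumption, not stated in the question).
    """
    leave = pd.to_timedelta(leave)
    drop = pd.to_timedelta(drop)
    # Position of every row inside the repeating (leave + drop) cycle.
    phase = (df.index - df.index.min()) % (leave + drop)
    if leave_first:
        # Each cycle is: keep for 'leave', then drop for 'drop'.
        mask = phase < leave
    else:
        # Each cycle is: drop for 'drop', then keep for 'leave'.
        mask = phase >= drop
    return df.loc[mask]


# Small demonstration frame: one row per millisecond, two full 110 ms cycles.
idx = pd.date_range("2018-01-01 00:01:15.004", periods=220, freq="1ms")
demo = pd.DataFrame({"A": range(len(idx))}, index=idx)

kept_leave_first = keep_periodic_windows(demo, leave_first=True)
kept_drop_first = keep_periodic_windows(demo, leave_first=False)
```

Because the mask is computed from offsets relative to the first timestamp, no Timestamp-to-time conversion is needed and the date part of the index never matters. The whole operation is vectorized, so it should scale to large frames, though that has not been measured here on 100 million rows.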
