python - Pandas: looking back
Problem description
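I am looking at the speed of a vehicle, and the only data I have is whether the speed is stable, slowing down, or stopped (see the df below). There is also a fourth state (accelerating), but it does not occur in the current df.
As you can see, there are 2 "Slowing down" periods. I am only interested in the data from the last "Slowing down" period before the stop.
How can I filter the data to remove the first x rows that I am not interested in? Since the speed values are always different, I cannot simply filter on the values.
Hope you can help!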
import pandas as pd
data = {
"Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20"],
"Values": ["Stable","Stable","Stable","Stable", "Slowing down","Slowing down","Slowing down","Stable", "Stable", "Stable", "Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down","Slowing down", "Stopped", "Stopped", "Stopped"]
}
df = pd.DataFrame(data)
df.head()
Solution
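You can use .cumsum() to number each run of consecutive identical entries in Values, then filter with .loc to keep the rows whose run number equals the largest run number among the rows where Values equals Slowing down: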
df['SlowDownSeq'] = df['Values'].ne(df['Values'].shift()).cumsum()
df_selected = df.loc[df['SlowDownSeq'] == df.loc[df['Values'] == 'Slowing down', 'SlowDownSeq'].max()].drop('SlowDownSeq', axis=1)
Result:
print(df_selected)
Date and Time Values
10 2020-06-07 00:10:00 Slowing down
11 2020-06-07 00:11:00 Slowing down
12 2020-06-07 00:12:00 Slowing down
13 2020-06-07 00:13:00 Slowing down
14 2020-06-07 00:14:00 Slowing down
15 2020-06-07 00:15:00 Slowing down
16 2020-06-07 00:16:00 Slowing down
17 2020-06-07 00:17:00 Slowing down
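The result above contains only the last "Slowing down" run itself. If the "Stopped" rows that follow it are also wanted, the same run numbering can be reused with >= instead of ==. The following is a minimal sketch, not part of the original answer: the helper name from_last_run and the demo frame are hypothetical, and it assumes the rows are already in chronological order.

```python
import pandas as pd

def from_last_run(df, column, label):
    """Return the rows from the start of the last consecutive run of `label` onwards.

    Hypothetical helper for illustration; assumes rows are in chronological order.
    """
    # A new run starts wherever the value differs from the previous row;
    # the cumulative sum of those change flags numbers the runs 1, 2, 3, ...
    run_id = df[column].ne(df[column].shift()).cumsum()
    is_label = df[column].eq(label)
    if not is_label.any():
        # No matching row: return an empty frame with the same columns.
        return df.iloc[0:0]
    last_run = run_id[is_label].max()
    # Runs are numbered in row order, so every later run has a larger id.
    return df.loc[run_id >= last_run]

# Same sequence of states as in the question (21 rows).
values = (["Stable"] * 4 + ["Slowing down"] * 3 + ["Stable"] * 3
          + ["Slowing down"] * 8 + ["Stopped"] * 3)
demo = pd.DataFrame({"Values": values})

tail = from_last_run(demo, "Values", "Slowing down")
print(tail.index.min(), tail.index.max())  # 10 20
```

Here the runs are numbered 1 (Stable), 2 (Slowing down), 3 (Stable), 4 (Slowing down), 5 (Stopped), so the last "Slowing down" run has number 4 and every row with a run number of 4 or more is kept.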