python-3.x - 如何删除除数据框中最后一行之外的特定单词之前的所有行以计算“增量”
问题描述
我有一个包含 [3599 行 x 5 列] 的数据框。我需要删除特定单词之前的所有行,除了数据框中的最后一行来计算“增量”。这是我拥有的数据框的示例:
df =
daytime ProductCase timestamp delta
0 2020-01-01 00:00:01 Product 2020-01-01 00:00:01 NaT
1 2020-01-01 00:00:02 Product 2020-01-01 00:00:02 0 days 00:00:01
2 2020-01-01 00:00:03 Product 2020-01-01 00:00:03 0 days 00:00:01
3 2020-01-01 00:00:04 NoProduct 2020-01-01 00:00:04 0 days 00:00:01
4 2020-01-01 00:00:05 NoProduct 2020-01-01 00:00:05 0 days 00:00:01
5 2020-01-01 00:00:06 NoProduct 2020-01-01 00:00:06 0 days 00:00:01
6 2020-01-01 00:00:07 NoProduct 2020-01-01 00:00:07 0 days 00:00:01
7 2020-01-01 00:00:08 Product 2020-01-01 00:00:08 0 days 00:00:01
8 2020-01-01 00:00:09 Product 2020-01-01 00:00:09 0 days 00:00:01
9 2020-01-01 00:00:10 NoProduct 2020-01-01 00:00:10 0 days 00:00:01
10 2020-01-01 00:00:11 NoProduct 2020-01-01 00:00:11 0 days 00:00:01
11 2020-01-01 00:00:12 NoProduct 2020-01-01 00:00:12 0 days 00:00:01
12 2020-01-01 00:00:13 NoProduct 2020-01-01 00:00:13 0 days 00:00:01
13 2020-01-01 00:00:14 NoProduct 2020-01-01 00:00:14 0 days 00:00:01
time
0 NaN
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 1.0
7 1.0
8 1.0
9 1.0
10 1.0
11 1.0
12 1.0
13 1.0
我想知道如何删除包含“NoProduct”的所有行,除了最后一行,计算增量以获得下面的数据框。
daytime ProductCase timestamp delta
0 2020-01-01 00:00:01 Product 2020-01-01 00:00:01 NaT
1 2020-01-01 00:00:02 Product 2020-01-01 00:00:02 0 days 00:00:01
2 2020-01-01 00:00:03 Product 2020-01-01 00:00:03 0 days 00:00:01
3 2020-01-01 00:00:07 NoProduct 2020-01-01 00:00:07 0 days 00:00:04
4 2020-01-01 00:00:08 Product 2020-01-01 00:00:08 0 days 00:00:01
5 2020-01-01 00:00:09 Product 2020-01-01 00:00:09 0 days 00:00:01
6 2020-01-01 00:00:14 NoProduct 2020-01-01 00:00:14 0 days 00:00:05
time
0 NaN
1 1.0
2 1.0
3 4.0
4 1.0
5 1.0
6 5.0
解决方案
要删除除最后一行后跟“Product”行之外的所有“NoProduct”行,请使用二进制掩码:
# Ensure dtypes are corrects
df['daytime'] = pd.to_datetime(df['daytime'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['delta'] = pd.to_timedelta(df['delta'])
# Create the boolean mask
m1 = df['ProductCase'].eq('Product')
m2 = df['ProductCase'].eq('NoProduct') \
.eq(df['ProductCase'].shift(-1, fill_value='Product').eq('Product'))
mask = m1 | m2
# Do a copy of selected rows
out = df[mask].copy()
# Updates column values
out['delta'] = out['timestamp'].diff()
out['time'] = out['delta'].dt.total_seconds()
输出:
daytime ProductCase timestamp delta time
0 2020-01-01 00:00:01 Product 2020-01-01 00:00:01 NaT NaN
1 2020-01-01 00:00:02 Product 2020-01-01 00:00:02 0 days 00:00:01 1.0
2 2020-01-01 00:00:03 Product 2020-01-01 00:00:03 0 days 00:00:01 1.0
6 2020-01-01 00:00:07 NoProduct 2020-01-01 00:00:07 0 days 00:00:04 4.0
7 2020-01-01 00:00:08 Product 2020-01-01 00:00:08 0 days 00:00:01 1.0
8 2020-01-01 00:00:09 Product 2020-01-01 00:00:09 0 days 00:00:01 1.0
13 2020-01-01 00:00:14 NoProduct 2020-01-01 00:00:14 0 days 00:00:05 5.0
细节:
>>> pd.concat([df['ProductCase'], mask.rename('m1|m2'),
m1.rename('m1'), m2.rename('m2')], axis=1)
ProductCase m1|m2 m1 m2
0 Product True True False # m1, keep ('Product')
1 Product True True False # m1, keep ('Product')
2 Product True True True # m1, keep ('Product')
3 NoProduct False False False
4 NoProduct False False False
5 NoProduct False False False
6 NoProduct True False True # m2, keep (followed by 'Product')
7 Product True True False # m1, keep ('Product')
8 Product True True True # m1, keep ('Product')
9 NoProduct False False False
10 NoProduct False False False
11 NoProduct False False False
12 NoProduct False False False
13 NoProduct True False True # m2, Special case of 'shift'
对于特殊情况,fill_value='Product'
在移动时使用创建一个虚拟行 (idx 14) 以确保它始终被保留。
推荐阅读
- google-cloud-platform - 如何在 Google Cloud Platform 中获取当前项目的组织 ID?
- java - 有没有更好的方法可以在继承类中处理我的 Generic BaseClass?
- angular - 如何忽略打字稿或 idm 错误?
- python - 在一条线下找到一条线
- python - 由于比例因子和偏移,导入 python 时 NetCDF 数据的精度损失
- open-liberty - OpenLiberty 许可证生产功能
- java - 如何在开始,中间和结束时匹配特定字符的正则表达式
- node.js - 节点 js CORS 失败
- python - 如何使用fuzzywuzzy连接两个数据集
- node.js - Puppeteer 集群有时无法在 k8s 中启动