python - 循环遍历Python中多列中的不同值
问题描述
我想在df
下面循环查找活动的持续时间,其中:
- 对于相同的
X
,有两个Y
值。 - 如果
a
轮到任何一个唯一值(例如,for 、变为 for 或变为 for ),1
则标志着活动的开始;如果轮到的 两个唯一值,则标志着活动的结束(例如, for 、变为for和变为for )。Y
X ==18
a
1
Y==13
a
1
Y==14
a
0
Y
X ==18
a
0
Y==13
a
0
Y==14
Timestamp X Y a b Type
0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium
1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low # Start
2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium
3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium
4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium
5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium
6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium
7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium
8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium
9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low
10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium
11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium
12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low # End
13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium
14 2000-10-26 17:29:34.770 19 16 0.0 62.0 medium
15 2000-10-26 17:29:40.770 19 16 1.0 61.0 medium # Start
16 2000-10-26 17:29:46.770 19 16 1.0 60.0 medium
17 2000-10-26 17:32:01.180 19 17 1.0 25.0 low
18 2000-10-26 17:32:01.180 19 16 0.0 51.5 medium
19 2000-10-26 17:32:35.180 19 17 0.0 50.0 medium # End
可重现的例子:
from datetime import *
from pandas import *
df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
1: Timestamp('2000-10-26 10:39:24.310000'),
2: Timestamp('2000-10-26 11:50:48.190000'),
3: Timestamp('2000-10-26 17:18:07.610000'),
4: Timestamp('2000-10-26 17:18:09.610000'),
5: Timestamp('2000-10-26 17:29:10.610000'),
6: Timestamp('2000-10-26 17:29:10.770000'),
7: Timestamp('2000-10-26 17:29:12.610000'),
8: Timestamp('2000-10-26 17:29:14.610000'),
9: Timestamp('2000-10-26 17:29:14.770000'),
10: Timestamp('2000-10-26 17:29:16.610000'),
11: Timestamp('2000-10-26 17:29:18.770000'),
12: Timestamp('2000-10-26 17:29:18.770000'),
13: Timestamp('2000-10-26 17:29:28.770000'),
14: Timestamp('2000-10-26 17:29:34.770000'),
15: Timestamp('2000-10-26 17:29:40.770000'),
16: Timestamp('2000-10-26 17:29:46.770000'),
17: Timestamp('2000-10-26 17:32:01.180000'),
18: Timestamp('2000-10-26 17:32:01.180000'),
19: Timestamp('2000-10-26 17:32:35.180000')},
'X': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19},
'Y': {0: 14,
1: 13,
2: 14,
3: 14,
4: 14,
5: 14,
6: 14,
7: 14,
8: 14,
9: 13,
10: 14,
11: 14,
12: 13,
13: 14,
14: 14,
15: 14,
16: 14,
17: 13,
18: 14,
19: 13},
'a': {0: 0.0,
1: 1.0,
2: 1.0,
3: 1.0,
4: 0.0,
5: 0.0,
6: 1.0,
7: 1.0,
8: 1.0,
9: 1.0,
10: 1.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 1.0,
16: 1.0,
17: 1.0,
18: 0.0,
19: 0.0},
'b': {0: 24.5,
1: 24.0,
2: 23.5,
3: 23.5,
4: 23.5,
5: 26.5,
6: 26.5,
7: 53.5,
8: 62.0,
9: 24.0,
10: 64.5,
11: 64.5,
12: 24.0,
13: 63.5,
14: 62.0,
15: 61.0,
16: 60.0,
17: 25.0,
18: 51.5,
19: 50.0},
'Type': {0: 'medium',
1: 'low',
2: 'medium',
3: 'medium',
4: 'medium',
5: 'medium',
6: 'medium',
7: 'medium',
8: 'medium',
9: 'low',
10: 'medium',
11: 'medium',
12: 'low',
13: 'medium',
14: 'medium',
15: 'medium',
16: 'medium',
17: 'low',
18: 'medium',
19: 'medium'}})
预期输出:
Start End X
2000-10-26 10:39:24.310 2000-10-26 17:29:18.770 18
2000-10-26 17:29:40.770 2000-10-26 17:32:35.180 19
如何在 for 循环中执行此操作,以便我可以进一步操作数据框,例如计算Y
活动中每个唯一值的平均值?
编辑:
对任何混淆表示歉意。活动应该通过a
值的变化来确定,并且应该是连续的,即,在一个活动结束后,我们重置并计算下一个活动。因此,例如以下数据帧应生成相同的输出:
Timestamp X Y a b Type
0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium
1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low
2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium
3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium
4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium
5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium
6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium
7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium
8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium
9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low
10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium
11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium
12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low
13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium
14 2000-10-26 17:29:34.770 18 14 0.0 62.0 medium
15 2000-10-26 17:29:40.770 18 14 1.0 61.0 medium
16 2000-10-26 17:29:46.770 18 14 1.0 60.0 medium
17 2000-10-26 17:32:01.180 18 13 1.0 25.0 low
18 2000-10-26 17:32:01.180 18 14 0.0 51.5 medium
19 2000-10-26 17:32:35.180 18 13 0.0 50.0 medium
预期输出:
Start End X
2000-10-26 10:39:24.310 2000-10-26 17:29:18.770 18
2000-10-26 17:29:40.770 2000-10-26 17:32:35.180 18
解决方案
这是一个可能的解决方案,问题是end
根据您的约束可以是两个值
# Create dict for the solution
expected_dict = {'start': [], 'end': [], 'X': [], 'b_means': []}
# Loop through each group
for name, gp in df.groupby('X'):
gp['y_shift'] = gp['Y'].shift(-1).ffill().astype(int)
astart = gp[gp['a'].eq(1.0)].iloc[0]['pd.Timestamp']
# End logic that can be updated
aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp']
b_means = gp['b'].mean()
# Collect all values
expected_dict ['start'].append(astart)
expected_dict ['end'].append(aend)
expected_dict ['X'].append(name)
expected_dict ['b_means'].append(name)
edf = pd.DataFrame(expected_dict)
start end X b_means
0 2000-10-26 10:39:24.310 2000-10-26 17:29:18.770 18 37.714286
1 2000-10-26 17:29:40.770 2000-10-26 17:32:01.180 19 51.583333
推荐阅读
- excel - 复制数据如果有内容
- node.js - Discord.js:手动触发事件
- r - tmap tm_bubble 大小不正确
- java - javax.jms.JMSException:无法连接到代理 URL:tcp://localhost:61616。原因:java.net.ConnectException:连接被拒绝:连接
- xaml - UWP VisualStateManager PointerOver 不起作用
- laravel-5.8 - Spatie Laravel 权限急切加载
- python - 在 xlwings 中创建和命名表
- excel - 如何将pdf内容复制到word doc然后使用excel vba复制到excel?
- python - 我们怎样才能更快地比较两个不同表之间的数据
- regex - 如何在 Oracle 中获取方括号之间的值