Using numpy select efficiently

Problem description

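I have some data like the table below. I am trying to compute the values in the Time bw column, which holds the number of days between a row's Date and the previous row's Date (the value in the 4th row should be 0, not 38). Whenever Location changes to a new value, say from a to b, I want Time bw to start over at 0. I am trying to do this with np.select and diff().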

+----------+---------------------+----------+
| Location |         Date        | Time bw  |
+----------+---------------------+----------+
| a        | 2018-06-26 00:00:00 |        0 |
| a        | 2018-06-26 00:00:00 |        0 |
| a        | 2018-06-26 00:00:00 |        0 |
| b        | 2018-08-03 00:00:00 |       38 |
| b        | 2018-08-03 00:00:00 |        0 |
| b        | 2018-08-04 00:00:00 |        1 |
| b        | 2018-08-04 00:00:00 |        0 |
| b        | 2018-08-04 00:00:00 |        0 |
| b        | 2018-08-04 00:00:00 |        0 |
| b        | 2018-08-04 00:00:00 |        0 |
| b        | 2018-08-04 00:00:00 |        0 |
| b        | 2018-08-05 00:00:00 |        1 |
| b        | 2018-08-08 00:00:00 |        3 |
| b        | 2018-08-08 00:00:00 |        0 |
| b        | 2018-08-08 00:00:00 |        0 |
| b        | 2018-08-08 00:00:00 |        0 |
| b        | 2018-08-08 00:00:00 |        0 |
| c        | 2018-08-14 00:00:00 |        6 |
| c        | 2018-08-14 00:00:00 |        0 |
| c        | 2018-08-14 00:00:00 |        0 |
+----------+---------------------+----------+

Tags: python, pandas, select

Solution


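If I understand correctly, you can reset Time bw to 0 on every row where Location differs from the previous row: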

import numpy as np
df['Time bw'] = np.where(df.Location.ne(df.Location.shift()), 0, df['Time bw'])

Output:

    Location                 Date  Time bw
0          a  2018-06-26 00:00:00        0
1          a  2018-06-26 00:00:00        0
2          a  2018-06-26 00:00:00        0
3          b  2018-08-03 00:00:00        0
4          b  2018-08-03 00:00:00        0
5          b  2018-08-04 00:00:00        1
6          b  2018-08-04 00:00:00        0
7          b  2018-08-04 00:00:00        0
8          b  2018-08-04 00:00:00        0
9          b  2018-08-04 00:00:00        0
10         b  2018-08-04 00:00:00        0
11         b  2018-08-05 00:00:00        1
12         b  2018-08-08 00:00:00        3
13         b  2018-08-08 00:00:00        0
14         b  2018-08-08 00:00:00        0
15         b  2018-08-08 00:00:00        0
16         b  2018-08-08 00:00:00        0
17         c  2018-08-14 00:00:00        0
18         c  2018-08-14 00:00:00        0
19         c  2018-08-14 00:00:00        0
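
The np.where line in the answer assumes Time bw has already been computed. As a minimal end-to-end sketch, the snippet below builds the column from Date with diff() and then resets it at each Location change, once with np.where and once with np.select, since the question asks about the latter. The six-row frame is a made-up sample shaped like the question's data, and the names gap, new_location, via_where, via_select and via_groupby are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's data (not the full table).
df = pd.DataFrame({
    "Location": ["a", "a", "b", "b", "b", "c"],
    "Date": pd.to_datetime([
        "2018-06-26", "2018-06-26", "2018-08-03",
        "2018-08-03", "2018-08-04", "2018-08-14",
    ]),
})

# Days since the previous row, ignoring Location; the first row has no
# predecessor, so its NaN is filled with 0.
gap = df["Date"].diff().dt.days.fillna(0).astype(int)

# True on the first row of every run of identical Location values.
new_location = df["Location"].ne(df["Location"].shift())

# Reset the gap to 0 wherever a new Location starts.
via_where = np.where(new_location, 0, gap)

# Same result with np.select: one condition, one choice, gap as the default.
via_select = np.select([new_location.to_numpy()], [0], default=gap.to_numpy())

# Alternative that skips the reset step; it assumes each Location forms a
# single contiguous block, as in the question's table.
via_groupby = (
    df.groupby("Location")["Date"].diff().dt.days.fillna(0).astype(int)
)

df["Time bw"] = via_where
print(df)
```

With a single condition, np.where and np.select do the same job; np.select starts to pay off only when several conditions each map to a different value. The groupby variant would give different results if the same Location reappeared later in the frame, because it groups by value rather than by consecutive run.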
