Find the index row of the next `resit` value in any row before the next `score` value

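Problem description

For each `score` value, find the index row of the next `resit` value that occurs in any row before the next `score` value.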

from datetime import datetime

import numpy as np
import pandas as pd

ts = [
        datetime.strptime('2016-06-19 22:01:22.229', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-19 23:32:08.109', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 02:50:22.181', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 06:12:44.249', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 19:27:22.129', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-21 11:39:08.119', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-22 23:32:08.109', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-23 02:50:22.181', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-23 06:12:44.249', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-23 19:27:22.129', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-24 11:39:08.119', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-24 16:59:22.610', '%Y-%m-%d %H:%M:%S.%f')
        ]

score = [ np.nan, 12, np.nan, np.nan, np.nan, np.nan, 11, np.nan, np.nan, 12, np.nan, 14]
resit = [ np.nan, np.nan, np.nan, 16, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,  7, np.nan]


df = pd.DataFrame(data={'date': ts, 'jack': score, 'jack2': resit})

With the desired `resit_score` column added, it looks like this:

                      date  jack  jack2   resit_score
0  2016-06-19 22:01:22.229   NaN    NaN           NaN
1  2016-06-19 23:32:08.109  12.0    NaN          16.0
2  2016-06-20 02:50:22.181   NaN    NaN           NaN
3  2016-06-20 06:12:44.249   NaN   16.0           NaN
4  2016-06-20 19:27:22.129   NaN    NaN           NaN
5  2016-06-21 11:39:08.119   NaN    NaN           NaN
6  2016-06-22 23:32:08.109  11.0    NaN           NaN
7  2016-06-23 02:50:22.181   NaN    NaN           NaN
8  2016-06-23 06:12:44.249   NaN    NaN           NaN
9  2016-06-23 19:27:22.129  12.0    NaN           7.0
10 2016-06-24 11:39:08.119   NaN    7.0           NaN
11 2016-06-24 16:59:22.610  14.0    NaN           NaN

In Excel I would use INDEX/MATCH, but I am not sure how to do this in Python.

Tags: python, pandas, indexing

Solution


Drop every row where both `jack` and `jack2` are NaN, then shift only the `jack2` values back by one row:

>>> df['resit_score'] = df[['jack', 'jack2']].dropna(how='all')['jack2'].shift(-1)
>>> df
                      date  jack  jack2  resit_score
0  2016-06-19 22:01:22.229   NaN    NaN          NaN
1  2016-06-19 23:32:08.109  12.0    NaN         16.0
2  2016-06-20 02:50:22.181   NaN    NaN          NaN
3  2016-06-20 06:12:44.249   NaN   16.0          NaN
4  2016-06-20 19:27:22.129   NaN    NaN          NaN
5  2016-06-21 11:39:08.119   NaN    NaN          NaN
6  2016-06-22 23:32:08.109  11.0    NaN          NaN
7  2016-06-23 02:50:22.181   NaN    NaN          NaN
8  2016-06-23 06:12:44.249   NaN    NaN          NaN
9  2016-06-23 19:27:22.129  12.0    NaN          7.0
10 2016-06-24 11:39:08.119   NaN    7.0          NaN
11 2016-06-24 16:59:22.610  14.0    NaN          NaN
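
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. The variable names `events` and `shifted` are introduced here for illustration only, and the `date` column is left out because it plays no part in the calculation.

```python
import numpy as np
import pandas as pd

# Same score/resit data as in the question (dates omitted for brevity).
score = [np.nan, 12, np.nan, np.nan, np.nan, np.nan, 11, np.nan, np.nan, 12, np.nan, 14]
resit = [np.nan, np.nan, np.nan, 16, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 7, np.nan]
df = pd.DataFrame({'jack': score, 'jack2': resit})

# Step 1: keep only the rows that carry a score or a resit.
events = df[['jack', 'jack2']].dropna(how='all')

# Step 2: pull each resit value up onto the event row just before it.
shifted = events['jack2'].shift(-1)

# Step 3: assignment aligns on the original index, so dropped rows get NaN.
df['resit_score'] = shifted

print(events)
print(df)
```

Because the empty rows are removed before shifting, a resit lands on the score row directly preceding it no matter how many blank rows lie in between. One caveat under this approach: if two resits occur back to back, the first resit row also receives the second resit's value.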

Update

Returning the row index would be more useful. How can this be changed to do that?

>>> df['resit_score'] = df.assign(jack2=df['jack2'].dropna().index.to_frame()) \
         [['jack', 'jack2']].dropna(how='all')['jack2'].shift(-1)
>>> df
                      date  jack  jack2  resit_score
0  2016-06-19 22:01:22.229   NaN    NaN          NaN
1  2016-06-19 23:32:08.109  12.0    NaN          3.0
2  2016-06-20 02:50:22.181   NaN    NaN          NaN
3  2016-06-20 06:12:44.249   NaN   16.0          NaN
4  2016-06-20 19:27:22.129   NaN    NaN          NaN
5  2016-06-21 11:39:08.119   NaN    NaN          NaN
6  2016-06-22 23:32:08.109  11.0    NaN          NaN
7  2016-06-23 02:50:22.181   NaN    NaN          NaN
8  2016-06-23 06:12:44.249   NaN    NaN          NaN
9  2016-06-23 19:27:22.129  12.0    NaN         10.0
10 2016-06-24 11:39:08.119   NaN    7.0          NaN
11 2016-06-24 16:59:22.610  14.0    NaN          NaN
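
As a possible variation on the update above, the sketch below builds the index labels with `Index.to_series()` and keeps the result on score rows only. The names `resit_idx`, `events`, `next_idx`, and the output column `resit_index` are assumptions made for this example, not part of the original answer, and the `date` column is again omitted.

```python
import numpy as np
import pandas as pd

# Same score/resit data as in the question (dates omitted for brevity).
score = [np.nan, 12, np.nan, np.nan, np.nan, np.nan, 11, np.nan, np.nan, 12, np.nan, 14]
resit = [np.nan, np.nan, np.nan, 16, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 7, np.nan]
df = pd.DataFrame({'jack': score, 'jack2': resit})

# Replace each resit value with the index label of its own row (NaN elsewhere).
resit_idx = df['jack2'].dropna().index.to_series().reindex(df.index)

# Drop rows with neither a score nor a resit, then shift the labels up one event row.
events = df.assign(jack2=resit_idx)[['jack', 'jack2']].dropna(how='all')
next_idx = events['jack2'].shift(-1)

# Keep values on score rows only (guards against two resits in a row) and
# store them as nullable integers so the labels are integers rather than floats.
df['resit_index'] = next_idx.where(events['jack'].notna()).astype('Int64')

print(df)
```

The nullable `Int64` dtype is a design choice here: a plain integer column cannot hold missing values, so without it the labels would be stored as floats (3.0, 10.0).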
