Working on a subset of a data frame using column conditions

Problem description

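From a data frame df, I want to update the values of the Points column for the first 3 rows after sorting by another column, Time, in ascending order, such that

df['Points'] = df['Points'] * 1.3 for the first row (smallest Time)

df['Points'] = df['Points'] * 1.2 for the second row (second smallest Time)

df['Points'] = df['Points'] * 1.1 for the third row (third smallest Time), rounded to the nearest integer.

Points stays unchanged for all other rows.

I have to do this for each unique value of a third column, Challenge. How can I do this?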

So I need PointsA instead of Points in the table below:

Challenge      Team              Time              Points   PointsA 
   A             1    2019-11-05 23:00:43.07589     200       260
   B             3    2019-11-05 22:10:55.07589     100       130
   A             5    2019-11-05 23:05:43.07589     200       240
   A             7    2019-11-05 23:07:33.07589     200       220
   B            10    2019-11-05 22:20:13.07589     100       120
   C             4    2019-11-06 00:05:22.07589      50        65
   A             4    2019-11-05 23:18:23.07589     200       200

I tried something like this:

for challenge in df['Challenge'].unique():
     df[df['Challenge'] == challenge].sort_values('Time', ascending=True).head(1)['Points'] *= 1.3

But this does not seem to work.
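A likely reason, offered as a sketch rather than a diagnosis of the exact setup: filtering with a boolean mask and then calling sort_values and head builds a new temporary DataFrame, so the multiplication changes that temporary object and never reaches df. The two-row frame below is made up purely for illustration, and tmp is a hypothetical name for the temporary that the one-liner leaves unnamed.

```python
import pandas as pd

# Hypothetical two-row frame, only to illustrate the behaviour
df = pd.DataFrame({
    "Challenge": ["A", "A"],
    "Time": pd.to_datetime(["2019-11-05 23:05:43", "2019-11-05 23:00:43"]),
    "Points": [200, 200],
})

# The chained expression produces a new object, separate from df
tmp = df[df["Challenge"] == "A"].sort_values("Time").head(1)
tmp = tmp.assign(Points=tmp["Points"] * 1.3)  # changes tmp only
unchanged = df["Points"].tolist()             # df still holds the original values

# Writing through df.loc with the row labels does modify df
df["Points"] = df["Points"].astype(float)     # allow fractional results
df.loc[tmp.index, "Points"] *= 1.3
```

The key point is that the row labels (tmp.index) survive the filtering and sorting, so they can be fed back to df.loc to target the original rows.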

Tags: python, pandas, dataframe

Solution


Try this. Use value_counts and items to get the number of rows in each challenge, then use that count to slice val so it matches the number of rows being updated.

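import pandas as pd

val = [1.3, 1.2, 1.1]  # multipliers for the 1st, 2nd and 3rd smallest Time
df.Time = pd.to_datetime(df.Time)  # df is the data frame from the question
for challenge, i in df['Challenge'].value_counts().items():
    # nsmallest returns up to 3 rows ordered by Time; val[:i] trims the list for groups with fewer than 3 rows
    df.loc[df[df['Challenge'] == challenge].nsmallest(3, 'Time').index, 'Points'] *= val[:i]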

Out[201]:
  Challenge  Team                       Time  Points  PointsA
0         A     1 2019-11-05 23:00:43.075890   260.0       260
1         B     3 2019-11-05 22:10:55.075890   130.0       130
2         A     5 2019-11-05 23:05:43.075890   240.0       240
3         A     7 2019-11-05 23:07:33.075890   220.0       220
4         B    10 2019-11-05 22:20:13.075890   120.0       120
5         C     4 2019-11-06 00:05:22.075890    65.0        65
6         A     4 2019-11-05 23:18:23.075890   200.0       200

For a challenge with only one row, such as Challenge = 'C', it correctly computes 65 from 50.
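As an alternative to the loop, the same result can probably be reached without iterating over challenges. The sketch below is not the approach from the answer above: it swaps in groupby with cumcount to number the rows of each Challenge by Time, maps the first three positions to their multipliers, and applies the rounding the question asks for. The names order and factor are introduced here for illustration, and the result goes into a new PointsA column so that Points is left untouched.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Challenge": ["A", "B", "A", "A", "B", "C", "A"],
    "Team": [1, 3, 5, 7, 10, 4, 4],
    "Time": pd.to_datetime([
        "2019-11-05 23:00:43.07589",
        "2019-11-05 22:10:55.07589",
        "2019-11-05 23:05:43.07589",
        "2019-11-05 23:07:33.07589",
        "2019-11-05 22:20:13.07589",
        "2019-11-06 00:05:22.07589",
        "2019-11-05 23:18:23.07589",
    ]),
    "Points": [200, 100, 200, 200, 100, 50, 200],
})

# Position of each row within its Challenge when ordered by Time (0 = earliest).
# The result keeps the original row labels, so it aligns with df by index.
order = df.sort_values("Time").groupby("Challenge").cumcount()

# Positions 0, 1, 2 get a multiplier; every later position keeps factor 1.0
factor = order.map({0: 1.3, 1: 1.2, 2: 1.1}).fillna(1.0)

# Round to the nearest integer, as the question asks
df["PointsA"] = (df["Points"] * factor).round().astype(int)
print(df["PointsA"].tolist())
# → [260, 130, 240, 220, 120, 65, 200]
```

Groups with fewer than three rows need no special handling here, because positions that do not exist are simply never mapped.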

