首页 > 解决方案 > 如何在条件 lambda 语句中返回系列的最大值

问题描述

我正在使用以下代码返回 df1 和 df2 中的 NID 和 Fdat 相同且 df2 中的 Tdate 小于 df1 中的 RxDate 的行数。

def breedings(row):
    
    row['TLSCC'] = len(df2[((df2['NID']==row['NID']) & (df2['Fdat']==row['Fdat']) & (df2['Tdate'] < row['RxDate']) )] )
     
    return row

df1 = df1.apply(lambda x:breedings(x),axis=1)

结果放在 df1 的新列 TLSCC 中

我想使用用于确定满足条件的行数的相同条件搜索 df2 中的记录以查找最大 SCC 值。即,我想将这些行中的 max.scc 值返回到 df1 中的新列。

用 max 替换 len 不起作用,因为它没有链接到 SCC。我也尝试将 max 放入 apply lambda 语句中。没有成功。

任何指针表示赞赏。

下面提供了 df1 和 df2 的数据。

df1

{'NID': {1: '212017463',
  2: '212017463',
  3: '212017463',
  4: '213015177',
  5: '213015177',
  6: '213015177',
  7: '213015177',
  8: '213015177',
  9: '213015177',
  10: '213015177'},
 'Fdat': {1: Timestamp('2020-08-06 00:00:00'),
  2: Timestamp('2020-08-06 00:00:00'),
  3: Timestamp('2020-08-06 00:00:00'),
  4: Timestamp('2018-08-30 00:00:00'),
  5: Timestamp('2018-08-30 00:00:00'),
  6: Timestamp('2018-08-30 00:00:00'),
  7: Timestamp('2018-08-30 00:00:00'),
  8: Timestamp('2018-08-30 00:00:00'),
  9: Timestamp('2018-08-30 00:00:00'),
  10: Timestamp('2018-08-30 00:00:00')},
 'RxDate': {1: Timestamp('2020-09-30 00:00:00'),
  2: Timestamp('2020-10-12 00:00:00'),
  3: Timestamp('2020-10-15 00:00:00'),
  4: Timestamp('2018-12-06 00:00:00'),
  5: Timestamp('2018-12-07 00:00:00'),
  6: Timestamp('2018-12-09 00:00:00'),
  7: Timestamp('2018-12-11 00:00:00'),
  8: Timestamp('2018-12-17 00:00:00'),
  9: Timestamp('2018-12-18 00:00:00'),
  10: Timestamp('2018-12-19 00:00:00')}}

df2

{'NID': {36: '212017463',
  37: '212017463',
  38: '212017463',
  39: '212017463',
  40: '212017463',
  41: '212017463',
  42: '212017463',
  43: '212017463',
  44: '212017463',
  45: '212017463',
  46: '212017463',
  47: '212017463',
  48: '212017463',
  49: '212017463',
  50: '212017463',
  51: '212017463',
  52: '212017463',
  53: '212017463',
  54: '212017463',
  55: '212017463',
  56: '212017463',
  57: '212017463',
  58: '212017463',
  59: '212017463',
  60: '212017463',
  61: '212017463',
  62: '212017463',
  63: '212017463',
  64: '212017463',
  65: '212017463',
  66: '212017463',
  67: '212017463',
  68: '212017463',
  69: '212017463',
  70: '213015177',
  71: '213015177',
  72: '213015177',
  73: '213015177',
  74: '213015177',
  75: '213015177',
  76: '213015177',
  77: '213015177',
  78: '213015177',
  79: '213015177',
  80: '213015177',
  81: '213015177',
  82: '213015177',
  83: '213015177',
  84: '213015177',
  85: '213015177',
  86: '213015177',
  87: '213015177',
  88: '213015177',
  89: '213015177',
  90: '213015177',
  91: '213015177',
  92: '213015177',
  93: '213015177',
  94: '213015177',
  95: '213015177',
  96: '213015177',
  97: '213015177',
  98: '213015177',
  99: '213015177',
  100: '213015177',
  101: '213015177'},
 'Fdat': {36: Timestamp('2014-08-17 00:00:00'),
  37: Timestamp('2014-08-17 00:00:00'),
  38: Timestamp('2014-08-17 00:00:00'),
  39: Timestamp('2014-08-17 00:00:00'),
  40: Timestamp('2014-08-17 00:00:00'),
  41: Timestamp('2015-09-22 00:00:00'),
  42: Timestamp('2015-09-22 00:00:00'),
  43: Timestamp('2015-09-22 00:00:00'),
  44: Timestamp('2015-09-22 00:00:00'),
  45: Timestamp('2015-09-22 00:00:00'),
  46: Timestamp('2015-09-22 00:00:00'),
  47: Timestamp('2016-09-01 00:00:00'),
  48: Timestamp('2016-09-01 00:00:00'),
  49: Timestamp('2016-09-01 00:00:00'),
  50: Timestamp('2016-09-01 00:00:00'),
  51: Timestamp('2016-09-01 00:00:00'),
  52: Timestamp('2017-08-31 00:00:00'),
  53: Timestamp('2017-08-31 00:00:00'),
  54: Timestamp('2017-08-31 00:00:00'),
  55: Timestamp('2017-08-31 00:00:00'),
  56: Timestamp('2017-08-31 00:00:00'),
  57: Timestamp('2017-08-31 00:00:00'),
  58: Timestamp('2018-09-17 00:00:00'),
  59: Timestamp('2018-09-17 00:00:00'),
  60: Timestamp('2018-09-17 00:00:00'),
  61: Timestamp('2018-09-17 00:00:00'),
  62: Timestamp('2018-09-17 00:00:00'),
  63: Timestamp('2019-08-25 00:00:00'),
  64: Timestamp('2019-08-25 00:00:00'),
  65: Timestamp('2019-08-25 00:00:00'),
  66: Timestamp('2019-08-25 00:00:00'),
  67: Timestamp('2019-08-25 00:00:00'),
  68: Timestamp('2020-08-06 00:00:00'),
  69: Timestamp('2020-08-06 00:00:00'),
  70: Timestamp('2015-09-01 00:00:00'),
  71: Timestamp('2015-09-01 00:00:00'),
  72: Timestamp('2015-09-01 00:00:00'),
  73: Timestamp('2015-09-01 00:00:00'),
  74: Timestamp('2015-09-01 00:00:00'),
  75: Timestamp('2015-09-01 00:00:00'),
  76: Timestamp('2016-08-18 00:00:00'),
  77: Timestamp('2016-08-18 00:00:00'),
  78: Timestamp('2016-08-18 00:00:00'),
  79: Timestamp('2016-08-18 00:00:00'),
  80: Timestamp('2016-08-18 00:00:00'),
  81: Timestamp('2017-09-01 00:00:00'),
  82: Timestamp('2017-09-01 00:00:00'),
  83: Timestamp('2017-09-01 00:00:00'),
  84: Timestamp('2017-09-01 00:00:00'),
  85: Timestamp('2017-09-01 00:00:00'),
  86: Timestamp('2017-09-01 00:00:00'),
  87: Timestamp('2018-08-30 00:00:00'),
  88: Timestamp('2018-08-30 00:00:00'),
  89: Timestamp('2018-08-30 00:00:00'),
  90: Timestamp('2018-08-30 00:00:00'),
  91: Timestamp('2018-08-30 00:00:00'),
  92: Timestamp('2019-08-09 00:00:00'),
  93: Timestamp('2019-08-09 00:00:00'),
  94: Timestamp('2019-08-09 00:00:00'),
  95: Timestamp('2019-08-09 00:00:00'),
  96: Timestamp('2019-08-09 00:00:00'),
  97: Timestamp('2020-08-06 00:00:00'),
  98: Timestamp('2020-08-06 00:00:00'),
  99: Timestamp('2020-08-06 00:00:00'),
  100: Timestamp('2020-08-06 00:00:00'),
  101: Timestamp('2020-08-06 00:00:00')},
 'Tdate': {36: Timestamp('2015-04-16 00:00:00'),
  37: Timestamp('2015-02-13 00:00:00'),
  38: Timestamp('2014-10-10 00:00:00'),
  39: Timestamp('2014-12-17 00:00:00'),
  40: Timestamp('2014-11-07 00:00:00'),
  41: Timestamp('2015-09-29 00:00:00'),
  42: Timestamp('2016-05-30 00:00:00'),
  43: Timestamp('2016-02-10 00:00:00'),
  44: Timestamp('2016-04-12 00:00:00'),
  45: Timestamp('2015-11-06 00:00:00'),
  46: Timestamp('2015-12-18 00:00:00'),
  47: Timestamp('2017-05-31 00:00:00'),
  48: Timestamp('2016-11-28 00:00:00'),
  49: Timestamp('2017-04-04 00:00:00'),
  50: Timestamp('2017-02-15 00:00:00'),
  51: Timestamp('2016-10-03 00:00:00'),
  52: Timestamp('2018-05-29 00:00:00'),
  53: Timestamp('2018-03-20 00:00:00'),
  54: Timestamp('2018-04-24 00:00:00'),
  55: Timestamp('2018-02-05 00:00:00'),
  56: Timestamp('2017-10-03 00:00:00'),
  57: Timestamp('2017-11-28 00:00:00'),
  58: Timestamp('2019-06-03 00:00:00'),
  59: Timestamp('2019-04-10 00:00:00'),
  60: Timestamp('2019-01-14 00:00:00'),
  61: Timestamp('2018-10-02 00:00:00'),
  62: Timestamp('2018-11-14 00:00:00'),
  63: Timestamp('2020-03-24 00:00:00'),
  64: Timestamp('2020-05-21 00:00:00'),
  65: Timestamp('2019-11-14 00:00:00'),
  66: Timestamp('2020-01-21 00:00:00'),
  67: Timestamp('2019-09-27 00:00:00'),
  68: Timestamp('2020-09-30 00:00:00'),
  69: Timestamp('2020-12-16 00:00:00'),
  70: Timestamp('2015-09-29 00:00:00'),
  71: Timestamp('2016-04-12 00:00:00'),
  72: Timestamp('2016-05-30 00:00:00'),
  73: Timestamp('2016-02-10 00:00:00'),
  74: Timestamp('2015-11-06 00:00:00'),
  75: Timestamp('2015-12-18 00:00:00'),
  76: Timestamp('2017-04-04 00:00:00'),
  77: Timestamp('2017-05-31 00:00:00'),
  78: Timestamp('2017-02-15 00:00:00'),
  79: Timestamp('2016-11-28 00:00:00'),
  80: Timestamp('2016-10-03 00:00:00'),
  81: Timestamp('2018-05-29 00:00:00'),
  82: Timestamp('2018-03-20 00:00:00'),
  83: Timestamp('2018-04-24 00:00:00'),
  84: Timestamp('2017-10-03 00:00:00'),
  85: Timestamp('2018-02-05 00:00:00'),
  86: Timestamp('2017-11-28 00:00:00'),
  87: Timestamp('2019-04-10 00:00:00'),
  88: Timestamp('2019-06-03 00:00:00'),
  89: Timestamp('2019-01-14 00:00:00'),
  90: Timestamp('2018-11-14 00:00:00'),
  91: Timestamp('2018-10-02 00:00:00'),
  92: Timestamp('2020-05-21 00:00:00'),
  93: Timestamp('2020-03-24 00:00:00'),
  94: Timestamp('2019-11-14 00:00:00'),
  95: Timestamp('2020-01-21 00:00:00'),
  96: Timestamp('2019-09-27 00:00:00'),
  97: Timestamp('2021-05-13 00:00:00'),
  98: Timestamp('2021-01-27 00:00:00'),
  99: Timestamp('2021-03-18 00:00:00'),
  100: Timestamp('2020-09-30 00:00:00'),
  101: Timestamp('2020-12-16 00:00:00')},
 'SCC': {36: 26,
  37: 853,
  38: 66,
  39: 39,
  40: 18,
  41: 1704,
  42: 16,
  43: 13,
  44: 23,
  45: 51,
  46: 15,
  47: 95,
  48: 21,
  49: 51,
  50: 57,
  51: 65,
  52: 55,
  53: 77,
  54: 53,
  55: 65,
  56: 17,
  57: 51,
  58: 113,
  59: 57,
  60: 58,
  61: 520,
  62: 735,
  63: 50,
  64: 98,
  65: 12,
  66: 109,
  67: 19,
  68: 6939,
  69: 1891,
  70: 166,
  71: 15,
  72: 25,
  73: 14,
  74: 34,
  75: 24,
  76: 45,
  77: 74,
  78: 17,
  79: 3,
  80: 7,
  81: 32,
  82: 27,
  83: 122,
  84: 4,
  85: 62,
  86: 7,
  87: 35,
  88: 74,
  89: 117,
  90: 110,
  91: 2,
  92: 55,
  93: 316,
  94: 17,
  95: 19,
  96: 8,
  97: 77,
  98: 12,
  99: 50,
  100: 16,
  101: 19}}

标签: pythonpandaslambdamaxapply

解决方案


我将首先加入,然后在日期上应用条件以生成与您的条件相匹配的连接表:

>>> join = df1.merge(df2, on=['NID', 'Fdat'])
>>> join = join[join['Tdate'] < join['RxDate']]
>>> join
          NID       Fdat     RxDate      Tdate   SCC
2   212017463 2020-08-06 2020-10-12 2020-09-30  6939
4   212017463 2020-08-06 2020-10-15 2020-09-30  6939
9   213015177 2018-08-30 2018-12-06 2018-11-14   110
10  213015177 2018-08-30 2018-12-06 2018-10-02     2
14  213015177 2018-08-30 2018-12-07 2018-11-14   110
15  213015177 2018-08-30 2018-12-07 2018-10-02     2
19  213015177 2018-08-30 2018-12-09 2018-11-14   110
20  213015177 2018-08-30 2018-12-09 2018-10-02     2
24  213015177 2018-08-30 2018-12-11 2018-11-14   110
25  213015177 2018-08-30 2018-12-11 2018-10-02     2
29  213015177 2018-08-30 2018-12-17 2018-11-14   110
30  213015177 2018-08-30 2018-12-17 2018-10-02     2
34  213015177 2018-08-30 2018-12-18 2018-11-14   110
35  213015177 2018-08-30 2018-12-18 2018-10-02     2
39  213015177 2018-08-30 2018-12-19 2018-11-14   110
40  213015177 2018-08-30 2018-12-19 2018-10-02     2

然后该表上的 groupby 为您提供所需的一切,记录数.size()和最大值['SCC'].max()

>>> join.groupby(df1.columns.to_list()).size().rename('TLSCC')
NID        Fdat        RxDate    
212017463  2020-08-06  2020-10-12    1
                       2020-10-15    1
213015177  2018-08-30  2018-12-06    2
                       2018-12-07    2
                       2018-12-09    2
                       2018-12-11    2
                       2018-12-17    2
                       2018-12-18    2
                       2018-12-19    2
Name: TLSCC, dtype: int64
>>> join.groupby(df1.columns.to_list())['SCC'].max()
NID        Fdat        RxDate    
212017463  2020-08-06  2020-10-12    6939
                       2020-10-15    6939
213015177  2018-08-30  2018-12-06     110
                       2018-12-07     110
                       2018-12-09     110
                       2018-12-11     110
                       2018-12-17     110
                       2018-12-18     110
                       2018-12-19     110

然后,您可以使用以下命令将所有内容放回 df1 中.merge

>>> group = join.groupby(df1.columns.to_list())
>>> df1 = df1.merge(group.size().rename('TLSCC').reset_index())\
...          .merge(group['SCC'].max().rename('max.scc').reset_index())
...
>>> df1
         NID       Fdat     RxDate  TLSCC  max.scc
0  212017463 2020-08-06 2020-10-12      1     6939
1  212017463 2020-08-06 2020-10-15      1     6939
2  213015177 2018-08-30 2018-12-06      2      110
3  213015177 2018-08-30 2018-12-07      2      110
4  213015177 2018-08-30 2018-12-09      2      110
5  213015177 2018-08-30 2018-12-11      2      110
6  213015177 2018-08-30 2018-12-17      2      110
7  213015177 2018-08-30 2018-12-18      2      110
8  213015177 2018-08-30 2018-12-19      2      110

推荐阅读