python - 如何在条件 lambda 语句中返回系列的最大值
问题描述
我正在使用以下代码返回 df1 和 df2 中的 NID 和 Fdat 相同且 df2 中的 Tdate 小于 df1 中的 RxDate 的行数。
def breedings(row):
row['TLSCC'] = len(df2[((df2['NID']==row['NID']) & (df2['Fdat']==row['Fdat']) & (df2['Tdate'] < row['RxDate']) )] )
return row
df1 = df1.apply(lambda x:breedings(x),axis=1)
结果放在 df1 的新列 TLSCC 中
我想使用用于确定满足条件的行数的相同条件搜索 df2 中的记录以查找最大 SCC 值。即,我想将这些行中的 max.scc 值返回到 df1 中的新列。
用 max 替换 len 不起作用,因为它没有链接到 SCC。我也尝试将 max 放入 apply lambda 语句中。没有成功。
任何指针表示赞赏。
下面提供了 df1 和 df2 的数据。
df1
{'NID': {1: '212017463',
2: '212017463',
3: '212017463',
4: '213015177',
5: '213015177',
6: '213015177',
7: '213015177',
8: '213015177',
9: '213015177',
10: '213015177'},
'Fdat': {1: Timestamp('2020-08-06 00:00:00'),
2: Timestamp('2020-08-06 00:00:00'),
3: Timestamp('2020-08-06 00:00:00'),
4: Timestamp('2018-08-30 00:00:00'),
5: Timestamp('2018-08-30 00:00:00'),
6: Timestamp('2018-08-30 00:00:00'),
7: Timestamp('2018-08-30 00:00:00'),
8: Timestamp('2018-08-30 00:00:00'),
9: Timestamp('2018-08-30 00:00:00'),
10: Timestamp('2018-08-30 00:00:00')},
'RxDate': {1: Timestamp('2020-09-30 00:00:00'),
2: Timestamp('2020-10-12 00:00:00'),
3: Timestamp('2020-10-15 00:00:00'),
4: Timestamp('2018-12-06 00:00:00'),
5: Timestamp('2018-12-07 00:00:00'),
6: Timestamp('2018-12-09 00:00:00'),
7: Timestamp('2018-12-11 00:00:00'),
8: Timestamp('2018-12-17 00:00:00'),
9: Timestamp('2018-12-18 00:00:00'),
10: Timestamp('2018-12-19 00:00:00')}}
df2
{'NID': {36: '212017463',
37: '212017463',
38: '212017463',
39: '212017463',
40: '212017463',
41: '212017463',
42: '212017463',
43: '212017463',
44: '212017463',
45: '212017463',
46: '212017463',
47: '212017463',
48: '212017463',
49: '212017463',
50: '212017463',
51: '212017463',
52: '212017463',
53: '212017463',
54: '212017463',
55: '212017463',
56: '212017463',
57: '212017463',
58: '212017463',
59: '212017463',
60: '212017463',
61: '212017463',
62: '212017463',
63: '212017463',
64: '212017463',
65: '212017463',
66: '212017463',
67: '212017463',
68: '212017463',
69: '212017463',
70: '213015177',
71: '213015177',
72: '213015177',
73: '213015177',
74: '213015177',
75: '213015177',
76: '213015177',
77: '213015177',
78: '213015177',
79: '213015177',
80: '213015177',
81: '213015177',
82: '213015177',
83: '213015177',
84: '213015177',
85: '213015177',
86: '213015177',
87: '213015177',
88: '213015177',
89: '213015177',
90: '213015177',
91: '213015177',
92: '213015177',
93: '213015177',
94: '213015177',
95: '213015177',
96: '213015177',
97: '213015177',
98: '213015177',
99: '213015177',
100: '213015177',
101: '213015177'},
'Fdat': {36: Timestamp('2014-08-17 00:00:00'),
37: Timestamp('2014-08-17 00:00:00'),
38: Timestamp('2014-08-17 00:00:00'),
39: Timestamp('2014-08-17 00:00:00'),
40: Timestamp('2014-08-17 00:00:00'),
41: Timestamp('2015-09-22 00:00:00'),
42: Timestamp('2015-09-22 00:00:00'),
43: Timestamp('2015-09-22 00:00:00'),
44: Timestamp('2015-09-22 00:00:00'),
45: Timestamp('2015-09-22 00:00:00'),
46: Timestamp('2015-09-22 00:00:00'),
47: Timestamp('2016-09-01 00:00:00'),
48: Timestamp('2016-09-01 00:00:00'),
49: Timestamp('2016-09-01 00:00:00'),
50: Timestamp('2016-09-01 00:00:00'),
51: Timestamp('2016-09-01 00:00:00'),
52: Timestamp('2017-08-31 00:00:00'),
53: Timestamp('2017-08-31 00:00:00'),
54: Timestamp('2017-08-31 00:00:00'),
55: Timestamp('2017-08-31 00:00:00'),
56: Timestamp('2017-08-31 00:00:00'),
57: Timestamp('2017-08-31 00:00:00'),
58: Timestamp('2018-09-17 00:00:00'),
59: Timestamp('2018-09-17 00:00:00'),
60: Timestamp('2018-09-17 00:00:00'),
61: Timestamp('2018-09-17 00:00:00'),
62: Timestamp('2018-09-17 00:00:00'),
63: Timestamp('2019-08-25 00:00:00'),
64: Timestamp('2019-08-25 00:00:00'),
65: Timestamp('2019-08-25 00:00:00'),
66: Timestamp('2019-08-25 00:00:00'),
67: Timestamp('2019-08-25 00:00:00'),
68: Timestamp('2020-08-06 00:00:00'),
69: Timestamp('2020-08-06 00:00:00'),
70: Timestamp('2015-09-01 00:00:00'),
71: Timestamp('2015-09-01 00:00:00'),
72: Timestamp('2015-09-01 00:00:00'),
73: Timestamp('2015-09-01 00:00:00'),
74: Timestamp('2015-09-01 00:00:00'),
75: Timestamp('2015-09-01 00:00:00'),
76: Timestamp('2016-08-18 00:00:00'),
77: Timestamp('2016-08-18 00:00:00'),
78: Timestamp('2016-08-18 00:00:00'),
79: Timestamp('2016-08-18 00:00:00'),
80: Timestamp('2016-08-18 00:00:00'),
81: Timestamp('2017-09-01 00:00:00'),
82: Timestamp('2017-09-01 00:00:00'),
83: Timestamp('2017-09-01 00:00:00'),
84: Timestamp('2017-09-01 00:00:00'),
85: Timestamp('2017-09-01 00:00:00'),
86: Timestamp('2017-09-01 00:00:00'),
87: Timestamp('2018-08-30 00:00:00'),
88: Timestamp('2018-08-30 00:00:00'),
89: Timestamp('2018-08-30 00:00:00'),
90: Timestamp('2018-08-30 00:00:00'),
91: Timestamp('2018-08-30 00:00:00'),
92: Timestamp('2019-08-09 00:00:00'),
93: Timestamp('2019-08-09 00:00:00'),
94: Timestamp('2019-08-09 00:00:00'),
95: Timestamp('2019-08-09 00:00:00'),
96: Timestamp('2019-08-09 00:00:00'),
97: Timestamp('2020-08-06 00:00:00'),
98: Timestamp('2020-08-06 00:00:00'),
99: Timestamp('2020-08-06 00:00:00'),
100: Timestamp('2020-08-06 00:00:00'),
101: Timestamp('2020-08-06 00:00:00')},
'Tdate': {36: Timestamp('2015-04-16 00:00:00'),
37: Timestamp('2015-02-13 00:00:00'),
38: Timestamp('2014-10-10 00:00:00'),
39: Timestamp('2014-12-17 00:00:00'),
40: Timestamp('2014-11-07 00:00:00'),
41: Timestamp('2015-09-29 00:00:00'),
42: Timestamp('2016-05-30 00:00:00'),
43: Timestamp('2016-02-10 00:00:00'),
44: Timestamp('2016-04-12 00:00:00'),
45: Timestamp('2015-11-06 00:00:00'),
46: Timestamp('2015-12-18 00:00:00'),
47: Timestamp('2017-05-31 00:00:00'),
48: Timestamp('2016-11-28 00:00:00'),
49: Timestamp('2017-04-04 00:00:00'),
50: Timestamp('2017-02-15 00:00:00'),
51: Timestamp('2016-10-03 00:00:00'),
52: Timestamp('2018-05-29 00:00:00'),
53: Timestamp('2018-03-20 00:00:00'),
54: Timestamp('2018-04-24 00:00:00'),
55: Timestamp('2018-02-05 00:00:00'),
56: Timestamp('2017-10-03 00:00:00'),
57: Timestamp('2017-11-28 00:00:00'),
58: Timestamp('2019-06-03 00:00:00'),
59: Timestamp('2019-04-10 00:00:00'),
60: Timestamp('2019-01-14 00:00:00'),
61: Timestamp('2018-10-02 00:00:00'),
62: Timestamp('2018-11-14 00:00:00'),
63: Timestamp('2020-03-24 00:00:00'),
64: Timestamp('2020-05-21 00:00:00'),
65: Timestamp('2019-11-14 00:00:00'),
66: Timestamp('2020-01-21 00:00:00'),
67: Timestamp('2019-09-27 00:00:00'),
68: Timestamp('2020-09-30 00:00:00'),
69: Timestamp('2020-12-16 00:00:00'),
70: Timestamp('2015-09-29 00:00:00'),
71: Timestamp('2016-04-12 00:00:00'),
72: Timestamp('2016-05-30 00:00:00'),
73: Timestamp('2016-02-10 00:00:00'),
74: Timestamp('2015-11-06 00:00:00'),
75: Timestamp('2015-12-18 00:00:00'),
76: Timestamp('2017-04-04 00:00:00'),
77: Timestamp('2017-05-31 00:00:00'),
78: Timestamp('2017-02-15 00:00:00'),
79: Timestamp('2016-11-28 00:00:00'),
80: Timestamp('2016-10-03 00:00:00'),
81: Timestamp('2018-05-29 00:00:00'),
82: Timestamp('2018-03-20 00:00:00'),
83: Timestamp('2018-04-24 00:00:00'),
84: Timestamp('2017-10-03 00:00:00'),
85: Timestamp('2018-02-05 00:00:00'),
86: Timestamp('2017-11-28 00:00:00'),
87: Timestamp('2019-04-10 00:00:00'),
88: Timestamp('2019-06-03 00:00:00'),
89: Timestamp('2019-01-14 00:00:00'),
90: Timestamp('2018-11-14 00:00:00'),
91: Timestamp('2018-10-02 00:00:00'),
92: Timestamp('2020-05-21 00:00:00'),
93: Timestamp('2020-03-24 00:00:00'),
94: Timestamp('2019-11-14 00:00:00'),
95: Timestamp('2020-01-21 00:00:00'),
96: Timestamp('2019-09-27 00:00:00'),
97: Timestamp('2021-05-13 00:00:00'),
98: Timestamp('2021-01-27 00:00:00'),
99: Timestamp('2021-03-18 00:00:00'),
100: Timestamp('2020-09-30 00:00:00'),
101: Timestamp('2020-12-16 00:00:00')},
'SCC': {36: 26,
37: 853,
38: 66,
39: 39,
40: 18,
41: 1704,
42: 16,
43: 13,
44: 23,
45: 51,
46: 15,
47: 95,
48: 21,
49: 51,
50: 57,
51: 65,
52: 55,
53: 77,
54: 53,
55: 65,
56: 17,
57: 51,
58: 113,
59: 57,
60: 58,
61: 520,
62: 735,
63: 50,
64: 98,
65: 12,
66: 109,
67: 19,
68: 6939,
69: 1891,
70: 166,
71: 15,
72: 25,
73: 14,
74: 34,
75: 24,
76: 45,
77: 74,
78: 17,
79: 3,
80: 7,
81: 32,
82: 27,
83: 122,
84: 4,
85: 62,
86: 7,
87: 35,
88: 74,
89: 117,
90: 110,
91: 2,
92: 55,
93: 316,
94: 17,
95: 19,
96: 8,
97: 77,
98: 12,
99: 50,
100: 16,
101: 19}}
解决方案
我将首先加入,然后在日期上应用条件以生成与您的条件相匹配的连接表:
>>> join = df1.merge(df2, on=['NID', 'Fdat'])
>>> join = join[join['Tdate'] < join['RxDate']]
>>> join
NID Fdat RxDate Tdate SCC
2 212017463 2020-08-06 2020-10-12 2020-09-30 6939
4 212017463 2020-08-06 2020-10-15 2020-09-30 6939
9 213015177 2018-08-30 2018-12-06 2018-11-14 110
10 213015177 2018-08-30 2018-12-06 2018-10-02 2
14 213015177 2018-08-30 2018-12-07 2018-11-14 110
15 213015177 2018-08-30 2018-12-07 2018-10-02 2
19 213015177 2018-08-30 2018-12-09 2018-11-14 110
20 213015177 2018-08-30 2018-12-09 2018-10-02 2
24 213015177 2018-08-30 2018-12-11 2018-11-14 110
25 213015177 2018-08-30 2018-12-11 2018-10-02 2
29 213015177 2018-08-30 2018-12-17 2018-11-14 110
30 213015177 2018-08-30 2018-12-17 2018-10-02 2
34 213015177 2018-08-30 2018-12-18 2018-11-14 110
35 213015177 2018-08-30 2018-12-18 2018-10-02 2
39 213015177 2018-08-30 2018-12-19 2018-11-14 110
40 213015177 2018-08-30 2018-12-19 2018-10-02 2
然后该表上的 groupby 为您提供所需的一切,记录数.size()
和最大值['SCC'].max()
:
>>> join.groupby(df1.columns.to_list()).size().rename('TLSCC')
NID Fdat RxDate
212017463 2020-08-06 2020-10-12 1
2020-10-15 1
213015177 2018-08-30 2018-12-06 2
2018-12-07 2
2018-12-09 2
2018-12-11 2
2018-12-17 2
2018-12-18 2
2018-12-19 2
Name: TLSCC, dtype: int64
>>> join.groupby(df1.columns.to_list())['SCC'].max()
NID Fdat RxDate
212017463 2020-08-06 2020-10-12 6939
2020-10-15 6939
213015177 2018-08-30 2018-12-06 110
2018-12-07 110
2018-12-09 110
2018-12-11 110
2018-12-17 110
2018-12-18 110
2018-12-19 110
然后,您可以使用以下命令将所有内容放回 df1 中.merge
:
>>> group = join.groupby(df1.columns.to_list())
>>> df1 = df1.merge(group.size().rename('TLSCC').reset_index())\
... .merge(group['SCC'].max().rename('max.scc').reset_index())
...
>>> df1
NID Fdat RxDate TLSCC max.scc
0 212017463 2020-08-06 2020-10-12 1 6939
1 212017463 2020-08-06 2020-10-15 1 6939
2 213015177 2018-08-30 2018-12-06 2 110
3 213015177 2018-08-30 2018-12-07 2 110
4 213015177 2018-08-30 2018-12-09 2 110
5 213015177 2018-08-30 2018-12-11 2 110
6 213015177 2018-08-30 2018-12-17 2 110
7 213015177 2018-08-30 2018-12-18 2 110
8 213015177 2018-08-30 2018-12-19 2 110
推荐阅读
- python - Python中的字符到Unicode
- regex - 常规匹配零或一
- javascript - 将源路径数组转换为扩展中 Promise.all 的获取承诺数组
- firebase - Flutter 应用因 dart.html 而停止工作
- mysql - 如何在 Laravel 刀片表中添加空行
- c - 将可执行文件复制到内存后,动态链接器会修改引用吗?
- excel - 如果单元格包含多个特定文本,则复制并粘贴。怎么做?
- google-chrome - 按钮单击在 Chrome 中的 iMacro 上不起作用
- jquery - 如何从jquery中共享同一类的元素集中定位特定元素?
- pdf - 如果 PDF 文件是第一代文档,我们如何检查 uipath?