python - Get math value from two consecutive rows
Problem description
Here is the DataFrame I have:
import pandas as pd
import datetime
data = [['A1','String01',45,datetime.date(2018,1,1),datetime.date(2018,3,1)],
['A1','String02',46,datetime.date(2018,3,1),datetime.date(2018,4,29)],
['A1','String03',48,datetime.date(2018,4,29),datetime.date(2018,6,30)],
['A1','String04',51,datetime.date(2018,6,30),datetime.date(2018,12,31)],
['A2','String11',32,datetime.date(2018,1,1),datetime.date(2018,6,1)],
['A2','String12',33,datetime.date(2018,6,1),datetime.date(2018,7,30)],
['A2','String13',54,datetime.date(2018,8,11),datetime.date(2018,12,31)],
['A3','String21',45,datetime.date(2018,1,1),datetime.date(2018,6,1)],
['A3','String22',47,datetime.date(2018,7,1),datetime.date(2018,12,31)],]
cols = ['ID','SomeValue','Price','StartDate','EndDate']
df = pd.DataFrame(data,columns=cols)
print(df)
If we print the DataFrame, we can see that Price for ID=A2 is missing from 7/31 to 8/11 (comparing one row's EndDate with the next row's StartDate). ID=A3 has a similar gap.
What I want to do is compute StartDate minus the EndDate of the previous row, grouped by ID.
My output should be something like:
ID SomeValue Price StartDate EndDate NoOfDaysMissing
0 A1 String01 45 2018-01-01 2018-03-01 NaN
1 A1 String02 46 2018-03-01 2018-04-29 0.0
2 A1 String03 48 2018-04-29 2018-06-30 0.0
3 A1 String04 51 2018-06-30 2018-12-31 0.0
4 A2 String11 32 2018-01-01 2018-06-01 NaN
5 A2 String12 33 2018-06-01 2018-07-30 0.0
6 A2 String13 54 2018-08-11 2018-12-31 12.0
7 A3 String21 45 2018-01-01 2018-06-01 NaN
8 A3 String22 47 2018-07-01 2018-12-31 30.0
where NoOfDaysMissing is calculated as StartDate minus the EndDate of the previous row, computed separately for each ID (so the first row of each ID has no previous row and is NaN).
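To make the definition concrete, the two non-zero values in the expected output can be checked by hand with the standard library. The date pairs below are copied from the sample data; the labels are only for illustration.

```python
import datetime

# (previous row's EndDate, current row's StartDate), copied from the sample data
pairs = {
    'A2 row 6': (datetime.date(2018, 7, 30), datetime.date(2018, 8, 11)),
    'A3 row 8': (datetime.date(2018, 6, 1), datetime.date(2018, 7, 1)),
}

# Subtracting two dates gives a timedelta; .days is its whole-day count
gaps = {label: (start - prev_end).days for label, (prev_end, start) in pairs.items()}
print(gaps)  # {'A2 row 6': 12, 'A3 row 8': 30}
```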
Solution
Use shift to get the EndDate from the previous row, take the difference, and then use the dt accessor with its days attribute, all inside a groupby:
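# Convert the datetime.date columns to datetime64 so the .dt accessor works
df[['StartDate','EndDate']] = df[['StartDate','EndDate']].apply(pd.to_datetime)
# Within each ID, subtract the previous row's EndDate from the current StartDate
df['NoOfDaysMissing'] = df.groupby('ID', group_keys=False)\
    .apply(lambda x: (x['StartDate'] - x['EndDate'].shift()).dt.days)
print(df)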
Output:
ID SomeValue Price StartDate EndDate NoOfDaysMissing
0 A1 String01 45 2018-01-01 2018-03-01 NaN
1 A1 String02 46 2018-03-01 2018-04-29 0.0
2 A1 String03 48 2018-04-29 2018-06-30 0.0
3 A1 String04 51 2018-06-30 2018-12-31 0.0
4 A2 String11 32 2018-01-01 2018-06-01 NaN
5 A2 String12 33 2018-06-01 2018-07-30 0.0
6 A2 String13 54 2018-08-11 2018-12-31 12.0
7 A3 String21 45 2018-01-01 2018-06-01 NaN
8 A3 String22 47 2018-07-01 2018-12-31 30.0
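A possibly simpler variant, sketched here as an alternative rather than the answer's own method, skips `apply` entirely: `df.groupby('ID')['EndDate'].shift()` already shifts within each group, so the first row of every ID gets `NaT` and the subtraction can be done on whole columns. It should produce the same NoOfDaysMissing column. In recent pandas releases, `DataFrameGroupBy.apply` may also warn when the applied function receives the grouping columns, which this version sidesteps.

```python
import datetime

import pandas as pd

data = [['A1','String01',45,datetime.date(2018,1,1),datetime.date(2018,3,1)],
        ['A1','String02',46,datetime.date(2018,3,1),datetime.date(2018,4,29)],
        ['A1','String03',48,datetime.date(2018,4,29),datetime.date(2018,6,30)],
        ['A1','String04',51,datetime.date(2018,6,30),datetime.date(2018,12,31)],
        ['A2','String11',32,datetime.date(2018,1,1),datetime.date(2018,6,1)],
        ['A2','String12',33,datetime.date(2018,6,1),datetime.date(2018,7,30)],
        ['A2','String13',54,datetime.date(2018,8,11),datetime.date(2018,12,31)],
        ['A3','String21',45,datetime.date(2018,1,1),datetime.date(2018,6,1)],
        ['A3','String22',47,datetime.date(2018,7,1),datetime.date(2018,12,31)]]
cols = ['ID','SomeValue','Price','StartDate','EndDate']
df = pd.DataFrame(data, columns=cols)

# Convert datetime.date objects to datetime64 so the .dt accessor is available
for col in ['StartDate', 'EndDate']:
    df[col] = pd.to_datetime(df[col])

# Previous row's EndDate within the same ID; the first row of each ID becomes NaT
prev_end = df.groupby('ID')['EndDate'].shift()

# Timedelta -> whole days; NaT becomes NaN, so the result is a float column
df['NoOfDaysMissing'] = (df['StartDate'] - prev_end).dt.days
print(df)
```

Like the apply-based version, this assumes the rows are already in date order within each ID, as in the sample data; if they are not, sorting first with `df.sort_values(['ID', 'StartDate'])` would be needed so that shift picks the right previous row.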