Creating new rows in a df based on a date range

Problem description

The following script builds a simplified version of the df in question:

import pandas as pd

df = pd.DataFrame({'date': pd.date_range(start='2020-01-01', end='2020-01-07'),
                   'id': range(1, 8),
                   'product': ['list_3', 'list_1', 'list_2', 'list_3', 'list_2', 'list_1', 'list_1'],
                   'duration': [3, 1, 2, 3, 2, 1, 1],
                   'product_in_use': 'true',  # scalar string, broadcast to every row
                   'end_date': ['2020-01-03', '2020-01-02', '2020-01-04', '2020-01-06',
                                '2020-01-06', '2020-01-06', '2020-01-07']})

# convert duration to timedelta and end_date to datetime ('date' is already datetime64)
df['duration'] = pd.to_timedelta(df['duration'], unit='D')
df['end_date'] = pd.to_datetime(df['end_date'])
df

df:

    date       id   product duration product_in_use end_date
0   2020-01-01  1   list_3  3 days     true        2020-01-03
1   2020-01-02  2   list_1  1 days     true        2020-01-02
2   2020-01-03  3   list_2  2 days     true        2020-01-04
3   2020-01-04  4   list_3  3 days     true        2020-01-06
4   2020-01-05  5   list_2  2 days     true        2020-01-06
5   2020-01-06  6   list_1  1 days     true        2020-01-06
6   2020-01-07  7   list_1  1 days     true        2020-01-07

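As you can see in the df above, each id uses one product, and each product lasts for a specific duration. There are no rows for the individual days on which a product is in use, only a row for the day the user purchased it.

So I would like to create new rows for every date on which each id is using its product.

My expected df would look like this: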

    date       id   product  duration   product_in_use
0   2020-01-01  1   list_3   3 days         true
1   2020-01-02  1   list_3   3 days         true
2   2020-01-03  1   list_3   3 days         true
3   2020-01-02  2   list_1   1 days         true
4   2020-01-03  3   list_2   2 days         true
5   2020-01-04  3   list_2   2 days         true
6   2020-01-04  4   list_3   3 days         true
7   2020-01-05  4   list_3   3 days         true
8   2020-01-06  4   list_3   3 days         true
9   2020-01-05  5   list_2   2 days         true
10  2020-01-06  5   list_2   2 days         true
11  2020-01-06  6   list_1   1 days         true
12  2020-01-07  7   list_1   1 days         true

Tags: python, pandas

Solution


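Use starmap and chain to build, for each id, a date range running from its start date to its end date; expand the main dataframe by repeating each row according to its duration; then assign the new dates as the dataframe's index. Note that this relies on each row's duration (in days) being equal to the number of dates between date and end_date inclusive, which holds for the sample data.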

from itertools import starmap,chain

#create date ranges from date to end_date for each id
start_end = zip(df.date.array,df.end_date.array)
date_ranges = starmap(pd.date_range,start_end)
date_ranges = chain.from_iterable(date_ranges)

#get all columns except date and end_date
res = df.filter(['id','product','duration','product_in_use'])

#expand the dataframe by repeating the indexes based on the duration
#so index 0 will be repeated 3 times, 1 once, 2 twice, ...
res = res.reindex(res.index.repeat(res.duration.dt.days))

#assign the new date_ranges to the dataframe
res.index = date_ranges
res

           id   product duration    product_in_use
2020-01-01  1   list_3    3 days    true
2020-01-02  1   list_3    3 days    true
2020-01-03  1   list_3    3 days    true
2020-01-02  2   list_1    1 days    true
2020-01-03  3   list_2    2 days    true
2020-01-04  3   list_2    2 days    true
2020-01-04  4   list_3    3 days    true
2020-01-05  4   list_3    3 days    true
2020-01-06  4   list_3    3 days    true
2020-01-05  5   list_2    2 days    true
2020-01-06  5   list_2    2 days    true
2020-01-06  6   list_1    1 days    true
2020-01-07  7   list_1    1 days    true
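
The result above carries the dates in the index, whereas the expected df in the question has date as an ordinary column with a fresh 0..12 index. The following is a minimal sketch of one way to get that layout, not the original answer's method: it repeats each row once per covered day and then adds a per-row day offset. The helper name expand_by_date_range and its parameters are hypothetical, and the sketch assumes a unique index and end_date >= date on every row. It derives the repeat count from date and end_date rather than from duration, so it does not depend on the two agreeing.

```python
import pandas as pd


def expand_by_date_range(frame, start='date', end='end_date'):
    """Return one row per day between `start` and `end` (inclusive) for each input row.

    Hypothetical helper for illustration; assumes a unique index and end >= start.
    """
    # number of days each row covers, counting both endpoints
    n_days = (frame[end] - frame[start]).dt.days + 1
    # repeat each row once per covered day
    out = frame.loc[frame.index.repeat(n_days)]
    # 0, 1, 2, ... within each block of repeated rows
    offset = out.groupby(level=0).cumcount().to_numpy()
    out = out.reset_index(drop=True)
    # shift the start date forward by the per-row offset
    out[start] = out[start] + pd.to_timedelta(offset, unit='D')
    return out.drop(columns=end)


# Same sample data as in the question
df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', end='2020-01-07'),
    'id': range(1, 8),
    'product': ['list_3', 'list_1', 'list_2', 'list_3', 'list_2', 'list_1', 'list_1'],
    'duration': pd.to_timedelta([3, 1, 2, 3, 2, 1, 1], unit='D'),
    'product_in_use': 'true',
    'end_date': pd.to_datetime(['2020-01-03', '2020-01-02', '2020-01-04', '2020-01-06',
                                '2020-01-06', '2020-01-06', '2020-01-07']),
})

expanded = expand_by_date_range(df)
print(expanded)
```

Because the offsets are computed with vectorized operations, no Python-level loop over rows is needed; for small frames the itertools version above is just as reasonable.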
