How to group a DataFrame by year or week, and how to combine two datasets in Python

Problem description

Dataset 1 : Sales Representative ID, Customer ID, Order Date, Revenue
Dataset 2 : Manager ID, Sales Representative ID, Create Date, Termination date

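Given the two datasets above, Dataset 1 holds daily revenue data per customer together with the sales representative associated with that customer. Dataset 2 maps each sales representative to a manager ID at a given point in time: "Create Date" is when a new association was created, and "Termination date" is when that association ended.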

I need to calculate the yearly, monthly, weekly, and daily revenue for each manager ID for each date.

Output Dataset: Order Date, Year/Month/Week/Day,Manager ID, Total Revenue

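I am confused about two things here: first, how to combine these two datasets, and second, how to get the revenue by week, year, and day, since I do not know how to group the data that way in pandas. Please help.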

dataset1 = { 'srid':[1,2,3,1,5],
            'custid':[11,12,43,12,34],
            'orderdate':["1/2/2019","1/2/2019","2/2/2019","1/2/2019","1/2/2019"],
            'Rev':[100,101,102,103,17]
}


dataset2 = {
    'manid':[101,102,103,104,105],
    'srid':[1,2,1,3,5],
    'CreateDate':["1/1/2019","1/1/2019","3/1/2019","1/1/2019","1/1/2019"],
    'TerminationDate':["2/1/2019","3/1/2019","5/1/2019","2/1/2019","2/1/2019"]
}

Tags: python, pandas

Solution


Try this:

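import pandas as pd

df1 = pd.DataFrame(dataset1)
df2 = pd.DataFrame(dataset2)

# Join orders to manager assignments on the sales representative ID.
# Note: this joins on srid only; it does not check that orderdate falls
# between CreateDate and TerminationDate.
df = df1.merge(df2, on=['srid'])
df['orderdate'] = pd.to_datetime(df['orderdate'])
df['CreateDate'] = pd.to_datetime(df['CreateDate'])
df['TerminationDate'] = pd.to_datetime(df['TerminationDate'])

# pd.Grouper buckets orderdate by the given frequency.
# On pandas >= 2.2 the aliases 'M' and 'Y' are deprecated; use 'ME' and 'YE'.
# Daily
df_d = df.groupby(by=['manid', pd.Grouper(key='orderdate', freq='D')]).agg({'Rev': 'sum'})
# Monthly
df_m = df.groupby(by=['manid', pd.Grouper(key='orderdate', freq='M')]).agg({'Rev': 'sum'})
# Weekly
df_w = df.groupby(by=['manid', pd.Grouper(key='orderdate', freq='W')]).agg({'Rev': 'sum'})
# Yearly
df_y = df.groupby(by=['manid', pd.Grouper(key='orderdate', freq='Y')]).agg({'Rev': 'sum'})
print(df_y)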

                  Rev
manid orderdate
101   2019-12-31  203
102   2019-12-31  101
103   2019-12-31  203
104   2019-12-31  102
105   2019-12-31   17
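
The question describes each manager association as valid only between its create date and its termination date, while a plain merge on srid credits an order to every manager the representative ever had (in the output above, managers 101 and 103 both receive the revenue of representative 1). A minimal sketch of one way to restrict the join to the active association follows. It rests on assumptions not stated in the question: dates are month/day/year, both ends of the validity window are inclusive, and the names orders, managers, active, and revenue_by are illustrative only.

```python
import pandas as pd

dataset1 = {
    'srid': [1, 2, 3, 1, 5],
    'custid': [11, 12, 43, 12, 34],
    'orderdate': ["1/2/2019", "1/2/2019", "2/2/2019", "1/2/2019", "1/2/2019"],
    'Rev': [100, 101, 102, 103, 17],
}
dataset2 = {
    'manid': [101, 102, 103, 104, 105],
    'srid': [1, 2, 1, 3, 5],
    'CreateDate': ["1/1/2019", "1/1/2019", "3/1/2019", "1/1/2019", "1/1/2019"],
    'TerminationDate': ["2/1/2019", "3/1/2019", "5/1/2019", "2/1/2019", "2/1/2019"],
}

orders = pd.DataFrame(dataset1)
managers = pd.DataFrame(dataset2)

# Assumption: the date strings are month/day/year.
orders['orderdate'] = pd.to_datetime(orders['orderdate'], format='%m/%d/%Y')
for col in ('CreateDate', 'TerminationDate'):
    managers[col] = pd.to_datetime(managers[col], format='%m/%d/%Y')

# Pair every order with every assignment of its sales representative ...
merged = orders.merge(managers, on='srid', how='inner')

# ... then keep only the pairs where the order date lies inside the
# assignment window (assumption: both boundaries are inclusive).
active = merged[
    (merged['orderdate'] >= merged['CreateDate'])
    & (merged['orderdate'] <= merged['TerminationDate'])
]


def revenue_by(period):
    """Total revenue per manager per calendar period ('D', 'W', 'M' or 'Y')."""
    labelled = active.assign(period=active['orderdate'].dt.to_period(period))
    return labelled.groupby(['manid', 'period'], as_index=False)['Rev'].sum()


daily = revenue_by('D')
weekly = revenue_by('W')
monthly = revenue_by('M')
yearly = revenue_by('Y')
print(yearly)
```

Under these assumptions, the sample data no longer credits manager 103 with representative 1's January orders, and the order of representative 3 is dropped because it falls after that association was terminated. If orders without an active manager should be kept, a left merge followed by a separate check for missing managers would be one alternative.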
