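python - How do I fill missing values so that each group starts on the first day of the month?
Problem description
I have a dataframe like this: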
tst=
Date        % on Merchant  % on Customer  Merchants                    Location
2021-08-04  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-05  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-06  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-01  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-02  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-03  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-04  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-05  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-06  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
uni_ind= ['% on Merchant','% on Customer','Merchants','Location']
I am looking for this output:
Date        % on Merchant  % on Customer  Merchants                    Location
2021-08-01  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-02  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-03  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-04  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-05  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-06  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-01  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-02  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-03  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-04  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-05  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-06  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
I tried:
tst.groupby(uni_ind).resample('D').bfill().reset_index(level=(0,1,2,3), drop=True).reset_index()
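Solution
- build the missing start-of-month date range for each merchant and location whose first date falls after the 1st
- outer join it to the original dataframe and backfill with fillna(method="bfill")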
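As a small illustration of the first step, the sketch below shows one way the missing leading days could be derived with `pd.offsets.MonthBegin`. The variable names (`first_seen`, `month_start`, `missing_days`, `already_first`) are made up for this example. It also shows why groups that already start on the 1st should be skipped: subtracting `MonthBegin(1)` from a date on the 1st moves back a whole month.

```python
import pandas as pd

# First date observed for a group (hypothetical value taken from the sample data).
first_seen = pd.Timestamp("2021-08-04")

# Subtracting MonthBegin(1) from a mid-month date rolls back to the 1st of that month.
month_start = first_seen - pd.offsets.MonthBegin(1)

# Days from the 1st up to the day before the first observed date.
missing_days = pd.date_range(month_start, first_seen - pd.Timedelta(days=1))
print(missing_days)

# Caveat: a date that is already the 1st moves back to the previous month's 1st.
already_first = pd.Timestamp("2021-08-01") - pd.offsets.MonthBegin(1)
print(already_first)
```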
import io

import pandas as pd

# Columns are separated by two or more spaces, because the column names
# themselves contain single spaces.
df = pd.read_csv(io.StringIO("""Date        % on Merchant  % on Customer  Merchants                    Location
2021-08-04  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-05  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-06  0.0            0.10           Zwarma - The Shawarma Maker  Palani
2021-08-01  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-02  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-03  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-04  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-05  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi
2021-08-06  0.0            0.12           Zwarma - The Shawarma Maker  Pollachi"""), sep=r"\s\s+", engine="python")
df["Date"] = pd.to_datetime(df["Date"])

df = (
    df.merge(
        # earliest date per year, month, merchant and location
        df.groupby(
            [df["Date"].dt.year, df["Date"].dt.month, "Merchants", "Location"], as_index=False
        )
        .agg({"Date": "min"})
        # keep only the groups that do not start on the 1st of the month
        .loc[lambda d: d["Date"].dt.day.gt(1)]
        # for each such group, list the days from the 1st up to the day before its first date
        .apply(
            lambda r: pd.Series(
                {
                    "Date": list(
                        pd.date_range(
                            r["Date"] - pd.offsets.MonthBegin(1),
                            r["Date"] - pd.Timedelta(days=1),
                        )
                    ),
                    "Merchants": r["Merchants"],
                    "Location": r["Location"],
                }
            ),
            axis=1,
        )
        # one row per missing date
        .explode("Date"),
        on=["Date", "Merchants", "Location"],
        how="outer",
    )
    .sort_values(["Merchants", "Location", "Date"])
    # same effect as fillna(method="bfill"), which is deprecated in recent pandas
    .bfill()
)
df
| | Date | % on Merchant | % on Customer | Merchants | Location |
|---|---|---|---|---|---|
| 9 | 2021-08-01 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 10 | 2021-08-02 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 11 | 2021-08-03 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 0 | 2021-08-04 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 1 | 2021-08-05 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 2 | 2021-08-06 00:00:00 | 0 | 0.1 | Zwarma - The Shawarma Maker | Palani |
| 3 | 2021-08-01 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 4 | 2021-08-02 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 5 | 2021-08-03 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 6 | 2021-08-04 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 7 | 2021-08-05 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
| 8 | 2021-08-06 00:00:00 | 0 | 0.12 | Zwarma - The Shawarma Maker | Pollachi |
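For comparison, here is a minimal sketch of a different route to the same result: reindex each (Merchants, Location) group onto a daily range that starts on the 1st of its earliest month, then backfill. This is not the approach used in the answer. The helper name `fill_from_month_start` is made up for this example, and it assumes each group's data sits within a single month.

```python
import pandas as pd

# Sample data from the question.
tst = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-08-04", "2021-08-05", "2021-08-06",
         "2021-08-01", "2021-08-02", "2021-08-03",
         "2021-08-04", "2021-08-05", "2021-08-06"]
    ),
    "% on Merchant": 0.0,
    "% on Customer": [0.10] * 3 + [0.12] * 6,
    "Merchants": "Zwarma - The Shawarma Maker",
    "Location": ["Palani"] * 3 + ["Pollachi"] * 6,
})


def fill_from_month_start(group):
    """Hypothetical helper: extend one group back to the 1st of its first month."""
    start = group["Date"].min().replace(day=1)
    full_range = pd.date_range(start, group["Date"].max(), name="Date")
    return (
        group.set_index("Date")
        .reindex(full_range)   # adds rows of NaN for the missing days
        .bfill()               # copies values from the next observed row
        .rename_axis("Date")
        .reset_index()
    )


# Process each group separately and stack the results.
filled = pd.concat(
    [fill_from_month_start(g) for _, g in tst.groupby(["Merchants", "Location"])],
    ignore_index=True,
)
print(filled)
```

Because the backfill happens inside each group, values cannot leak from one merchant or location into another. The trade-off is that gaps in the middle of a group are filled too, which may or may not be wanted.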