Is there a way to replicate an Excel pivot table in compact form in Python?

Question

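I am automating Excel reports with Python. In the Excel report I use a pivot table with the report layout set to "Compact", where one or more columns are shown as row labels. For example, I have the following input data: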

Country         City         Employee    Salary $ 
 Mexico         Chiapas      A        100,000 
 Mexico         Chihuahua    B        245,132 
 Mexico         Chihuahua    C        200,000 
 Mexico         Chihuahua    D        175,000 
 United States  Alabama      E        106,088 
 United States  Alaska       F        56,121 
 United States  Arizona      G        9,737 
 United States  Arizona      H        250,000  

In the Excel report, I display it as:

Row Labels  Sum of Salary $
Mexico      720,132 
 Chiapas    100,000 
   A        100,000 
 Chihuahua  620,132 
   B        245,132 
   C        200,000 
   D        175,000 
United States   421,946 
 Alabama    106,088 
   E        106,088 
 Alaska     56,121 
   F        56,121 
 Arizona    259,737 
   G        9,737 
   H        250,000 

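In the compact view, instead of showing Country and City in separate columns, I show them as row labels, which is a built-in Excel feature. I am trying to replicate the same view in Python. I have loaded the raw file into a pandas DataFrame. I tried df.pivot and df.pivot_table but could not get the view above.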

With df.pivot and df.pivot_table I only get the usual view, as follows:

Country          City      Employee Sum of Salary $
Mexico           Chiapas        A    100,000 
                 Chihuahua      B    245,132 
                                C    200,000 
                                D    175,000 
United States    Alabama        E    106,088 
                 Alaska         F    56,121 
                 Arizona        G    9,737 
                                H    250,000
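
For reference, a call along the following lines gives the usual view shown above. The exact call is not given in the question, so the column name `Salary` (instead of `Salary $`), the integer salaries, and `aggfunc="sum"` are assumptions made for this sketch.

```python
import pandas as pd

# Sample data from the question; salaries stored as plain integers (assumption)
df = pd.DataFrame(
    {
        "Country": ["Mexico"] * 4 + ["United States"] * 4,
        "City": ["Chiapas", "Chihuahua", "Chihuahua", "Chihuahua",
                 "Alabama", "Alaska", "Arizona", "Arizona"],
        "Employee": list("ABCDEFGH"),
        "Salary": [100000, 245132, 200000, 175000, 106088, 56121, 9737, 250000],
    }
)

# Each index level becomes its own column of row headers,
# which is Excel's tabular layout rather than the compact one
usual = df.pivot_table(
    index=["Country", "City", "Employee"], values="Salary", aggfunc="sum"
)
print(usual)
```

The result has a three-level MultiIndex and no subtotal rows, which is why it differs from the compact layout.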

Tags: python, pandas, pandas-groupby

Answer


Applying several groupby calls and then concat is easier, but you want a sorted frame, so this answer is tailored to your problem:

df

    Country         City       Employee  Salary
0   Mexico          Chiapas    A         100000
1   Mexico          Chihuahua  B         245132
2   Mexico          Chihuahua  C         200000
3   Mexico          Chihuahua  D         175000
4   United States   Alabama    E         106088
5   United States   Alaska     F         56121
6   United States   Arizona    G         9737
7   United States   Arizona    H         250000
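
To make the answer reproducible, the frame shown above can be built as follows. This is a minimal sketch that assumes the salaries have already been parsed to integers (thousands separators removed) and that the column is named `Salary`, as in the display above.

```python
import pandas as pd

# Sample data copied from the question, with "Salary" as integers
df = pd.DataFrame(
    {
        "Country": ["Mexico"] * 4 + ["United States"] * 4,
        "City": ["Chiapas", "Chihuahua", "Chihuahua", "Chihuahua",
                 "Alabama", "Alaska", "Arizona", "Arizona"],
        "Employee": list("ABCDEFGH"),
        "Salary": [100000, 245132, 200000, 175000, 106088, 56121, 9737, 250000],
    }
)
print(df)
```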

Code:

import pandas as pd

parts = []
# Sum only the numeric column; a bare .sum() would also try to aggregate City and Employee
country = df.groupby("Country")[["Salary"]].sum()
for i in range(len(country)):
    # One-row frame with the country subtotal, labelled by the country name
    c = country.iloc[[i]].reset_index()
    c.columns = ["Row Labels", "Salary"]

    # Rows of the current country only
    in_country = df[df["Country"] == country.index[i]]
    city = in_country.groupby("City")[["Salary"]].sum()

    for j in range(len(city)):
        # One-row frame with the city subtotal
        c2 = city.iloc[[j]].reset_index()
        c2.columns = ["Row Labels", "Salary"]

        # Filter inside the country so a same-named city elsewhere is not mixed in
        in_city = in_country[in_country["City"] == city.index[j]]
        c3 = in_city.groupby("Employee")[["Salary"]].sum().reset_index()
        c3.columns = ["Row Labels", "Salary"]

        # Add the country row only before its first city, so no de-duplication is needed
        parts += [c, c2, c3] if j == 0 else [c2, c3]

res = pd.concat(parts, ignore_index=True)

Result:

res


    Row Labels       Salary
0   Mexico           720132
1   Chiapas          100000
2   A                100000
3   Chihuahua        620132
4   B                245132
5   C                200000
6   D                175000
7   United States    421946
8   Alabama          106088
9   E                106088
10  Alaska           56121
11  F                56121
12  Arizona          259737
13  G                9737
14  H                250000
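
The result above has flat labels, whereas Excel's compact layout indents each level. One possible way to get the indentation is sketched below. The helper name `compact_pivot` and its parameters are hypothetical and not part of pandas, and the sketch assumes the grouping columns contain no missing values.

```python
import pandas as pd

def compact_pivot(df, levels, value, indent="  "):
    """Build compact-style rows: one subtotal row per group at every level."""
    rows = []

    def walk(frame, depth):
        # groupby sorts the keys, so labels come out in alphabetical order
        for key, sub in frame.groupby(levels[depth]):
            rows.append(
                {"Row Labels": indent * depth + str(key), value: sub[value].sum()}
            )
            # Descend into the next grouping column, if there is one
            if depth + 1 < len(levels):
                walk(sub, depth + 1)

    walk(df, 0)
    return pd.DataFrame(rows)

# Sample data from the question (salaries as integers)
df = pd.DataFrame(
    {
        "Country": ["Mexico"] * 4 + ["United States"] * 4,
        "City": ["Chiapas", "Chihuahua", "Chihuahua", "Chihuahua",
                 "Alabama", "Alaska", "Arizona", "Arizona"],
        "Employee": list("ABCDEFGH"),
        "Salary": [100000, 245132, 200000, 175000, 106088, 56121, 9737, 250000],
    }
)

res = compact_pivot(df, ["Country", "City", "Employee"], "Salary")
print(res.to_string(index=False))
```

Because the recursion follows the `levels` list, the same helper works for two or four grouping columns. It still loops in Python, so it is not expected to be fast on large inputs.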

If you don't mind the order of the labels, the following solution is faster (which matters for a large dataset):

c1 = df.groupby(["Country"])["Salary"].sum().reset_index()
c1.columns = ["Row Labels", "Salary"]

c2 = df.groupby(["Country","City"])["Salary"].sum().reset_index()[["City","Salary"]]
c2.columns = ["Row Labels", "Salary"]

c3 = df.groupby(["Country","City","Employee"])["Salary"].sum().reset_index()[["Employee","Salary"]]
c3.columns = ["Row Labels", "Salary"]

res = pd.concat([c1,c2,c3])

res


    Row Labels          Salary
0   Mexico              720132
1   United States       421946
0   Chiapas             100000
1   Chihuahua           620132
2   Alabama             106088
3   Alaska              56121
4   Arizona             259737
0   A                   100000
1   B                   245132
2   C                   200000
3   D                   175000
4   E                   106088
5   F                   56121
6   G                   9737
7   H                   250000
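
The grouped approach can also be made to produce the sorted, compact order. The sketch below is one way to do it, not the only one: it pads the missing deeper keys with empty strings so that every subtotal row sorts directly before its children. The helper name `compact_sorted` is hypothetical, and the sketch assumes all grouping keys are non-empty strings.

```python
import pandas as pd

def compact_sorted(df, levels, value):
    """Grouped subtotals at every level, ordered like a compact pivot table."""
    parts = []
    for depth in range(1, len(levels) + 1):
        keys = levels[:depth]
        part = df.groupby(keys, as_index=False)[value].sum()
        # The label shown for a row is its deepest grouping key
        part["Row Labels"] = part[keys[-1]].astype(str)
        # "" sorts before any non-empty string, so parents precede their children
        for missing in levels[depth:]:
            part[missing] = ""
        parts.append(part)
    out = pd.concat(parts, ignore_index=True)
    out = out.sort_values(levels, ignore_index=True)
    return out[["Row Labels", value]]

# Sample data from the question (salaries as integers)
df = pd.DataFrame(
    {
        "Country": ["Mexico"] * 4 + ["United States"] * 4,
        "City": ["Chiapas", "Chihuahua", "Chihuahua", "Chihuahua",
                 "Alabama", "Alaska", "Arizona", "Arizona"],
        "Employee": list("ABCDEFGH"),
        "Salary": [100000, 245132, 200000, 175000, 106088, 56121, 9737, 250000],
    }
)

fast = compact_sorted(df, ["Country", "City", "Employee"], "Salary")
print(fast)
```

This uses one groupby per level instead of nested Python loops over groups, so it should scale better with the number of groups.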

Hope it works!

