What is the equivalent of pivot_wider in pandas?

Problem description

This is what I do in R to go from long to wide based on one column. The expected output is at the bottom of the block.

text="                    id value       Site   Parameter
'2021-03-15 11:09:15'     a  Sitename1  Parameter1
'2021-03-16 11:09:15'     b  Sitename1  Parameter1
'2021-03-17 11:09:15'     c  Sitename1  Parameter1
'2021-03-15 11:09:15'     d  Sitename1  Parameter2
'2021-03-16 11:09:15'     e  Sitename1  Parameter2
'2021-03-17 11:09:15'     f  Sitename1  Parameter2
'2021-03-15 11:09:15'   2.5  Sitename2  Parameter1
'2021-03-16 11:09:15'   1.2  Sitename2  Parameter1
'2021-03-17 11:09:15'   0.7  Sitename2  Parameter1
'2021-03-15 11:09:15'   3.2  Sitename2  Parameter2
'2021-03-16 11:09:15'   1.3  Sitename2  Parameter2
'2021-03-17 11:09:15'   0.1  Sitename2  Parameter2"
library(tidyr)
dat=read.table(text=text, header=TRUE)
pivot_wider(dat, names_from=Parameter, values_from=value)

  id                  Site      Parameter1 Parameter2
  <chr>               <chr>     <chr>      <chr>     
1 2021-03-15 11:09:15 Sitename1 a          d         
2 2021-03-16 11:09:15 Sitename1 b          e         
3 2021-03-17 11:09:15 Sitename1 c          f         
4 2021-03-15 11:09:15 Sitename2 2.5        3.2       
5 2021-03-16 11:09:15 Sitename2 1.2        1.3       
6 2021-03-17 11:09:15 Sitename2 0.7        0.1  

This is what I have in Python. How do you do pivot_wider here? The call below returns holes (NaN) in the data frame, and if I include index=["id", "value"] or index="id" it gives me an error.

import datetime

import pandas as pd

# Wide input: one column per Site/Parameter combination.
df = pd.DataFrame({"Sitename1_Parameter1" : {0 : "a", 1 : "b", 2 : "c"},
                   "Sitename1_Parameter2" : {0 : "d", 1 : "e", 2 : "f"},
                   "Sitename2_Parameter1" : {0 : 2.5, 1 : 1.2, 2 : .7},
                   "Sitename2_Parameter2" : {0 : 3.2, 1 : 1.3, 2 : .1}
                  })
df["id"] = [datetime.datetime(2021, 3, 15, 11, 9, 15), datetime.datetime(2021, 3, 16, 11, 9, 15),
            datetime.datetime(2021, 3, 17, 11, 9, 15)]

# Wide to long, then split "Sitename1_Parameter1" into Site and Parameter.
df = df.melt(id_vars="id", value_vars=["Sitename1_Parameter1", "Sitename1_Parameter2",
                                       "Sitename2_Parameter1", "Sitename2_Parameter2"])
df[["Site", "Parameter"]] = df["variable"].str.split("_", n=1, expand=True)

df = df.drop("variable", axis=1)

print(df)
                    id value       Site   Parameter
0  2021-03-15 11:09:15     a  Sitename1  Parameter1
1  2021-03-16 11:09:15     b  Sitename1  Parameter1
2  2021-03-17 11:09:15     c  Sitename1  Parameter1
3  2021-03-15 11:09:15     d  Sitename1  Parameter2
4  2021-03-16 11:09:15     e  Sitename1  Parameter2
5  2021-03-17 11:09:15     f  Sitename1  Parameter2
6  2021-03-15 11:09:15   2.5  Sitename2  Parameter1
7  2021-03-16 11:09:15   1.2  Sitename2  Parameter1
8  2021-03-17 11:09:15   0.7  Sitename2  Parameter1
9  2021-03-15 11:09:15   3.2  Sitename2  Parameter2
10 2021-03-16 11:09:15   1.3  Sitename2  Parameter2
11 2021-03-17 11:09:15   0.1  Sitename2  Parameter2

print(df.pivot(columns="Parameter", values="value"))
Parameter Parameter1 Parameter2
0                  a        NaN
1                  b        NaN
2                  c        NaN
3                NaN          d
4                NaN          e
5                NaN          f
6                2.5        NaN
7                1.2        NaN
8                0.7        NaN
9                NaN        3.2
10               NaN        1.3
11               NaN        0.1

Tags: python, pandas, dataframe

Solution


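The problem is that your pandas version is too old. Passing a list to the index parameter of pivot is supported from pandas 1.1.0 onward, so you need to upgrade: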

df1 = (df.pivot(index=['id','Site'],columns="Parameter", values="value")
         .sort_index(level=[1,0]))
Parameter                     Parameter1 Parameter2
id                  Site                           
2021-03-15 11:09:15 Sitename1          a          d
2021-03-16 11:09:15 Sitename1          b          e
2021-03-17 11:09:15 Sitename1          c          f
2021-03-15 11:09:15 Sitename2        2.5        3.2
2021-03-16 11:09:15 Sitename2        1.2        1.3
2021-03-17 11:09:15 Sitename2        0.7        0.1
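
The result above still carries a MultiIndex and a "Parameter" label on the column axis, whereas pivot_wider in R returns a flat table. A minimal sketch of one way to flatten it, assuming pandas 1.1 or newer; the frame name long_df is hypothetical and is rebuilt here only so the snippet runs on its own:

```python
import pandas as pd

# Rebuild the long-format frame from the question (long_df is an illustrative name).
ids = pd.to_datetime(["2021-03-15 11:09:15", "2021-03-16 11:09:15", "2021-03-17 11:09:15"])
long_df = pd.DataFrame({
    "id": list(ids) * 4,
    "value": ["a", "b", "c", "d", "e", "f", 2.5, 1.2, 0.7, 3.2, 1.3, 0.1],
    "Site": ["Sitename1"] * 6 + ["Sitename2"] * 6,
    "Parameter": (["Parameter1"] * 3 + ["Parameter2"] * 3) * 2,
})

wide = (long_df.pivot(index=["id", "Site"], columns="Parameter", values="value")
               .sort_index(level=[1, 0])     # order by Site, then id
               .rename_axis(columns=None)    # drop the "Parameter" label on the column axis
               .reset_index())               # move id and Site back into regular columns
print(wide)
```

After this, wide has the plain columns id, Site, Parameter1 and Parameter2, which is the same layout as the R output in the question.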

If you cannot upgrade:

df1 = (df.set_index(['id','Site',"Parameter"])["value"]
         .unstack()
         .sort_index(level=[1,0]))
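
As for why the original attempt produced holes: when no index is given, pivot keeps the existing row index, so every source row stays on its own output row and only one Parameter column can be filled per row. The sketch below illustrates this on a made-up miniature frame (the name small and its values are hypothetical, not the question's data):

```python
import pandas as pd

# Hypothetical miniature of the question's data: two timestamps, one site.
small = pd.DataFrame({
    "id": ["t1", "t2", "t1", "t2"],
    "Site": ["S1", "S1", "S1", "S1"],
    "Parameter": ["P1", "P1", "P2", "P2"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# No index given: the existing RangeIndex 0..3 is used, so nothing lines up.
holes = small.pivot(columns="Parameter", values="value")
print(holes.shape)    # 4 rows, half of the cells are NaN

# Keys moved into the index first: rows sharing (id, Site) are combined.
filled = small.set_index(["id", "Site", "Parameter"])["value"].unstack()
print(filled.shape)   # 2 rows, no NaN
```

Both unstack and pivot raise an error if the same (id, Site, Parameter) combination occurs more than once, so this approach assumes those keys are unique, as they are in the question's data.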
