首页 > 解决方案 > 根据 Pandas 中的前缀将多列融合到对应的列

问题描述

假设以下数据框:

   id transaction seller0 seller1  seller2    buyer0 buyer1
0   1    Subject1     Tim   Jamie  Melissa     Rosie    NaN
1   2    Subject2    Rima  Derren      NaN  Annalise  Hania
2   3    Subject3    Rosa     NaN      NaN    Joshua    NaN

我怎样才能将它重塑为以下格式?即每个交易的seller0 seller1 seller2tosellerbuyer0 buyer1tobuyer列。

需要的输出:

   id transaction   seller     buyer
0   1    Subject1      Tim     Rosie
1   1    Subject1    Jamie       NaN
2   1    Subject1  Melissa       NaN
3   2    Subject2     Rima  Annalise
4   2    Subject2   Derren     Hania
5   3    Subject3     Rosa    Joshua

代码:

df.melt(['id', 'transaction'], value_name = 'seller').drop('variable', 1)

出去:

    id transaction    seller
0    1    Subject1       Tim
1    2    Subject2      Rima
2    3    Subject3      Rosa
3    1    Subject1     Jamie
4    2    Subject2    Derren
5    3    Subject3       NaN
6    1    Subject1   Melissa
7    2    Subject2       NaN
8    3    Subject3       NaN
9    1    Subject1     Rosie
10   2    Subject2  Annalise
11   3    Subject3    Joshua
12   1    Subject1       NaN
13   2    Subject2     Hania
14   3    Subject3       NaN

所需的更新输出:

   id transaction    type      name
0   1    Subject1  seller       Tim
1   1    Subject1  seller     Jamie
2   1    Subject1  seller   Melissa
3   2    Subject2  seller      Rima
4   2    Subject2  seller    Derren
5   3    Subject3  seller      Rosa
6   1    Subject1   buyer     Rosie
7   2    Subject2   buyer  Annalise
8   2    Subject2   buyer     Hania
9   3    Subject3   buyer    Joshua

标签: python-3.xpandasdataframe

解决方案


使用wide_to_long

(
    pd.wide_to_long(df, 
                    stubnames=["seller", "buyer"], 
                    i=["id", "transaction"], 
                    j="num")
    .dropna(how="all")
    .droplevel(level=-1)
    .reset_index()
)

id  transaction     seller  buyer
0   1   Subject1    Tim     Rosie
1   1   Subject1    Jamie   NaN
2   1   Subject1    Melissa     NaN
3   2   Subject2    Rima    Annalise
4   2   Subject2    Derren  Hania
5   3   Subject3    Rosa    Joshua

您还可以使用pyjanitor的pivot_longer函数;目前你必须从github安装最新的开发版本:

 # install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
 import janitor

(
    df.pivot_longer(index=["id", "transaction"], 
                    names_to=".value", 
                    names_pattern=r"([a-z]+)\d")
   .dropna(subset=["seller", "buyer"], how="all")
)



    id  transaction     seller      buyer
0   1   Subject1        Tim         Rosie
1   2   Subject2        Rima        Annalise
2   3   Subject3        Rosa        Joshua
3   1   Subject1        Jamie       NaN
4   2   Subject2        Derren      Hania
6   1   Subject1        Melissa     NaN

更新:

对于更新后的结果,您可以堆叠并进行一些小的调整:

(
    df.set_index(["id", "transaction"])
    .stack()
    .rename_axis(["id", "transaction", "type"])
    .reset_index(name="name")
    .assign(type=lambda df: df["type"].str[:-1])
)

    id  transaction     type    name
0   1   Subject1    seller  Tim
1   1   Subject1    seller  Jamie
2   1   Subject1    seller  Melissa
3   1   Subject1    buyer   Rosie
4   2   Subject2    seller  Rima
5   2   Subject2    seller  Derren
6   2   Subject2    buyer   Annalise
7   2   Subject2    buyer   Hania
8   3   Subject3    seller  Rosa
9   3   Subject3    buyer   Joshua

你也可以使用pivot_longer

    result = df.pivot_longer(index=["id", "transaction"],
                             names_to="type",
                             names_pattern=r"([a-z]+)\d",
                             values_to="name").dropna()

    result


  id    transaction     type    name
0   1   Subject1    seller  Tim
1   2   Subject2    seller  Rima
2   3   Subject3    seller  Rosa
3   1   Subject1    seller  Jamie
4   2   Subject2    seller  Derren
6   1   Subject1    seller  Melissa
9   1   Subject1    buyer   Rosie
10  2   Subject2    buyer   Annalise
11  3   Subject3    buyer   Joshua
13  2   Subject2    buyer   Hania

在这两种情况下,您都试图完全摆脱空条目。默认情况下,堆栈会删除空条目。


推荐阅读