python-3.x - 根据 Pandas 中的前缀将多列融合到对应的列
问题描述
假设以下数据框:
id transaction seller0 seller1 seller2 buyer0 buyer1
0 1 Subject1 Tim Jamie Melissa Rosie NaN
1 2 Subject2 Rima Derren NaN Annalise Hania
2 3 Subject3 Rosa NaN NaN Joshua NaN
我怎样才能将它重塑为以下格式?即每个交易的seller0 seller1 seller2
toseller
和buyer0 buyer1
tobuyer
列。
需要的输出:
id transaction seller buyer
0 1 Subject1 Tim Rosie
1 1 Subject1 Jamie NaN
2 1 Subject1 Melissa NaN
3 2 Subject2 Rima Annalise
4 2 Subject2 Derren Hania
5 3 Subject3 Rosa Joshua
代码:
df.melt(['id', 'transaction'], value_name = 'seller').drop('variable', 1)
出去:
id transaction seller
0 1 Subject1 Tim
1 2 Subject2 Rima
2 3 Subject3 Rosa
3 1 Subject1 Jamie
4 2 Subject2 Derren
5 3 Subject3 NaN
6 1 Subject1 Melissa
7 2 Subject2 NaN
8 3 Subject3 NaN
9 1 Subject1 Rosie
10 2 Subject2 Annalise
11 3 Subject3 Joshua
12 1 Subject1 NaN
13 2 Subject2 Hania
14 3 Subject3 NaN
所需的更新输出:
id transaction type name
0 1 Subject1 seller Tim
1 1 Subject1 seller Jamie
2 1 Subject1 seller Melissa
3 2 Subject2 seller Rima
4 2 Subject2 seller Derren
5 3 Subject3 seller Rosa
6 1 Subject1 buyer Rosie
7 2 Subject2 buyer Annalise
8 2 Subject2 buyer Hania
9 3 Subject3 buyer Joshua
解决方案
(
pd.wide_to_long(df,
stubnames=["seller", "buyer"],
i=["id", "transaction"],
j="num")
.dropna(how="all")
.droplevel(level=-1)
.reset_index()
)
id transaction seller buyer
0 1 Subject1 Tim Rosie
1 1 Subject1 Jamie NaN
2 1 Subject1 Melissa NaN
3 2 Subject2 Rima Annalise
4 2 Subject2 Derren Hania
5 3 Subject3 Rosa Joshua
您还可以使用pyjanitor的pivot_longer函数;目前你必须从github安装最新的开发版本:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
(
df.pivot_longer(index=["id", "transaction"],
names_to=".value",
names_pattern=r"([a-z]+)\d")
.dropna(subset=["seller", "buyer"], how="all")
)
id transaction seller buyer
0 1 Subject1 Tim Rosie
1 2 Subject2 Rima Annalise
2 3 Subject3 Rosa Joshua
3 1 Subject1 Jamie NaN
4 2 Subject2 Derren Hania
6 1 Subject1 Melissa NaN
更新:
对于更新后的结果,您可以堆叠并进行一些小的调整:
(
df.set_index(["id", "transaction"])
.stack()
.rename_axis(["id", "transaction", "type"])
.reset_index(name="name")
.assign(type=lambda df: df["type"].str[:-1])
)
id transaction type name
0 1 Subject1 seller Tim
1 1 Subject1 seller Jamie
2 1 Subject1 seller Melissa
3 1 Subject1 buyer Rosie
4 2 Subject2 seller Rima
5 2 Subject2 seller Derren
6 2 Subject2 buyer Annalise
7 2 Subject2 buyer Hania
8 3 Subject3 seller Rosa
9 3 Subject3 buyer Joshua
你也可以使用pivot_longer:
result = df.pivot_longer(index=["id", "transaction"],
names_to="type",
names_pattern=r"([a-z]+)\d",
values_to="name").dropna()
result
id transaction type name
0 1 Subject1 seller Tim
1 2 Subject2 seller Rima
2 3 Subject3 seller Rosa
3 1 Subject1 seller Jamie
4 2 Subject2 seller Derren
6 1 Subject1 seller Melissa
9 1 Subject1 buyer Rosie
10 2 Subject2 buyer Annalise
11 3 Subject3 buyer Joshua
13 2 Subject2 buyer Hania
在这两种情况下,您都试图完全摆脱空条目。默认情况下,堆栈会删除空条目。