python - Python/Pandas:独特的表格转置和变换
问题描述
我有一个如下图所示的 Excel 表格,其中数字“1”标记了给定商店所在的存储桶。
+-------+-------+-----------+----------+----------+------------+----------+
| | | | H | H | N | G |
+-------+-------+-----------+----------+----------+------------+----------+
| | | | 111 | 114 | 120 | 130 |
+-------+-------+-----------+----------+----------+------------+----------+
| Group | Code | Name | Bucket 1 | Bucket A | Bucket Min | Bucket B |
+-------+-------+-----------+----------+----------+------------+----------+
| H | 11110 | Store 1 | 1 | | | 1 |
+-------+-------+-----------+----------+----------+------------+----------+
| N | 11450 | Store AAA | | | 1 | |
+-------+-------+-----------+----------+----------+------------+----------+
| H | 11152 | Store BBB | | | 1 | 1 |
+-------+-------+-----------+----------+----------+------------+----------+
| H | 11734 | Store XXX | | 1 | | |
+-------+-------+-----------+----------+----------+------------+----------+
我想把这张桌子改成我下面的样子。
+----------+--------+------------+---------------------------------------------------+
| Category | Number | Transfer | Received Information |
+----------+--------+------------+---------------------------------------------------+
| H | 111 | Bucket 1 | Received for Store 1 (11110) |
+----------+--------+------------+---------------------------------------------------+
| H | 114 | Bucket A | Received for Store XXX (11734) |
+----------+--------+------------+---------------------------------------------------+
| N | 120 | Bucket Min | Received for Store AAA (11450), Store BBB (11152) |
+----------+--------+------------+---------------------------------------------------+
| G | 130 | Bucket B | Received from Store 1 (11111), Store BBB (11152) |
+----------+--------+------------+---------------------------------------------------+
这种独特的转置和变换如何在 Python/Pandas 中完成?
请注意,文本“Received for”将始终出现在“Received information”列中每个单元格的开头。
解决方案
我会试一试。首先,您应该使要从中读取数据的 excel 文件对 pandas 更友好,方法是为列名添加额外的行并正确命名行:
之后,此代码应为您提供所需的结果:
import pandas as pd
f = r"path/to/file.xlsx"
# %%
df = (
pd.read_excel(f, index_col=[0, 1, 2], header=[0, 1, 2])
# move away from multi-index
.reset_index()
# create a new column combining Code and Name
.assign(NameCode=lambda x: x["Name"] + " (" + x["Code"].astype(str) + ")")
# drop the columns we combined into the new one above
.drop(columns=["Name", "Code"])
# 'transpose' table to create one entry for each of the NameCode
# entries
.melt(id_vars=["Group", "NameCode"], value_name="deliveries")
# drop everything with NaN
.loc[lambda x: ~x["deliveries"].isnull()]
)
print(df)
# %%
pvt = df.groupby(by=["Category", "Number", "Transfer"]).agg(
Received_Information=pd.NamedAgg(
column="NameCode", aggfunc=lambda x: "Received from " + ", ".join(x.str.strip())
)
)
print(pvt)
Group NameCode Category Number Transfer deliveries
0 H Store 1 (11110) H 111 Bucket 1 1.0
7 N Store XXX (11734) H 114 Bucket A 1.0
9 N Store AAA (11450) N 120 Bucket Min 1.0
10 H Store BBB (11152) N 120 Bucket Min 1.0
12 H Store 1 (11110) G 130 Bucket B 1.0
14 H Store BBB (11152) G 130 Bucket B 1.0
Received_Information
Category Number Transfer
G 130 Bucket B Received from Store 1 (11110), Store BBB (11152)
H 111 Bucket 1 Received from Store 1 (11110)
114 Bucket A Received from Store XXX (11734)
N 120 Bucket Min Received from Store AAA (11450), Store BBB (11152)
推荐阅读
- ruby-on-rails - 我可以更有效地编写 IO 绑定的 RSpec 测试吗?
- amazon-web-services - 如何通过无服务器框架将 BatchWriteItem 权限附加到我的 IAM 角色策略?
- oop - 当特征需要的状态多于结构中包含的状态时,如何为结构实现特征?
- java - Java ZipCoder 异常
- javascript - sapui5 列表中的数据绑定没有给出数据
- matlab - 根据matlab中的一些值组合数据
- c# - 如何检测 RepositionThemeTransition 是否完成或强制 RepositionThemeTransition 激活
- applescript - 使用 Applescript 删除文件夹在某些系统上不起作用
- javascript - 使用电子的本机拖放文件上传器
- r - 如何向依赖于另一列值的数据框添加新列?