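python - Reshaping data with dates as column values
Problem description
I'm trying to reshape some data with pandas and am having a hard time getting it into the right format. Roughly, the data looks like this*: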
import numpy as np
import pandas as pd

df = pd.DataFrame({'PRODUCT': ['1', '2'],
                   'DESIGN_START': [pd.Timestamp('2020-01-05'), pd.Timestamp('2020-01-17')],
                   'DESIGN_COMPLETE': [pd.Timestamp('2020-01-22'), pd.Timestamp('2020-03-04')],
                   'PRODUCTION_START': [pd.Timestamp('2020-02-07'), pd.Timestamp('2020-03-15')],
                   'PRODUCTION_COMPLETE': [np.nan, pd.Timestamp('2020-04-28')]})
print(df)
  PRODUCT DESIGN_START DESIGN_COMPLETE PRODUCTION_START PRODUCTION_COMPLETE
0       1   2020-01-05      2020-01-22       2020-02-07                 NaT
1       2   2020-01-17      2020-03-04       2020-03-15          2020-04-28
I'd like to reshape the data so that it looks like this:
reshaped_df = pd.DataFrame({'DATE': [pd.Timestamp('2020-01-05'), pd.Timestamp('2020-01-17'),
                                     pd.Timestamp('2020-01-22'), pd.Timestamp('2020-03-04'),
                                     pd.Timestamp('2020-02-07'), pd.Timestamp('2020-03-15'),
                                     np.nan, pd.Timestamp('2020-04-28')],
                            'STAGE': ['design', 'design', 'design', 'design',
                                      'production', 'production', 'production', 'production'],
                            'STATUS': ['started', 'started', 'completed', 'completed',
                                       'started', 'started', 'completed', 'completed']})
print(reshaped_df)
        DATE      STAGE    STATUS
0 2020-01-05     design   started
1 2020-01-17     design   started
2 2020-01-22     design completed
3 2020-03-04     design completed
4 2020-02-07 production   started
5 2020-03-15 production   started
6        NaT production completed
7 2020-04-28 production completed
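How can I do this? Is there a better format to reshape it into?
Ultimately I want to run some grouped summaries on the data, such as how many times each step occurred, e.g.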
reshaped_df.groupby(['STAGE','STATUS'])['DATE'].count()
STAGE       STATUS
design      completed    2
            started      2
production  completed    1
            started      2
Name: DATE, dtype: int64
Thanks
* The actual data contains many start/stop date columns for the different stages of a manufacturing pipeline.
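One detail of the summary above is worth spelling out: `count()` counts only non-missing values, which is why `production / completed` shows 1 rather than 2 (product 1 has no completion date yet). A minimal sketch contrasting it with `size()`, which counts rows regardless of missing values, rebuilding the long-format frame from the question:

```python
import numpy as np
import pandas as pd

# Long-format frame from the question: one row per product/stage/status
reshaped_df = pd.DataFrame({
    'DATE': pd.to_datetime(['2020-01-05', '2020-01-17', '2020-01-22', '2020-03-04',
                            '2020-02-07', '2020-03-15', None, '2020-04-28']),
    'STAGE': ['design'] * 4 + ['production'] * 4,
    'STATUS': ['started', 'started', 'completed', 'completed'] * 2,
})

grouped = reshaped_df.groupby(['STAGE', 'STATUS'])['DATE']
# count() skips missing dates (NaT), so an unfinished step is not counted
counts = grouped.count()
# size() counts every row in the group, missing dates included
sizes = grouped.size()
print(counts)
print(sizes)
```

So `count()` answers "how many times did this step actually happen", while `size()` answers "how many products have a slot for this step".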
Solution
Melt it!
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'PRODUCT': ['1', '2'],
    'DESIGN_START': [pd.Timestamp('2020-01-05'), pd.Timestamp('2020-01-17')],
    'DESIGN_COMPLETE': [pd.Timestamp('2020-01-22'), pd.Timestamp('2020-03-04')],
    'PRODUCTION_START': [pd.Timestamp('2020-02-07'), pd.Timestamp('2020-03-15')],
    'PRODUCTION_COMPLETE': [np.nan, pd.Timestamp('2020-04-28')]
})

# Unpivot: one row per (PRODUCT, original column), in columns 'variable' and 'value'
df = df.melt(id_vars=['PRODUCT'])

# Split names such as 'DESIGN_START' on the first underscore into stage and status
df_split = df['variable'].str.split('_', n=1, expand=True)
df['STAGE'] = df_split[0]
df['STATUS'] = df_split[1]

# Drop the combined column name and give the value column a meaningful name
df = df.drop(columns=['variable']).rename(columns={'value': 'DATE'})
print(df)
Output:
  PRODUCT       DATE      STAGE   STATUS
0       1 2020-01-05     DESIGN    START
1       2 2020-01-17     DESIGN    START
2       1 2020-01-22     DESIGN COMPLETE
3       2 2020-03-04     DESIGN COMPLETE
4       1 2020-02-07 PRODUCTION    START
5       2 2020-03-15 PRODUCTION    START
6       1        NaT PRODUCTION COMPLETE
7       2 2020-04-28 PRODUCTION COMPLETE
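The output above keeps the raw column suffixes (`START`, `COMPLETE`) and upper-case stage names, whereas the question asked for `started`/`completed` and lower-case stages. The sketch below is one way to bridge that, assuming every date column follows a `<STAGE>_<STATUS>` naming pattern. The multi-word stage `QUALITY_CHECK` and its dates are hypothetical, invented only to show why splitting on the last underscore (`str.rsplit('_', n=1)`) is safer than splitting on the first when the real data has many pipeline stages; the suffix-to-label mapping is likewise an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: QUALITY_CHECK is an invented multi-word stage name
df = pd.DataFrame({
    'PRODUCT': ['1', '2'],
    'DESIGN_START': pd.to_datetime(['2020-01-05', '2020-01-17']),
    'DESIGN_COMPLETE': pd.to_datetime(['2020-01-22', '2020-03-04']),
    'QUALITY_CHECK_START': pd.to_datetime(['2020-01-25', None]),
    'QUALITY_CHECK_COMPLETE': pd.to_datetime([None, None]),
})

long_df = df.melt(id_vars=['PRODUCT'], var_name='variable', value_name='DATE')

# Split on the LAST underscore so 'QUALITY_CHECK_START' -> 'QUALITY_CHECK', 'START'
parts = long_df['variable'].str.rsplit('_', n=1, expand=True)
long_df['STAGE'] = parts[0].str.lower()
# Assumed mapping from raw suffixes to the labels used in the question
long_df['STATUS'] = parts[1].map({'START': 'started', 'COMPLETE': 'completed'})
long_df = long_df.drop(columns='variable')

print(long_df)
print(long_df.groupby(['STAGE', 'STATUS'])['DATE'].count())
```

Any suffix missing from the mapping comes out as `NaN` in `STATUS`, which makes unexpected column names easy to spot.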
Mwahahahaha!!! Feel the power of melt!!!
Melt is basically an unpivot.
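Because `melt` is the inverse of `pivot`, the long frame can be turned back into the original wide layout. A small round-trip sketch using only the production columns from the question:

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame({
    'PRODUCT': ['1', '2'],
    'PRODUCTION_START': pd.to_datetime(['2020-02-07', '2020-03-15']),
    'PRODUCTION_COMPLETE': pd.to_datetime([None, '2020-04-28']),
})

# melt: wide -> long, one row per (PRODUCT, column) pair
long_df = wide.melt(id_vars='PRODUCT')

# pivot: long -> wide again, undoing the melt
back = long_df.pivot(index='PRODUCT', columns='variable', values='value')

# pivot sorts the new columns and moves PRODUCT into the index; restore both
back = back[['PRODUCTION_START', 'PRODUCTION_COMPLETE']].reset_index()
back.columns.name = None
print(back)
```

`pivot` requires each `(PRODUCT, variable)` pair to be unique and raises a `ValueError` otherwise.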