How to convert the values of the first few rows in a Python DataFrame into new columns

Problem description

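I'm trying to select values from the first 3 records of each group in a sorted pandas DataFrame and put them into new columns. I have a function that processes each group, but I'm struggling to find the right way to extract and rename the Series, then combine the results into a single Series to return.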

Below is a simplified example of the input DataFrame (df_in) and the expected output (df_out):

import pandas as pd
data_in = { 'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
        'Price':  [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0], 
        'Qty': [15 , 13, 14, 10, 5, 2, 1]}
df_in = pd.DataFrame (data_in, columns = ['Product', 'Price', 'Qty'])

[Image: the input DataFrame and the transformed output, with the values of the best 3 records in new columns]
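Since the image is not reproduced here, the expected output can be sketched as a literal DataFrame. The values below are reconstructed from what best3_prices_v1 (shown further down) would return for df_in, so treat them as an assumption about the picture rather than a copy of it; the name df_expected is made up for this illustration.

```python
import pandas as pd

# Reconstructed expected output (assumption): up to 3 cheapest records per
# product, with missing 2nd/3rd records filled with zeroes.
df_expected = pd.DataFrame({
    'Product': ['A', 'B', 'C'],
    'Price_1': [25.0, 120.0, 650.0],
    'Qty_1':   [15, 5, 2],
    'Price_2': [30.5, 0.0, 680.0],
    'Qty_2':   [13, 0, 1],
    'Price_3': [50.0, 0.0, 0.0],
    'Qty_3':   [14, 0, 0],
})
print(df_expected)
```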

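I'm copying below the 2 example functions I tested. I'm looking for a more efficient option, especially if I have to handle more columns and records. The function best3_prices_v1 works, but every column or variable has to be named explicitly, which is a problem because I will have to add more columns.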

def best3_prices_v1(x):
    d = {}

    # get best 3 records if records available, else set volumes as zeroes
    best_price_lv1 = x.iloc[0].copy()

    rec_with_zeroes = best_price_lv1.copy()
    rec_with_zeroes['Price'] = 0
    rec_with_zeroes['Qty'] = 0

    recs = len(x)  # number of records
    if recs == 1:
        # 2nd and 3rd records not available
        best_price_lv2 = rec_with_zeroes.copy()
        best_price_lv3 = rec_with_zeroes.copy()
    elif recs == 2:
        best_price_lv2 = x.iloc[1]
        # 3rd record not available
        best_price_lv3 = rec_with_zeroes.copy()
    else:
        best_price_lv2 = x.iloc[1]
        best_price_lv3 = x.iloc[2]

    # 1st best
    d['Price_1'] = best_price_lv1['Price']
    d['Qty_1'] = best_price_lv1['Qty']

    # 2nd best
    d['Price_2'] = best_price_lv2['Price']
    d['Qty_2'] = best_price_lv2['Qty']

    # 3rd best
    d['Price_3'] = best_price_lv3['Price']
    d['Qty_3'] = best_price_lv3['Qty']

    # return combined results as a series
    return pd.Series(d, index=['Price_1', 'Qty_1', 'Price_2', 'Qty_2', 'Price_3', 'Qty_3'])

Code that calls the function:

# sort dataframe by Product and Price
df_in.sort_values(by=['Product', 'Price'], ascending=True, inplace=True)
# get best 3 prices and qty as new columns
df_out = df_in.groupby(['Product']).apply(best3_prices_v1).reset_index()

Second attempt, trying to improve/reduce the code and the explicit names for each variable... incomplete and not working.

def best3_prices_v2(x):
    d = {}

    # get best 3 records if records available, else set volumes as zeroes
    best_price_lv1 = x.iloc[0].copy()

    rec_with_zeroes = best_price_lv1.copy()
    rec_with_zeroes['Price'] = 0
    rec_with_zeroes['Qty'] = 0

    recs = len(x)  # number of records
    if recs == 1:
        # 2nd and 3rd records not available
        best_price_lv2 = rec_with_zeroes.copy()
        best_price_lv3 = rec_with_zeroes.copy()
    elif recs == 2:
        best_price_lv2 = x.iloc[1]
        # 3rd record not available
        best_price_lv3 = rec_with_zeroes.copy()
    else:
        best_price_lv2 = x.iloc[1]
        best_price_lv3 = x.iloc[2]

    stats_columns = ['Price', 'Qty']

    # get records values for best 3 prices
    d_lv1 = best_price_lv1[stats_columns]
    d_lv2 = best_price_lv2[stats_columns]
    d_lv3 = best_price_lv3[stats_columns]

    # How to rename (keys?) or combine values to return?
    lv1_stats_columns = [c + '_1' for c in stats_columns]
    lv2_stats_columns = [c + '_2' for c in stats_columns]
    lv3_stats_columns = [c + '_3' for c in stats_columns]

    # return combined results as a series
    return pd.Series(d, index=lv1_stats_columns + lv2_stats_columns + lv3_stats_columns)
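One possible way to finish the second attempt, as a sketch rather than the only answer: take the first rows with head(), pad the missing rows with reindex(), and let stack() combine everything into one Series whose labels are then renamed. The function name best3_prices_v3 and the parameters stats_columns and n are made up for this illustration.

```python
import pandas as pd

def best3_prices_v3(x, stats_columns=('Price', 'Qty'), n=3):
    # take up to n rows, renumber them 0..n-1, pad missing rows with zeroes
    top = (x[list(stats_columns)]
           .head(n)
           .reset_index(drop=True)
           .reindex(range(n), fill_value=0))
    # flatten row by row into one Series labelled (row, column)
    out = top.stack()
    # rename the labels to Price_1, Qty_1, Price_2, ...
    out.index = [f"{col}_{i + 1}" for i, col in out.index]
    return out

data_in = {'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
           'Price': [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0],
           'Qty': [15, 13, 14, 10, 5, 2, 1]}
df_demo = pd.DataFrame(data_in)

# selecting the value columns before apply keeps the grouping column
# out of the function's input
df_out_v3 = (df_demo.sort_values(['Product', 'Price'])
             .groupby('Product')[['Price', 'Qty']]
             .apply(best3_prices_v3)
             .reset_index())
print(df_out_v3)
```

Adding another column then only means extending stats_columns; no per-column variable is needed. Note that stack() mixes Price and Qty into a single Series, so the integer quantities come back as floats.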

Tags: python, pandas, dataframe, pandas-groupby, series

Solution

Let's use unstack():

df_in=(df_in.set_index([df_in.groupby('Product').cumcount().add(1),'Product'])
             .unstack(0,fill_value=0))
df_in.columns=[f"{x}_{y}" for x,y in df_in]
df_in=df_in.reset_index()
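The key ingredient here is groupby('Product').cumcount().add(1), which numbers the rows inside each product starting from 1; those numbers become the column suffixes after unstack(). A small sketch of that step on its own, using the question's data (df_demo and key are names chosen for this illustration):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
    'Price': [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0],
    'Qty': [15, 13, 14, 10, 5, 2, 1],
})

# position of each row within its Product group, starting at 1
key = df_demo.groupby('Product').cumcount().add(1)
print(key.tolist())  # [1, 2, 3, 4, 1, 1, 2]
```

The numbering follows the current row order, so the frame should already be sorted by Product and Price for position 1 to mean "best price".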

Or, starting again from the original df_in, via pivot():

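# pivot() takes keyword-only arguments in pandas 2.x, and fillna(downcast=...)
# is deprecated, so both are avoided here; Qty columns come back as floats
# after fillna(0) and can be cast with astype(int) if integers are needed
df_in=(df_in.assign(key=df_in.groupby('Product').cumcount().add(1))
      .pivot(index='Product', columns='key', values=['Price','Qty'])
      .fillna(0))
df_in.columns=[f"{x}_{y}" for x,y in df_in]
df_in=df_in.reset_index()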
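Both variants reshape every row of each group, so with the sample data product A (4 records) also produces Price_4 and Qty_4. If only the best 3 are wanted, as in the question, one option is to cut each group down with head() before reshaping. This is a sketch under that assumption; n, top, rank, wide and ordered are names chosen for this illustration.

```python
import pandas as pd

data_in = {'Product': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
           'Price': [25.0, 30.5, 50.0, 61.5, 120.0, 650.0, 680.0],
           'Qty': [15, 13, 14, 10, 5, 2, 1]}
df_in = pd.DataFrame(data_in)

n = 3  # number of best records to keep per product
df_sorted = df_in.sort_values(['Product', 'Price'])

# keep at most n rows per product, then number them 1..n within each group
top = df_sorted.groupby('Product').head(n)
rank = top.groupby('Product').cumcount().add(1)

# move the rank into the columns; missing records become 0
wide = top.set_index(['Product', rank]).unstack(fill_value=0)
wide.columns = [f"{col}_{k}" for col, k in wide.columns]

# interleave the columns as Price_1, Qty_1, Price_2, ... as in the question
ordered = [f"{col}_{k}" for k in range(1, n + 1) for col in ['Price', 'Qty']]
df_out = wide.reindex(columns=ordered, fill_value=0).reset_index()
print(df_out)
```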
