DataFrame: add a new column using a condition based on another column

Problem description

I have a DataFrame of customer RFM data.

Sample data:

import pandas as pd

df_cust = pd.DataFrame({
    'CustNo': ['001', '002', '003', '004'],  # strings, so the leading zeros are kept
    'Recency': [5, 10, 200, 150],
    'Frequency': [1, 3, 10, 1],
})

I want to create a new column, 'score_recency'. I also have two functions that calculate the score (I usually use a lambda function to create pandas columns).

def cal_new_cust(recency):
    return score  # scoring logic for new customers (Frequency == 1)

def cal_old_cust(recency):
    return score  # scoring logic for returning customers (Frequency > 1)

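How can I create the 'score_recency' column by applying these two functions, so that each customer's score is calculated by the function that matches their group in the Frequency column?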

Tags: python, pandas

Solution

You can try DataFrame.apply(). For example:

import pandas as pd

df_cust = pd.DataFrame({
    'CustNo': ['001', '002', '003', '004'],
    'Recency': [5, 10, 200, 150],
    'Frequency': [1, 3, 10, 1],
})

Stub functions:

def cal_new_cust(recency):
    return recency + 1  # placeholder logic for new customers (Frequency == 1)

def cal_old_cust(recency):
    return recency - 1  # placeholder logic for returning customers (Frequency > 1)

New code: depending on Frequency, this function calls either the new-customer or the old-customer function.

def decision_maker(pair):
    if pair['Frequency'] == 1:
        return cal_new_cust(pair['Recency'])
    else:
        return cal_old_cust(pair['Recency'])

# axis=1 passes each row to decision_maker as a Series
df_cust['score_recency'] = df_cust.apply(decision_maker, axis=1)
df_cust
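
As a self-contained sketch of the same idea, the snippet below writes the row-wise choice as a lambda (the style mentioned in the question) and also shows a vectorized variant with numpy.where. The numpy.where variant is an alternative that is not part of the original answer, and it assumes both scoring functions can operate on a whole Series at once, which holds for the stub functions but may not hold for the real scoring logic.

```python
import numpy as np
import pandas as pd

df_cust = pd.DataFrame({
    'CustNo': ['001', '002', '003', '004'],
    'Recency': [5, 10, 200, 150],
    'Frequency': [1, 3, 10, 1],
})

def cal_new_cust(recency):
    return recency + 1  # stub, stands in for the real new-customer logic

def cal_old_cust(recency):
    return recency - 1  # stub, stands in for the real old-customer logic

# Row-wise: the lambda picks the scoring function from each row's Frequency
row_wise = df_cust.apply(
    lambda row: cal_new_cust(row['Recency']) if row['Frequency'] == 1
    else cal_old_cust(row['Recency']),
    axis=1,
)

# Vectorized alternative (assumption: both functions accept a Series).
# Both functions run on the full column; np.where then selects per row.
vectorized = np.where(
    df_cust['Frequency'] == 1,
    cal_new_cust(df_cust['Recency']),
    cal_old_cust(df_cust['Recency']),
)

df_cust['score_recency'] = row_wise
print(df_cust['score_recency'].tolist())  # [6, 9, 199, 151]
```

With the stubs, both variants give the same scores. The row-wise form works with any Python function, while the vectorized form is usually faster on large frames but only applies when the scoring logic is expressible on whole columns.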

Yes, you can pass extra values through apply(); see the parameters of DataFrame.apply.

So it would look like:

def cal_new_cust(recency, mypar=None):
    print(mypar)
    return recency + 1  # logic code with new cust frequency = 1

def cal_old_cust(recency, mypar=None):
    print(mypar)
    return recency - 1  # logic code with old cust frequency > 1

def decision_maker(pair, **kwargs):
    print(f" June:{kwargs['june']} july:{kwargs['july']}")
    if pair['Frequency'] == 1:
        return cal_new_cust(pair['Recency'], kwargs['june'])
    else:
        return cal_old_cust(pair['Recency'], kwargs['july'])

# Extra keyword arguments are forwarded to decision_maker for every row
df_cust['score_recency'] = df_cust.apply(decision_maker, june=30, july=20, axis=1)
df_cust
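
Extra values can also be passed positionally through the args parameter of DataFrame.apply. The sketch below is only an illustration: the parameter names new_bonus and old_penalty and the scoring arithmetic are hypothetical and do not come from the original answer.

```python
import pandas as pd

df_cust = pd.DataFrame({
    'CustNo': ['001', '002', '003', '004'],
    'Recency': [5, 10, 200, 150],
    'Frequency': [1, 3, 10, 1],
})

def decision_maker(row, new_bonus, old_penalty):
    # new_bonus / old_penalty are hypothetical extra parameters for illustration
    if row['Frequency'] == 1:
        return row['Recency'] + new_bonus
    return row['Recency'] - old_penalty

# Positional extras go in the args tuple, after the row itself
by_position = df_cust.apply(decision_maker, axis=1, args=(30, 20))

# The same values passed by keyword are forwarded to decision_maker
by_keyword = df_cust.apply(decision_maker, axis=1, new_bonus=30, old_penalty=20)

print(by_position.tolist())  # [35, -10, 180, 180]
```

Keyword extras should use names that do not clash with apply's own parameters (for example axis, raw, result_type, or args), since apply would consume those itself instead of forwarding them.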
