Binning/grouping numbers in Python

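Question

I'm very new to Python, so apologies for what I hope is a simple question. I've loaded my data into a pandas DataFrame. One of the columns is a dollar amount (PAYMENT_AMOUNT). I'd like to create a new column that bins/groups the amounts into ranges. If I were doing this in SQL, I would use the following CASE statement. Is there a similar feature in Python?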

(
          CASE
            WHEN PAYMENT_AMOUNT BETWEEN '0' AND '9.99'
            THEN '0-9.99'
            WHEN PAYMENT_AMOUNT BETWEEN '10' AND '19.99'
            THEN '10-19.99'
            WHEN PAYMENT_AMOUNT BETWEEN '20' AND '39.99'
            THEN '20-39.99'
            WHEN PAYMENT_AMOUNT BETWEEN '40' AND '59.99'
            THEN '40-59.99'
            WHEN PAYMENT_AMOUNT BETWEEN '60' AND '79.99'
            THEN '60-79.99'
            WHEN PAYMENT_AMOUNT BETWEEN '80' AND '99.99'
            THEN '80-99.99'
            WHEN PAYMENT_AMOUNT BETWEEN '100' AND '149.99'
            THEN '100-149.99'
            WHEN PAYMENT_AMOUNT BETWEEN '150' AND '299.99'
            THEN '150-299.99'
            WHEN PAYMENT_AMOUNT BETWEEN '300' AND '499.99'
            THEN '300-499.99'
            WHEN PAYMENT_AMOUNT >= '500'
            THEN '500+'
            ELSE 'other'
          END) USD_BIN

Tags: python

Answer


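import numpy as np

def binning(amount):
    # Map one payment amount to its range label, mirroring the SQL CASE.
    # Negative and NaN amounts match no range and fall through to 'other'.
    if 0 <= amount < 10:
        return '0-9.99'
    if 10 <= amount < 20:
        return '10-19.99'
    if 20 <= amount < 40:
        return '20-39.99'
    if 40 <= amount < 60:
        return '40-59.99'
    if 60 <= amount < 80:
        return '60-79.99'
    if 80 <= amount < 100:
        return '80-99.99'
    if 100 <= amount < 150:
        return '100-149.99'
    if 150 <= amount < 300:
        return '150-299.99'
    if 300 <= amount < 500:
        return '300-499.99'
    if amount >= 500:
        return '500+'
    return 'other'

# Apply the function element by element; otypes=[object] keeps the labels
# as ordinary Python strings.
vfunc = np.vectorize(binning, otypes=[object])

# 'column_name' is a placeholder for the new column; adjust
# 'payment_amount' to match the actual column name in your DataFrame.
df['column_name'] = vfunc(df['payment_amount'])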
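
As an alternative to the element-by-element function above, pandas has a built-in binning function, pd.cut, that takes the bin edges and labels directly. The following is only a minimal sketch under some assumptions: the names bin_payments, EDGES, LABELS and usd_bin are made up for illustration, the sample DataFrame is invented, and the intervals are treated as left-closed (so 9.995 lands in '0-9.99', the same as in the function above).

```python
import numpy as np
import pandas as pd

# Left-closed bin edges: [0, 10), [10, 20), ..., [500, inf).
EDGES = [0, 10, 20, 40, 60, 80, 100, 150, 300, 500, np.inf]
LABELS = ['0-9.99', '10-19.99', '20-39.99', '40-59.99', '60-79.99',
          '80-99.99', '100-149.99', '150-299.99', '300-499.99', '500+']


def bin_payments(amounts: pd.Series) -> pd.Series:
    """Label each amount with its range; unmatched values become 'other'."""
    binned = pd.cut(amounts, bins=EDGES, labels=LABELS, right=False)
    # Negative amounts and NaN fall outside every bin and come back as NaN,
    # so register 'other' as a category and use it to fill those gaps.
    return binned.cat.add_categories(['other']).fillna('other')


# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({'payment_amount': [5.00, 19.99, 250.00, 500.00, -1.00]})
df['usd_bin'] = bin_payments(df['payment_amount'])
```

The result is a categorical column, which keeps the labels in range order when sorting or grouping instead of in alphabetical order.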

