Best way to bin (categorical values) based on multiple columns

Problem description

I need to combine the values of two columns into a new column.

Suppose this is my pandas df:

import pandas as pd

data = {'material': ['Matl_A', 'Matl_B', 'Matl_B', 'Matl_A'],
        'strength': [10, 20, 30, 100]}
df = pd.DataFrame(data)

So my df is:

  material   strength  
 ---------- ---------- 
  Matl_A           10  
  Matl_B           20  
  Matl_B           30  
  Matl_A          100  

I want to add a grade column like this:

  material   strength    grade
 ---------- ---------- ---------
  Matl_A           10       1
  Matl_B           20       4
  Matl_B           30       5
  Matl_A          100       2

What is the best way to do this?

Edit:

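I used Michael Gardner's answer below and extended it, because we have many materials. I hope this revision gives a clearer picture. If I have 20 materials, each with different condition ranges to bin, what would be a more elegant way to solve this than the following: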

    import numpy as np
    import pandas as pd

    strength = np.random.randint(low=1, high=30, size=20)
    material = ['matl_a', 'matl_b', 'matl_b', 'matl_a', 'matl_d',
                'matl_b', 'matl_d', 'matl_a', 'matl_a', 'matl_b',
                'matl_a', 'matl_b', 'matl_e', 'matl_a', 'matl_c',
                'matl_b', 'matl_c', 'matl_a', 'matl_a', 'matl_b']

    data = {'material':material, 
            'strength':strength } 
    df = pd.DataFrame(data)

    def grading(df):
        # With apply(axis=1), each call receives a single row (a Series), not the whole DataFrame
        if df['material'] == 'matl_a':
            if 0 <= df['strength'] <=10:
                return 1
            elif 11 <= df['strength'] <= 20:
                return 2
            elif 21 <= df['strength'] <= 30:
                return 3
            elif 31 <= df['strength'] <= 40:
                return 4
            else:
                return 5
        elif df['material'] == 'matl_b':
            if 0 <= df['strength'] <=10:
                return 6
            elif 11 <= df['strength'] <= 20:
                return 7
            elif 21 <= df['strength'] <= 30:
                return 8
            elif 31 <= df['strength'] <= 40:
                return 9
            else:
                return 10
        elif df['material'] == 'matl_c':
            if 0 <= df['strength'] <=10:
                return 11
            elif 11 <= df['strength'] <= 20:
                return 12
            elif 21 <= df['strength'] <= 30:
                return 13
            elif 31 <= df['strength'] <= 40:
                return 14
            else:
                return 15        
        else:
            if 0 <= df['strength'] <=10:
                return 16
            elif 11 <= df['strength'] <= 20:
                return 17
            elif 21 <= df['strength'] <= 30:
                return 18
            elif 31 <= df['strength'] <= 40:
                return 19
            else:
                return 20

    df['grade'] = df.apply(grading, axis=1)
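If every material shares the same strength cut points (0-10, 11-20, 21-30, 31-40, everything else), the row-wise grading() above can be replaced by vectorized pandas operations: pd.cut assigns each strength a bin index, and a dictionary maps each material to the start of its block of five grades. This is a minimal sketch under two assumptions: strengths are integers, and the cut points are identical for all materials (if they differ per material, this does not apply). The names MATERIAL_OFFSET, DEFAULT_OFFSET, STRENGTH_BINS and vectorized_grade are hypothetical.

```python
import pandas as pd

# Hypothetical mapping read off grading(): matl_a -> grades 1-5,
# matl_b -> 6-10, matl_c -> 11-15; any other material -> 16-20.
MATERIAL_OFFSET = {'matl_a': 0, 'matl_b': 5, 'matl_c': 10}
DEFAULT_OFFSET = 15

# Shared strength bins. Assumes integer strengths, so (-1, 10] covers 0-10,
# (10, 20] covers 11-20, (20, 30] covers 21-30 and (30, 40] covers 31-40.
STRENGTH_BINS = [-1, 10, 20, 30, 40]


def vectorized_grade(df):
    # Bin index 0-3 for strengths 0-40; values outside the bins become NaN,
    # which we map to index 4 (the "else" branch in grading()).
    bin_idx = pd.cut(df['strength'], bins=STRENGTH_BINS, labels=False)
    bin_idx = bin_idx.fillna(4).astype(int)
    # Materials missing from the dict fall back to the default block.
    offset = df['material'].map(MATERIAL_OFFSET).fillna(DEFAULT_OFFSET).astype(int)
    return offset + bin_idx + 1


df = pd.DataFrame({'material': ['matl_a', 'matl_b', 'matl_c', 'matl_d', 'matl_a', 'matl_e'],
                   'strength': [0, 15, 30, 35, 41, -3]})
df['grade'] = vectorized_grade(df)
```

Adding another material with the same cut points then only needs one more dictionary entry instead of another if/elif block.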

Tags: python, python-3.x, pandas, dataframe

Solution


Use np.select:

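import numpy as np

# Boolean mask for each material
a = df.material.eq('Matl_A')
b = df.material.eq('Matl_B')

# np.select returns the choice for the first condition that is True;
# rows matching no condition get the default. between() is inclusive on
# both ends, so a Matl_B strength of exactly 50 matches the third
# condition first and gets 'A'.
df['grade'] = np.select([a & df.strength.between(5, 10),
                         a & df.strength.between(11, 20),
                         b & df.strength.between(10, 50),
                         b & df.strength.between(50, 100)],
                        ['A', 'B', 'A', 'B'],
                        default='C')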
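With many materials, the conditions passed to np.select can be generated from a table of rules instead of being written out by hand. The sketch below assumes a hypothetical RULES list of (material, low, high, label) tuples, here filled with the same ranges as the answer above, and a hypothetical helper grade_from_rules; the order of the list matters because np.select uses the first matching condition.

```python
import numpy as np
import pandas as pd

# Hypothetical rule table: (material, low, high, label), bounds inclusive
# as with Series.between's default. Ranges copied from the answer above.
RULES = [
    ('Matl_A', 5, 10, 'A'),
    ('Matl_A', 11, 20, 'B'),
    ('Matl_B', 10, 50, 'A'),
    ('Matl_B', 50, 100, 'B'),
]


def grade_from_rules(df, rules, default='C'):
    """Build one boolean mask per rule; np.select picks the first match per row."""
    conditions = [df['material'].eq(material) & df['strength'].between(low, high)
                  for material, low, high, _ in rules]
    choices = [label for _, _, _, label in rules]
    return np.select(conditions, choices, default=default)


df = pd.DataFrame({'material': ['Matl_A', 'Matl_B', 'Matl_B', 'Matl_A'],
                   'strength': [10, 20, 50, 100]})
df['grade'] = grade_from_rules(df, RULES)
```

Here the Matl_B row with strength 50 satisfies both Matl_B rules and receives 'A' from the earlier one, and Matl_A with strength 100 matches no rule and gets the default 'C'. The rule table could also be loaded from a CSV file or another DataFrame.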
