Create a function that builds a new column from the values of other columns in a DataFrame and identifies invalid values

Problem description

Here is a simplified version of the DataFrame in question:

import pandas as pd

df = pd.DataFrame({
    'type': ['terrier', 'toy', 'toy', 'toy', 'hound', 'terrier',
             'terrier', 'terrier', 'terrier', 'hound'],
    'breed': ['yorkshire_terrier', 'king_charles_spaniel', 'poodle', 'shih_tzu',
              'greyhound', 'west_highland', 'bull_terrier', 'fox_terrier',
              'west_highland', 'afghan'],
    'colour': ['pink', 'orange', 'black', 'purple', 'grey', 'white',
               'black', 'cream', 'brown', 'brown'],
})

df


    type         breed                  colour
0   terrier     yorkshire_terrier       pink
1   toy         king_charles_spaniel    orange
2   toy         poodle                  black
3   toy         shih_tzu                purple
4   hound       greyhound               grey
5   terrier     west_highland           white
6   terrier     bull_terrier            black
7   terrier     fox_terrier             cream
8   terrier     west_highland           brown
9   hound       afghan                  brown

Using the function below, I can create a new column, new_colour, from the rules given in these dictionaries.

Dictionaries:

toy = {'black' : ['poodle', 'shih_tzu'], 
       'mixed' : 'king_charles_spaniel',
       'white' : ['poodle', 'shih_tzu']}

terrier = {'black_brown' : ['yorkshire_terrier','bull_terrier'],
           'white' : 'west_highland',
           'white_orange' : 'fox_terrier'}

hound = {'brindle' : 'greyhound',
           'brown' : 'afghan'}

Function:

def colours(x):
    # Return the first colour whose rule entry contains breed x
    for dog in [hound, toy, terrier]:
        for colour in dog:
            if x in dog[colour]:
                return colour

df['new_colour'] = df['breed'].map(colours)

Output:

    type    breed                 colour    new_colour
0   terrier yorkshire_terrier     pink      black_brown
1   toy     king_charles_spaniel  orange    mixed
2   toy     poodle                black     white
3   toy     shih_tzu              purple    black
4   hound   greyhound             grey      brindle
5   terrier west_highland         white     white
6   terrier bull_terrier          black     black_brown
7   terrier fox_terrier           cream     white_orange
8   terrier west_highland         brown     white
9   hound   afghan                brown     brown

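The problem, however, is the poodle (the real DataFrame has many more cases like it). By the rules in the dictionaries, a poodle can be either white or black. It was originally labelled black in the colour column, but new_colour is white. White is a possible value, but I want the original colour column to be kept as the correct colour.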
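To make the ambiguity concrete, here is a minimal sketch that lists every colour the rules allow for a breed instead of stopping at the first match. The helper name allowed_colours is hypothetical and not part of the original code. It also wraps bare string values in a list, on the assumption that exact breed names are intended, because `in` applied to a plain string performs a substring test.

```python
# Rule dictionaries copied from the question
toy = {'black': ['poodle', 'shih_tzu'],
       'mixed': 'king_charles_spaniel',
       'white': ['poodle', 'shih_tzu']}

terrier = {'black_brown': ['yorkshire_terrier', 'bull_terrier'],
           'white': 'west_highland',
           'white_orange': 'fox_terrier'}

hound = {'brindle': 'greyhound',
         'brown': 'afghan'}


def allowed_colours(breed):
    """Return every colour the rules permit for `breed` (hypothetical helper)."""
    found = []
    for rules in (hound, toy, terrier):
        for colour, breeds in rules.items():
            # Wrap a bare string in a list so that `in` compares whole
            # breed names instead of searching for a substring.
            if isinstance(breeds, str):
                breeds = [breeds]
            if breed in breeds:
                found.append(colour)
    return found


print(allowed_colours('poodle'))     # ['black', 'white']
print(allowed_colours('greyhound'))  # ['brindle']
```

A poodle has two permitted colours, so a function that returns on the first match can never take the existing colour value into account.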

Tags: python-3.x, pandas

Solution


You can modify your colours function:

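def colours(x):
    # Collect every colour the dictionaries allow for breed x
    possibilities = []
    for dog in [hound, toy, terrier]:
        for colour in dog:
            if x in dog[colour]:
                possibilities.append(colour)

    # Original colour of the first row in df with this breed
    original = df[df.breed == x].colour.values[0]
    # Keep the original colour if it is allowed, else use the first allowed one
    if original in possibilities:
        return original
    else:
        return possibilities[0]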

This assumes the dataset you are working with is named df; otherwise you can pass it to colours as an argument:

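def colours(x, df):
    # Collect every colour the dictionaries allow for breed x
    possibilities = []
    for dog in [hound, toy, terrier]:
        for colour in dog:
            if x in dog[colour]:
                possibilities.append(colour)

    # Original colour of the first row in df with this breed
    original = df[df.breed == x].colour.values[0]
    # Keep the original colour if it is allowed, else use the first allowed one
    if original in possibilities:
        return original
    else:
        return possibilities[0]

df['new_colour'] = df['breed'].map(lambda x: colours(x, df))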
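One thing to watch: the lookup df[df.breed==x].colour.values[0] reads the colour of the first row with that breed, so when a breed appears several times, every one of its rows is judged by that first row's colour. The following is a rough sketch of a row-wise alternative, not the answer's method. It assumes that each row should be checked against its own colour, and it adds a flag for invalid originals. The names allowed, resolve and colour_valid are made up for illustration, and the sample frame uses a few rows from the question's output plus one hypothetical extra poodle.

```python
import pandas as pd

# Rule dictionaries copied from the question
toy = {'black': ['poodle', 'shih_tzu'],
       'mixed': 'king_charles_spaniel',
       'white': ['poodle', 'shih_tzu']}
terrier = {'black_brown': ['yorkshire_terrier', 'bull_terrier'],
           'white': 'west_highland',
           'white_orange': 'fox_terrier'}
hound = {'brindle': 'greyhound',
         'brown': 'afghan'}

# Build a breed -> list of allowed colours lookup once (hypothetical name)
allowed = {}
for rules in (hound, toy, terrier):
    for colour, breeds in rules.items():
        # Treat a bare string as a single breed name
        if isinstance(breeds, str):
            breeds = [breeds]
        for breed in breeds:
            allowed.setdefault(breed, []).append(colour)


def resolve(breed, colour):
    """Keep this row's colour if the rules allow it, else fall back."""
    options = allowed.get(breed, [])
    if colour in options:
        return colour
    # None when no rule mentions the breed at all
    return options[0] if options else None


# Illustrative sample; the second poodle row is a hypothetical addition
df = pd.DataFrame({
    'breed': ['poodle', 'poodle', 'shih_tzu', 'west_highland', 'west_highland'],
    'colour': ['black', 'white', 'purple', 'white', 'brown'],
})

df['new_colour'] = [resolve(b, c) for b, c in zip(df['breed'], df['colour'])]
# True where the original colour was already permitted by the rules
df['colour_valid'] = [c in allowed.get(b, [])
                      for b, c in zip(df['breed'], df['colour'])]
print(df)
```

Here both poodles keep their own colour, while the purple shih_tzu and the brown west_highland are flagged as invalid and fall back to the first permitted colour.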

          
