Problem description

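I am trying to get the mode of the model names across the iteration columns (one mode per row). If two or more modes are returned, I want to look up the error associated with each of those models and select the one with the lowest error.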

import pandas as pd

data1 = {'Iteration1': ["M2",'M1',"M3","M5","M4","M6"],
        'Iteration1_error': [96,98,34,19,22,9],
        'Iteration2': ["M3",'M1',"M1","M5","M6","M4"],
        'Iteration2_error': [76,88,54,12,92,19],
        'Iteration3': ["M3",'M1',"M1","M5","M6","M4"],
        'Iteration3_error': [66,68,84,52,72,89]}

Input1 = pd.DataFrame(data1, 
                     columns=['Iteration1','Iteration1_error','Iteration2','Iteration2_error','Iteration3','Iteration3_error'], 
                     index=['I1', 'I2','I3','I4','I5','I6'])

print(Input1)

data2 = {'Iteration1': ["M2",'M1',"M3","M5","M4","M6"],
        'Iteration1_error': [96,98,34,19,22,9],
        'Iteration2': ["M3",'M1',"M1","M5","M6","M4"],
        'Iteration2_error': [76,88,54,12,92,19],
        'Iteration3': ["M3",'M1',"M1","M5","M6","M4"],
        'Iteration3_error': [66,68,84,52,72,89],
        'Mode of model name in all iterations':['M3','M1','M1','M5','M6','M4'],
        'Best model error':[66,68,54,12,72,19]
       }

Output1 = pd.DataFrame(data2, 
                     columns=['Iteration1','Iteration1_error','Iteration2','Iteration2_error','Iteration3','Iteration3_error','Mode of model name in all iterations','Best model error'], 
                     index=['I1', 'I2','I3','I4','I5','I6'])


print(Output1)

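Question: the expected output has two extra columns at the end. The first gives the mode across the iteration columns and the second gives the error for that mode. The first six columns are the input DataFrame. If two or more modes are returned, for example ("M1", "M2", "M3"), where all three values differ and so technically there are three modes, the model with the lowest error should be chosen.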

What I tried: I was able to get the mode across the columns with .mode(numeric_only=False), but I am stuck on how to get the error for that mode from the 2nd, 4th and 6th columns.
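For context, a small sketch of what a row-wise mode returns when values tie. The two-row frame and the name `models` are made up for illustration and are not part of the question's data. With `axis=1`, pandas returns one column per tied mode, sorted, and pads shorter rows with NaN.

```python
import pandas as pd

# Hypothetical frame: row I1 is a three-way tie, row I2 has a single mode.
models = pd.DataFrame(
    {'Iteration1': ['M1', 'M2'],
     'Iteration2': ['M2', 'M2'],
     'Iteration3': ['M3', 'M1']},
    index=['I1', 'I2'])

# axis=1 computes the mode across the columns of each row.
modes = models.mode(axis=1)
print(modes)
```

Row I1 keeps all three values as modes, while row I2 has M2 in the first column and NaN in the rest, so taking only the first column silently picks the alphabetically first model in a tie.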

Tags: python, pandas, numpy, machine-learning, pandas-groupby

Solution


Use:

# keep only the model-name columns (Iteration followed by a number)
df = Input1.filter(regex=r'Iteration\d+$')
# row-wise mode; keep the first mode returned for each row
s = df.mode(axis=1).iloc[:, 0]
# mark the cells equal to the row's mode, rename the mask columns to the
# matching *_error columns, then take the smallest matching error per row
s1 = Input1.where(df.eq(s, axis=0).add_suffix('_error')).min(axis=1)

#add new columns
Output1 = Input1.assign(best_mode = s, best_error=s1)
print (Output1)
   Iteration1  Iteration1_error Iteration2  Iteration2_error Iteration3  \
I1         M2                96         M3                76         M3   
I2         M1                98         M1                88         M1   
I3         M3                34         M1                54         M1   
I4         M5                19         M5                12         M5   
I5         M4                22         M6                92         M6   
I6         M6                 9         M4                19         M4   

    Iteration3_error best_mode  best_error  
I1                66        M3        66.0  
I2                68        M1        68.0  
I3                84        M1        54.0  
I4                52        M5        12.0  
I5                72        M6        72.0  
I6                89        M4        19.0  
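The snippet above keeps only the first mode of each row, and the sample data has no ties. A minimal sketch of one way to cover the tie case described in the question follows: among all models that share the highest count in a row, pick the one with the lowest error. The function name `best_mode_with_ties` and the `demo` frame are hypothetical, and the sketch assumes each model column `IterationN` has a matching `IterationN_error` column.

```python
import pandas as pd


def best_mode_with_ties(frame):
    """Per row: most frequent model, with ties broken by lowest error."""
    models = frame.filter(regex=r'^Iteration\d+$')
    errors = frame[[f'{c}_error' for c in models.columns]]
    rows = []
    for idx in frame.index:
        # Pair each model name with its error for this row.
        pairs = pd.DataFrame({'model': models.loc[idx].to_numpy(),
                              'error': errors.loc[idx].to_numpy()})
        counts = pairs['model'].value_counts()
        tied = counts[counts == counts.max()].index  # all models sharing the top count
        candidates = pairs[pairs['model'].isin(tied)]
        winner = candidates.loc[candidates['error'].idxmin()]
        rows.append((winner['model'], winner['error']))
    return pd.DataFrame(rows, index=frame.index,
                        columns=['best_mode', 'best_error'])


# Hypothetical data: row I1 is a three-way tie, row I2 has a clear mode (M2).
demo = pd.DataFrame(
    {'Iteration1': ['M1', 'M2'], 'Iteration1_error': [30, 40],
     'Iteration2': ['M2', 'M2'], 'Iteration2_error': [10, 25],
     'Iteration3': ['M3', 'M1'], 'Iteration3_error': [20, 5]},
    index=['I1', 'I2'])

result = best_mode_with_ties(demo)
print(result)
```

The explicit loop is slower than the vectorized version, but it keeps the tie rule easy to read. For row I2 the error 5 belongs to M1, which is not a mode, so it is ignored.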

Another idea, if the columns can be selected by even and odd position (every model column must be directly followed by its error column, in order):

df = Input1.iloc[:, ::2]
s = df.mode(axis=1).iloc[:, 0]
s1 = Input1.iloc[:, 1::2].where(df.eq(s, axis=0).to_numpy()).min(axis=1)

Output1 = Input1.assign(best_mode = s, best_error=s1)
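The positional slicing above only works when the columns really alternate between a model column and its error column. A small sketch of a guard that could be run first is shown here. The helper name `has_paired_columns` and the `_error` suffix convention are assumptions taken from the sample data, not part of the original answer.

```python
import pandas as pd


def has_paired_columns(frame, suffix='_error'):
    """True if columns alternate as name, name + suffix, name2, name2 + suffix, ..."""
    model_cols = list(frame.columns[::2])
    error_cols = list(frame.columns[1::2])
    if len(model_cols) != len(error_cols):
        return False
    return all(e == f'{m}{suffix}' for m, e in zip(model_cols, error_cols))


# Hypothetical frames: one correctly ordered, one with two columns swapped.
ordered = pd.DataFrame({'Iteration1': ['M1'], 'Iteration1_error': [3],
                        'Iteration2': ['M2'], 'Iteration2_error': [7]})
shuffled = ordered[['Iteration1', 'Iteration2',
                    'Iteration1_error', 'Iteration2_error']]

ordered_ok = has_paired_columns(ordered)
shuffled_ok = has_paired_columns(shuffled)
print(ordered_ok, shuffled_ok)
```

If the check fails, the name-based version with `filter` and `add_suffix` is the safer choice, because it does not depend on column order.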
