Avoid duplicated column headers when creating a pandas DataFrame

Problem description

I'm a beginner trying to create a DataFrame that stores some model performance metrics (R², RMSE, training time, prediction time, etc.); an example is below. The result is a DataFrame with repeated column headers. Could you please help me avoid this? The objective is to have a single DataFrame with only one header. The issue must come from the for loop, but I'm not sure how to fix it. Thanks.

models = [LinearRegression(), Ridge(), Lasso(), ElasticNet()]
for model in models:
    start = time.time()
    model.fit(X_train, y_train)
    stop = time.time()
    start1 = time.time()
    predictions = model.predict(X_train)
    stop1 = time.time()
    results = {
        'Model': type(model).__name__,
        'R²_score': r2_score(y_train, predictions),
        'RMSE': mean_squared_error(y_train, predictions),
        'AB_Av_ERR': mean_absolute_error(y_train, predictions),
        'Training_time': stop - start,
        'Pred_time': stop1 - start1,
    }
    df_res = pd.DataFrame(results, index=[0])
    print(df_res)

Here's the output:

>               Model  R²_score     RMSE  AB_Av_ERR  Training_time  Pred_time
> 0  LinearRegression      0.01  1736.28      21.28           0.86       0.07
>    Model  R²_score     RMSE  AB_Av_ERR  Training_time  Pred_time
> 0  Ridge      0.01  1736.28      21.28           0.32       0.08
>    Model  R²_score     RMSE  AB_Av_ERR  Training_time  Pred_time
> 0  Lasso      0.01  1740.02      21.26           0.99       0.08
>         Model  R²_score     RMSE  AB_Av_ERR  Training_time  Pred_time
> 0  ElasticNet      0.01  1740.14      21.28           0.89       0.08

Tags: python, pandas

Solution


You can create an empty dataframe first outside the loop:

df = pd.DataFrame(columns=['Model', 'R²_score', 'RMSE', 'AB_Av_ERR', 'Training_time', 'Pred_time'])

Then add each result as a new row inside the loop. DataFrame.append was removed in pandas 2.0, so use pd.concat:

df = pd.concat([df, pd.DataFrame([results])], ignore_index=True)

Try this:

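import time
import pandas as pd
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# X_train and y_train are assumed to be defined already, as in the question.
models = [LinearRegression(), Ridge(), Lasso(), ElasticNet()]
df_res = pd.DataFrame(columns=['Model', 'R²_score', 'RMSE', 'AB_Av_ERR', 'Training_time', 'Pred_time'])
for model in models:
    start = time.time()
    model.fit(X_train, y_train)
    stop = time.time()
    start1 = time.time()
    predictions = model.predict(X_train)
    stop1 = time.time()
    results = {
        'Model': type(model).__name__,
        'R²_score': r2_score(y_train, predictions),
        # mean_squared_error returns the MSE; take the square root to get the RMSE
        'RMSE': mean_squared_error(y_train, predictions) ** 0.5,
        'AB_Av_ERR': mean_absolute_error(y_train, predictions),
        'Training_time': stop - start,
        'Pred_time': stop1 - start1,
    }
    # Add one row per model; pd.concat replaces DataFrame.append (removed in pandas 2.0)
    df_res = pd.concat([df_res, pd.DataFrame([results])], ignore_index=True)

# Print once, after the loop, so the header appears only once
print(df_res)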
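
As an alternative to growing the DataFrame inside the loop, a common pattern is to collect one dict per model in a plain list and build the DataFrame once at the end. The sketch below is only an illustration of that pattern: so that it runs with pandas alone, it uses a made-up `y_true` list and two hypothetical constant-prediction "baselines" in place of the scikit-learn models from the question, and it omits the R² and prediction-time columns for brevity.

```python
import statistics
import time

import pandas as pd

# Toy target values, made up for illustration (not from the question).
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]

# Hypothetical stand-ins for the scikit-learn models: each "fit"
# function reduces the targets to one constant prediction.
baselines = {
    "MeanBaseline": statistics.mean,
    "MedianBaseline": statistics.median,
}

rows = []  # one dict per model, collected inside the loop
for name, fit in baselines.items():
    start = time.perf_counter()
    constant = fit(y_true)
    train_time = time.perf_counter() - start

    predictions = [constant] * len(y_true)
    errors = [t - p for t, p in zip(y_true, predictions)]
    rows.append({
        "Model": name,
        "RMSE": (sum(e * e for e in errors) / len(errors)) ** 0.5,
        "AB_Av_ERR": sum(abs(e) for e in errors) / len(errors),
        "Training_time": train_time,
    })

# Build the DataFrame once, after the loop: one header, one row per model.
df_res = pd.DataFrame(rows)
print(df_res)
```

With the real models, the body of the loop would be the fit/predict/timing code from the question, and only the `rows.append(...)` and the final `pd.DataFrame(rows)` call would change.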
