R Boruta - merging a data frame with the column names of confirmed features

Question

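I ran the Boruta algorithm on a large dataset (> 500 covariates) and used attStats() to get a data frame of confirmed or rejected features, shown below. Each observation is a feature in my original dataset (pred_recent_pf_bin).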

boruta_pf_recent <- Boruta(pred_recent_pf_bin ~ . , data = pf_recent_use, doTrace = 2)
pf_recent_boruta_df <- attStats(boruta_pf_recent)
str(pf_recent_boruta_df)
'data.frame':   517 obs. of  6 variables:
 $ meanImp  : num  11.0438 0.0399 -0.3744 4.6134 -0.2527 ...
 $ medianImp: num  11.0482 0.0624 -0.6632 4.4585 -0.628 ...
 $ minImp   : num  8.62 -2.13 -1.24 3.34 -1.74 ...
 $ maxImp   : num  13.69 1.85 1.07 6.52 1.67 ...
 $ normHits : num  1 0 0 0.98 0 ...
 $ decision : Factor w/ 3 levels "Tentative","Confirmed",..: 2 3 3 2 3 3 3 3 3 3 ...

I subset the data frame to keep only the confirmed features:

boruta_confirmed <- subset(pf_recent_boruta_df, subset = pf_recent_boruta_df$decision == "Confirmed")

Then I transposed it and cleared out the algorithm statistics:

conf <- t(boruta_confirmed)
conf_empty <- conf[-c(1:6), ]

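So now I have a data frame with no observations and the column headers of my original dataset (pred_recent_pf_bin). I want to merge the observations from the original dataset into conf_empty. I have tried various merge() combinations but cannot figure out how it should be configured.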

str(conf_empty)
 chr[0 , 1:468] 
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:468] "houseID" "Bangsa" "farmWork" "BilaMasaSelalunyaKamuPergiTidur" ...

Thanks in advance!

Tags: r, algorithm, dataframe, feature-selection, data-wrangling

Solution


Figured it out - in case it's useful to anyone else!

# subset to confirmed and rejected variables
boruta_confirmed <- subset(pf_recent_boruta_df, subset = pf_recent_boruta_df$decision == "Confirmed")
boruta_rejected <- subset(pf_recent_boruta_df, subset = pf_recent_boruta_df$decision == "Rejected")

#make empty dataframe of rejected column variable names
str(boruta_rejected)
rej <- t(boruta_rejected)
rej_empty <- rej[-c(1:6), ]

# get names of rejected columns
rej_names <- colnames(rej_empty[,2:49])
cols_rej <- c(rej_names)

#remove rejected columns from original dataframe
pf_recent_confirmed <- pf_recent[, !(colnames(pf_recent) %in% cols_rej)]
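
A possibly simpler route, sketched under assumptions: the data frame returned by attStats() carries the feature names as its row names, so the transpose step and the hard-coded 2:49 column range may not be needed. The example below uses small made-up stand-ins (stats_df, original_df, and the "age" column are hypothetical) rather than a real Boruta run, and it only uses base R. With a real run, Boruta's getSelectedAttributes() should also return the confirmed names directly.

```r
# Toy stand-in for attStats() output: row names are the feature names.
# (Hypothetical data; a real run would use attStats(boruta_pf_recent).)
stats_df <- data.frame(
  meanImp  = c(11.0, 0.04, -0.37, 4.6),
  decision = factor(c("Confirmed", "Rejected", "Rejected", "Tentative"),
                    levels = c("Tentative", "Confirmed", "Rejected")),
  row.names = c("houseID", "Bangsa", "farmWork", "age")
)

# Toy stand-in for the original dataset, including the response column.
original_df <- data.frame(
  pred_recent_pf_bin = c(0, 1, 0),
  houseID  = c(1, 2, 3),
  Bangsa   = c("a", "b", "a"),
  farmWork = c(TRUE, FALSE, TRUE),
  age      = c(30, 41, 25)
)

# Feature names come straight from the row names; no transpose is needed.
rejected_names  <- rownames(stats_df)[stats_df$decision == "Rejected"]
confirmed_names <- rownames(stats_df)[stats_df$decision == "Confirmed"]

# Option 1: drop the rejected columns (keeps the response and tentative features).
without_rejected <- original_df[, !(colnames(original_df) %in% rejected_names),
                                drop = FALSE]

# Option 2: keep only the response plus the confirmed features.
confirmed_only <- original_df[, c("pred_recent_pf_bin", confirmed_names),
                              drop = FALSE]
```

Option 1 mirrors the approach above (tentative features survive); Option 2 is stricter and keeps only what Boruta confirmed. Which one fits depends on how the tentative features should be treated.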

