How to change colors and band labels in DataExplorer's plot_missing() function

Problem description

In case anyone else is trying to customize the colors and the hard-coded band labels in DataExplorer's plot_missing() function, here is a simple way to do it.

Default output

# To show all bands, the `species` column of the `starwars` dataset is replaced with NA
library(dplyr)         # provides the starwars dataset
library(DataExplorer)

data(starwars)
df <- starwars
df$species <- NA

plot_missing(df)


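As you can see, the resulting plot lists the band labels in alphabetical order ("Bad, Good, OK, Remove"), which does not line up well with the colors. For example, "Remove" is drawn in purple, where red would make the most sense. This happens because the plot uses ggplot2's default colors.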
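The mismatch follows from how ggplot2 builds its default discrete palette: evenly spaced hues, handed out to the labels in alphabetical order. The sketch below approximates that palette in base R. The helper name default_hues is made up for illustration, and the hue, chroma and luminance values are the commonly cited ggplot2 defaults, assumed here rather than taken from the question. It shows why the last label, Remove, ends up at the purple end of the hue range.

```r
# Hypothetical helper: approximates ggplot2's default discrete fill colors
# by spacing n hues evenly around the HCL color wheel.
default_hues <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  grDevices::hcl(h = hues, c = 100, l = 65)[seq_len(n)]
}

# Band labels as the default plot shows them, sorted alphabetically
bands <- sort(c("Good", "OK", "Bad", "Remove"))

# Named vector: one approximate default color per band
setNames(default_hues(length(bands)), bands)
```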

Tags: r, ggplot2

Solution


To make the plot_missing() output clearer, you can modify the source of plot_missing() and assign the modified copy to a new name (plot_missing_2).

# Original function
function (data, group = list(Good = 0.05, OK = 0.4, Bad = 0.8,
  Remove = 1), geom_label_args = list(), title = NULL, ggtheme = theme_gray(),
  theme_config = list(legend.position = c("bottom")))
{
  pct_missing <- Band <- NULL
  missing_value <- data.table(profile_missing(data))
  group <- group[sort.list(unlist(group))]
  invisible(lapply(seq_along(group), function(i) {
    if (i == 1) {
      missing_value[pct_missing <= group[[i]], `:=`(Band,
        names(group)[i])]
    } else {
      missing_value[pct_missing > group[[i - 1]] & pct_missing <=
        group[[i]], `:=`(Band, names(group)[i])]
    }
  }))
  output <- ggplot(missing_value, aes_string(x = "feature",
    y = "num_missing", fill = "Band")) + geom_bar(stat = "identity") +
    scale_fill_discrete("Band") + coord_flip() + xlab("Features") +
    ylab("Missing Rows")
  geom_label_args_list <- list(mapping = aes(label = paste0(round(100 *
    pct_missing, 2), "%")))
  output <- output + do.call("geom_label", c(geom_label_args_list,
    geom_label_args))
  class(output) <- c("single", class(output))
  plotDataExplorer(plot_obj = output, title = title, ggtheme = ggtheme,
    theme_config = theme_config)
}

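There are two main changes. The first replaces group = list(Good = 0.05, OK = 0.4, Bad = 0.8, Remove = 1) with group = list(Good = 0.05, Okay = 0.4, Poor = 0.8, Scarce = 1). The second replaces scale_fill_discrete("Band") with scale_fill_manual("Band", values = c("Good"="green2","Okay"="gold","Poor"="darkorange","Scarce"="firebrick2")). You can define whatever groups you like; just remember that the bands are still displayed in alphabetical order (that part has not been changed), so pick labels whose alphabetical order matches their severity. You can also change the colors to anything you prefer, as long as every band name has an entry in values. Finally, remember to assign the function to a new name, such as plot_missing_2, rather than overwriting plot_missing().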
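The band assignment inside the function can be hard to read because of the data.table syntax, so here is a minimal base R sketch of the same interval rule. The helper band_for is hypothetical and not part of DataExplorer. It sorts the thresholds as plot_missing() does and then puts each missing fraction into the first band whose upper limit it does not exceed. Unlike the package code it returns a factor whose levels follow the threshold order, which is one possible way around the alphabetical ordering mentioned above.

```r
# Hypothetical helper, not part of DataExplorer: maps missing-value
# fractions to band labels with the same interval rule as plot_missing().
band_for <- function(pct_missing,
                     group = list(Good = 0.05, Okay = 0.4, Poor = 0.8, Scarce = 1)) {
  # Sort the thresholds in ascending order, as plot_missing() does
  group <- group[sort.list(unlist(group))]
  # Intervals are closed on the right: (-Inf, 0.05], (0.05, 0.4], ...
  cut(pct_missing,
      breaks = c(-Inf, unname(unlist(group))),
      labels = names(group),
      right = TRUE)
}

# Fractions of missing values for five imaginary columns
band_for(c(0, 0.05, 0.3, 0.8, 1))
```

Because the result is a factor, mapping it to fill in ggplot2 would order the legend by threshold rather than by name.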

Custom function

# Custom function
library(DataExplorer)
library(data.table)  # data.table() and `:=`
library(ggplot2)     # ggplot(), scale_fill_manual(), theme_gray(), ...

plot_missing_2 <-
function (data, group = list(Good = 0.05, Okay = 0.4, Poor = 0.8,
  Scarce = 1), geom_label_args = list(), title = NULL, ggtheme = theme_gray(),
  theme_config = list(legend.position = c("bottom")))
{
  pct_missing <- Band <- NULL
  missing_value <- data.table(profile_missing(data))
  group <- group[sort.list(unlist(group))]
  invisible(lapply(seq_along(group), function(i) {
    if (i == 1) {
      missing_value[pct_missing <= group[[i]], `:=`(Band,
        names(group)[i])]
    } else {
      missing_value[pct_missing > group[[i - 1]] & pct_missing <=
        group[[i]], `:=`(Band, names(group)[i])]
    }
  }))
  # Changed: manual fill scale; the names must match the names in `group`
  output <- ggplot(missing_value, aes_string(x = "feature",
    y = "num_missing", fill = "Band")) + geom_bar(stat = "identity") +
    scale_fill_manual("Band", values = c("Good" = "green2", "Okay" = "gold",
      "Poor" = "darkorange", "Scarce" = "firebrick2")) +
    coord_flip() + xlab("Features") + ylab("Missing Rows")
  geom_label_args_list <- list(mapping = aes(label = paste0(round(100 *
    pct_missing, 2), "%")))
  output <- output + do.call("geom_label", c(geom_label_args_list,
    geom_label_args))
  class(output) <- c("single", class(output))
  plotDataExplorer(plot_obj = output, title = title, ggtheme = ggtheme,
    theme_config = theme_config)
}

# Evaluate the copy inside DataExplorer's namespace so that helpers such as
# plotDataExplorer() resolve exactly as they do for plot_missing()
environment(plot_missing_2) <- asNamespace("DataExplorer")

Custom output

library(dplyr)         # provides the starwars dataset
library(DataExplorer)

data(starwars)
df <- starwars
df$species <- NA

plot_missing_2(df)
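If copying the whole function feels heavy, a lighter variant may be enough. This is only a sketch under two assumptions: that plot_missing() returns its ggplot object invisibly, as its documentation describes, and that your installed versions of DataExplorer and ggplot2 allow a scale to be added to that object. The band names go in through the existing group argument, and the fill scale is swapped on the returned plot. The toy data frame is invented for illustration.

```r
library(DataExplorer)
library(ggplot2)

# Toy data (invented): column a is 25% missing, b is entirely missing
toy <- data.frame(a = c(1, NA, 3, 4), b = NA, c = 1:4)

# Rename the bands through the existing `group` argument
p <- plot_missing(toy, group = list(Good = 0.05, Okay = 0.4,
                                    Poor = 0.8, Scarce = 1))

# Swap the fill scale on the returned plot; ggplot2 reports that the
# existing fill scale is being replaced
p2 <- p + scale_fill_manual("Band", values = c(Good = "green2", Okay = "gold",
                                               Poor = "darkorange",
                                               Scarce = "firebrick2"))
print(p2)
```

Note that plot_missing() still draws the default-colored plot once before the recolored version is printed, so the custom function above remains the cleaner option when only one plot should appear.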
