How to select columns with dplyr/tidyverse based on a minimum value in the column in R

Problem description

I have a dataset containing land cover pixel counts for each point.

    species_distr <- data.frame(structure(list(Point = c(101, 102, 103, 104, 105, 106), `Herbaceous cover` = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `Tree or shrub cover` = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `Cropland, irrigated or post-flooding` = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `Mosaic cropland (>50%) / natural vegetation (tree, shrub, herbaceous cover) (<50%)` = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), `Mosaic natural vegetation (tree, shrub, herbaceous cover) (>50%) / cropland (<50%)` = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `Tree cover, broadleaved, evergreen, closed to open (>15%)` = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), `Tree cover, broadleaved, deciduous, closed to open (>15%)` = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `Tree cover, broadleaved, deciduous, closed (>40%)` = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), `Tree cover, broadleaved, deciduous, open (15-40%)` = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), `Tree cover, needleleaved, evergreen, closed to open (>15%)` = c(NA, 
NA, 1.73725490196078, NA, NA, NA), `Tree cover, needleleaved, evergreen, closed (>40%)` = c(NA, 
NA, 0L, NA, NA, NA), `Tree cover, needleleaved, evergreen, open (15-40%)` = c(NA, 
NA, 0L, NA, NA, NA), `Tree cover, needleleaved, deciduous, closed to open (>15%)` = c(2059.57647058824, 
544, 2209.63529411765, 1226.7568627451, 1722.34901960784, 1359.10196078432
), `Tree cover, needleleaved, deciduous, closed (>40%)` = c(NA, 
NA, 0L, 0L, NA, NA), `Tree cover, needleleaved, deciduous, open (15-40%)` = c(NA, 
NA, 0L, 0L, NA, NA), `Tree cover, mixed leaf type (broadleaved and needleleaved)` = c(NA, 
NA, 1.96470588235294, 0, NA, NA), `Mosaic tree and shrub (>50%) / herbaceous cover (<50%)` = c(NA, 
NA, 0, 2, NA, NA), `Mosaic herbaceous cover (>50%) / tree and shrub (<50%)` = c(NA, 
NA, 0L, NA, NA, NA), Shrubland = c(NA, NA, 0, NA, NA, NA), `Shrubland evergreen` = c(NA, 
NA, 0L, NA, NA, NA), `Shrubland deciduous` = c(NA, NA, 0, NA, 
NA, NA), Grassland = c(NA, NA, 0L, NA, NA, NA), `Lichens and mosses` = c(NA, 
NA, 0L, NA, NA, NA), `Sparse vegetation (tree, shrub, herbaceous cover) (<15%)` = c(NA, 
NA, 0, NA, NA, NA), `Sparse tree (<15%)` = c(NA, NA, 0L, NA, 
NA, NA), `Sparse shrub (<15%)` = c(NA, NA, 0L, NA, NA, NA), `Sparse herbaceous cover (<15%)` = c(NA, 
NA, 0L, NA, NA, NA), `Tree cover, flooded, fresh or brakish water` = c(NA, 
NA, 0, NA, NA, NA), `Tree cover, flooded, saline water` = c(NA, 
NA, 0L, NA, NA, NA), `Shrub or herbaceous cover, flooded, fresh/saline/brakish water` = c(NA, 
NA, 0, NA, NA, NA), `Urban areas` = c(NA, NA, 0L, NA, NA, NA), 
    `Bare areas` = c(NA, NA, 0, NA, NA, NA), `Consolidated bare areas` = c(NA, 
    NA, 0L, NA, NA, NA), `Unconsolidated bare areas` = c(NA, 
    NA, 0L, NA, NA, NA), `Water bodies` = c(NA, NA, 4.73725490196078, 
    NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame")))

I want to exclude every column in which no value exceeds a threshold, for example 50. My quick and dirty solution is:

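# Collect the indices of the columns (other than Point) whose maximum exceeds 50
keep <- NULL
for (i in 2:length(species_distr)) {
  if (max(na.omit(species_distr[,i])) > 50) {
    keep <- c(keep, i)
  }
}
# Keep the Point column (column 1) plus the columns found above
species_distr_plot <- species_distr[,c(1,keep)]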

How can I achieve this with dplyr/tidyverse? So far I have tried:

  %>%
select_if(na.omit(max(.)) > 50)

Tags: r, dplyr, data-science

Solution


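We may need any. The predicate should return TRUE for a column when at least one of its values exceeds 50; na.rm = TRUE makes any() ignore the NA entries, so an all-NA column gives FALSE and is dropped. Point is kept here only because its own values are above 50.

library(dplyr)
species_distr %>% 
     select_if(~ any(.x > 50, na.rm = TRUE))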
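
The column filter asked for in the question can also be sketched with `select(where(...))`, which dplyr 1.0.0 and later offer in place of `select_if()`. This is a minimal sketch under stated assumptions: `toy` and its columns are made-up stand-ins for `species_distr`, not the original data, and `Point` is named explicitly so that it is kept regardless of its values.

```r
library(dplyr)

# Hypothetical stand-in for species_distr (made-up values, not the original data)
toy <- tibble(
  Point  = c(101, 102, 103),
  all_na = c(NA_real_, NA_real_, NA_real_),
  small  = c(NA, 1.7, 0),
  large  = c(2059.6, 544, 2209.6)
)

# Keep Point unconditionally, plus every numeric column with at least one value above 50.
# na.rm = TRUE makes any() ignore NA, so an all-NA column yields FALSE instead of NA.
result <- toy %>%
  select(Point, where(~ is.numeric(.x) && any(.x > 50, na.rm = TRUE)))

names(result)
# "Point" "large"
```

The `is.numeric()` guard is only there so the comparison is skipped for non-numeric columns; with all-numeric data such as the example in the question it could be left out.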
