首页 > 解决方案 > 在给定条件下跨多列选择值

问题描述

我正在尝试选择 2011 年后对应月份的所有年度值,如果行的大小不匹配,则用 NA 填充它们。

我努力了:

rank_data %>% filter(across(starts_with('year'))>2011)

仅返回单列的输出:

  jan year   feb year2   mar year3   apr year4   may year5   jun year6   jul year7   aug year8   sep year9   oct year10   nov year11   dec
1: 205.3 2014 188.2  2014 167.8  1920 122.1  1882 134.2  1906 152.0  1948 173.6  1882 191.3  1985 196.5  1957 234.6   1998 224.2   1931 232.6
2: 203.2 2016 179.0  1950 162.9  1868 119.5  1993 133.4  1925 147.3  1980 156.4  1932 181.1  1900 193.6  1927 211.6   1862 222.5   1928 230.1
   year12   win year13   spr year14   sum year15   aut year16    ann year17
1:   1979 529.4   1960 348.9   1983 438.8   1928 527.8   1944 1547.8   2008
2:   1994 522.5   1994 340.4   1903 426.6   2007 519.6   1923 1530.2   2015

可重现的代码:

structure(list(jan = c(268.1, 263.1, 235.2, 223.3, 219.2, 218.3
), year = c(1928, 1948, 2008, 1877, 1995, 1990), feb = c(287.6, 
241.9, 213.7, 205.1, 191.9, 191.2), year2 = c(2020, 2002, 1997, 
1990, 1958, 1923), mar = c(225.3, 190.7, 187.8, 187.2, 175.9, 
173.9), year3 = c(1981, 1903, 2019, 1947, 1994, 1912), apr = c(147.4, 
147.1, 143.1, 138.1, 132.4, 128.4), year4 = c(1920, 1867, 1970, 
1913, 1947, 2012), may = c(166.5, 161.1, 153.4, 142.1, 141.8, 
139.1), year5 = c(1967, 1924, 1886, 1920, 2015, 1993), jun = c(205.6, 
173.7, 172.5, 170.6, 161.1, 161.1), year6 = c(2012, 1928, 2007, 
1907, 1872, 1998), jul = c(196.1, 191.4, 183.6, 180.7, 180.3, 
178.9), year7 = c(1939, 1888, 1920, 1988, 2009, 1880), aug = c(260.9, 
231, 209.7, 205.4, 203.9, 197.7), year8 = c(1956, 1917, 1891, 
1927, 1879, 2004), sep = c(287.1, 238.1, 223.3, 208.8, 207.6, 
206.2), year9 = c(1918, 1950, 1869, 1935, 1866, 1872), oct = c(286.1, 
278.8, 263.8, 259.6, 249.9, 248.4), year10 = c(1967, 1903, 1954, 
2000, 1938, 1870), nov = c(306.7, 260.9, 253.6, 252.6, 242.3, 
235.8), year11 = c(2009, 2015, 1929, 2000, 1951, 1954), dec = c(343.1, 
266, 246.8, 246.5, 238.9, 238), year12 = c(2015, 1993, 1868, 
1986, 1929, 2006), win = c(688.7, 625.9, 582.1, 560, 558.4, 546.6
), year13 = c(2016, 1995, 2014, 1990, 2020, 1877), spr = c(457.3, 
408.9, 375.8, 372.8, 371.8, 371.8), year14 = c(1920, 1947, 1979, 
2006, 1981, 1913), sum = c(499.7, 489.5, 483.8, 468.6, 452.1, 
446.9), year15 = c(1879, 2012, 1956, 1912, 1927, 2020), aut = c(673, 
668.7, 580.8, 567.9, 560.8, 554.5), year16 = c(2000, 1954, 1872, 
1935, 1903, 1981), ann = c(1758.2, 1691.1, 1690, 1660.7, 1648.5, 
1624.6), year17 = c(1872, 1954, 2000, 1877, 1903, 2012)), row.names = c(NA, 
6L), class = "data.frame")

预期输出:

 jan   year feb    year2  mar    year3        ...
 205.3 2014 287.6  2020   187.8  2019         ...
 203.2 2016 188.2  2014    NA     NA          ...

标签: r

解决方案


也许,我们可以使用索引来替换相应“年份”值小于或等于 2011 年的值

i1 <- grep("^year", names(rank_data))
i2 <- i1 -1
tmp <- NA^(rank_data[i1] <= 2011)
rank_data[i2]  <-  rank_data[i2] * tmp 
rank_data[i1] <- rank_data[i1] * tmp

如果我们要删除所有NA与“年份”相关的列

i3 <- rep(sapply(rank_data[i2], function(x) any(!is.na(x))), each = 2)
lst1 <- lapply(rank_data[i3], function(x) x[complete.cases(x)])
mx <- max(lengths(lst1))
do.call(cbind, lapply(lst1, `length<-`, mx))

-输出

#  feb year2   mar year3   apr year4   may year5   jun year6   nov year11   dec year12   win year13   sum year15    ann year17
#[1,] 287.6  2020 187.8  2019 128.4  2012 141.8  2015 205.6  2012 260.9   2015 343.1   2015 688.7   2016 489.5   2012 1624.6   2012
#[2,]    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA    NA     NA 582.1   2014 446.9   2020     NA     NA
#[3,]    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA    NA     NA 558.4   2020    NA     NA     NA     NA 

推荐阅读