Summarise with dplyr - one variable always at the bottom

Question

Can anyone help me with this? I grouped and summarised spending data for several companies, and the output looks like this:

df <- data.frame(
    Column1 = c("Other", "Brand1", "Brand2", "Brand3", "Brand4", "Brand5"),
    Column2 = c(NA, "Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5"),
    Spendings = c(1000, 500, 250, 200, 150, 100)
)

  Column1   Column2 Spendings
1   Other      <NA>      1000
2  Brand1 Subbrand1       500
3  Brand2 Subbrand2       250
4  Brand3 Subbrand3       200
5  Brand4 Subbrand4       150
6  Brand5 Subbrand5       100

The "Other" row is at the top, but I want that specific row at the bottom for later visualisations, like this:

df <- data.frame(
    Column1 = c("Brand1", "Brand2", "Brand3", "Brand4", "Brand5", "Other"),
    Column2 = c("Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5", NA),
    Spendings = c(500, 250, 200, 150, 100, 1000)
)

  Column1   Column2 Spendings
1  Brand1 Subbrand1       500
2  Brand2 Subbrand2       250
3  Brand3 Subbrand3       200
4  Brand4 Subbrand4       150
5  Brand5 Subbrand5       100
6   Other      <NA>      1000

This is the code I used to create the df, including the kind of thing I would like to do. Obviously it does not work :-(

df <- df%>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings)) %>%
    arrange(desc(Spendings), lastrow = "others")

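Is there a way to get the "Other" row at the bottom within a dplyr workflow? Subsetting and rbind-ing is of course possible, but is there a more suitable way?

Tags: r, dplyr

Answer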


We can pass a logical vector to arrange. Logical values sort with FALSE before TRUE (not alphabetically), so the rows where Column1 == "Other" is TRUE end up last.

library(dplyr)

df %>%
   arrange(Column1 == "Other")
#  Column1   Column2 Spendings
#1  Brand1 Subbrand1       500
#2  Brand2 Subbrand2       250
#3  Brand3 Subbrand3       200
#4  Brand4 Subbrand4       150
#5  Brand5 Subbrand5       100
#6   Other      <NA>      1000
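The logical key can also be combined with the descending sort from the question. The following is a minimal sketch, assuming a hypothetical unsummarised data frame `raw` with made-up figures, and dplyr 1.0.0 or later for the `.groups` argument:

```r
library(dplyr)

# Hypothetical raw data with several rows per brand (figures are made up)
raw <- data.frame(
    Column1 = c("Other", "Other", "Brand1", "Brand1", "Brand2", "Brand3"),
    Column2 = c(NA, NA, "Subbrand1", "Subbrand1", "Subbrand2", "Subbrand3"),
    Spendings = c(600, 400, 300, 200, 250, 200)
)

result <- raw %>%
    group_by(Column1, Column2) %>%
    summarise(Spendings = sum(Spendings), .groups = "drop") %>%
    # First key: FALSE (not "Other") sorts before TRUE ("Other").
    # Second key: largest spendings first within each of those two blocks.
    arrange(Column1 == "Other", desc(Spendings))

print(result)
```

Because the logical key comes first, "Other" stays last even though it has the largest total.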

Another option is to convert the column to a factor whose levels are specified in the desired order, with 'Other' as the last level. arrange then orders the rows by those levels. This may be the better option, because the level order is also kept when plotting.

un1 <- c(setdiff(unique(df$Column1), "Other"), "Other")
df %>%
    mutate(Column1 = factor(Column1, levels = un1)) %>%
    arrange(Column1)
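To illustrate the point about plotting, here is a small sketch that assumes the ggplot2 package is installed and uses the question's data. The names `plot_df` and `p` are only for illustration. ggplot2 draws a discrete axis in factor-level order, so "Other" should appear as the last bar:

```r
library(dplyr)
library(ggplot2)

df <- data.frame(
    Column1 = c("Other", "Brand1", "Brand2", "Brand3", "Brand4", "Brand5"),
    Column2 = c(NA, "Subbrand1", "Subbrand2", "Subbrand3", "Subbrand4", "Subbrand5"),
    Spendings = c(1000, 500, 250, 200, 150, 100)
)

# Same level vector as above: every value except "Other", then "Other" last
un1 <- c(setdiff(unique(df$Column1), "Other"), "Other")

plot_df <- df %>%
    mutate(Column1 = factor(Column1, levels = un1))

# The x axis follows the factor levels, not the row order of plot_df
p <- ggplot(plot_df, aes(x = Column1, y = Spendings)) +
    geom_col()

levels(plot_df$Column1)
```

Note that no arrange call is needed for the plot itself, since the ordering lives in the factor levels.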

If we use the forcats package, the fct_relevel function makes it easy to modify the level order:

library(forcats)
df %>% 
  mutate(Column1 = fct_relevel(Column1, "Other", after = Inf)) %>% 
  arrange(Column1)

According to the examples in ?fct_relevel

Using 'Inf' allows you to relevel to the end when the number of levels is unknown or variable (e.g. vectorised operations)
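As a rough illustration of that remark, the sketch below applies the same call to two factors with different numbers of levels. The vectors `f1` and `f2` and their values are made up for this example:

```r
library(forcats)

# Two hypothetical factors with a different number of levels.
# By default the levels are sorted, so "Other" starts out first in both.
f1 <- factor(c("Other", "Pepsi", "Zara"))
f2 <- factor(c("Other", "Pepsi", "Zara", "Xerox", "Yahoo"))

# after = Inf moves "Other" to the end without knowing how many levels exist
r1 <- fct_relevel(f1, "Other", after = Inf)
r2 <- fct_relevel(f2, "Other", after = Inf)

levels(r1)
levels(r2)
```

The same call works for both inputs, which is what makes it convenient inside functions or grouped operations where the set of levels varies.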

