sql - R summarise_at dynamically by condition: mean for some columns, sum for others
Problem description
I would like to do this with a condition inside summarise_at().
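Edit #1: I added the word "dynamically" to the title. The vars(c()) I use inside summarise_at() is only there to keep the example quick and clear. In reality I need contains(), starts_with() and matches(,, perl=TRUE), because I have 50 columns: many of them need sum() and some need mean().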
The goal is to use tbl()..%>% group_by() ... %>% summarise_at()...%>% collect().
Edit #2: I added a sample of the SQL generated by the second example.
library(tidyverse)
(mtcars
%>% group_by(carb)
%>% summarise_at(vars(c("mpg","cyl","disp")), list (~mean(.),~sum(.)))
# I don't want this line below, I would like a conditional in summarise_at() because I have 50 columns in my real case
%>% select(carb,cyl_mean,disp_mean,mpg_sum)
)
#> # A tibble: 6 x 4
#> carb cyl_mean disp_mean mpg_sum
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 4.57 134. 177.
#> 2 2 5.6 208. 224
#> 3 3 8 276. 48.9
#> 4 4 7.2 309. 158.
#> 5 6 6 145 19.7
#> 6 8 8 301 15
Created on 2020-02-19 by the reprex package (v0.3.0)
This works, but I only want the sum of mpg, and only the mean of cyl and disp:
library(RSQLite)
library(dbplyr)
library(tidyverse)
library(DBI)
db <- dbConnect(SQLite(),":memory:")
dbCreateTable(db, "mtcars_table", mtcars) # creates an empty table (structure only); dbWriteTable() would also copy the rows
(tbl( db, build_sql( con=db,"select * from mtcars_table" ))
%>% group_by(carb)
%>% summarise_at(vars(c("mpg","cyl","disp")), list (~mean(.),~sum(.)))
%>% select(carb,cyl_mean,disp_mean,mpg_sum)
%>% show_query()
)
#> <SQL>
#> Warning: Missing values are always removed in SQL.[...] to silence this warning
#> SELECT `carb`, `cyl_mean`, `disp_mean`, `mpg_sum`
#> FROM (SELECT `carb`, AVG(`mpg`) AS `mpg_mean`, AVG(`cyl`) AS `cyl_mean`, AVG(`disp`) AS `disp_mean`, SUM(`mpg`) AS `mpg_sum`, SUM(`cyl`) AS `cyl_sum`, SUM(`disp`) AS `disp_sum`
#> FROM (select * from mtcars_table)
#> GROUP BY `carb`)
#> # Source: lazy query [?? x 4]
#> # Database: sqlite 3.30.1 [:memory:]
#> # … with 4 variables: carb <dbl>, cyl_mean <lgl>, disp_mean <lgl>,
#> # mpg_sum <lgl>
I have tried every variation along these lines, but they either don't work or throw errors.
(mtcars %>% group_by(carb)%>% summarise_at(vars(c("mpg","cyl","disp")),ifelse(vars(contains(names(.),"mpg")),list(sum(.)),list(mean(.)))) )
Not good: far too many columns.
library(tidyverse)
(mtcars %>% group_by(carb)%>% summarise_at(vars(c("mpg","cyl","disp")),ifelse ((names(.)=="mpg"), list(~sum(.)) , list(~mean(.)))))
#> # A tibble: 6 x 34
#> carb mpg_sum cyl_sum disp_sum mpg_mean..2 cyl_mean..2 disp_mean..2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 177. 32 940. 25.3 4.57 134.
#> 2 2 224 56 2082. 22.4 5.6 208.
#> 3 3 48.9 24 827. 16.3 8 276.
#> 4 4 158. 72 3088. 15.8 7.2 309.
#> 5 6 19.7 6 145 19.7 6 145
#> 6 8 15 8 301 15 8 301
#> # … with 27 more variables: mpg_mean..3 <dbl>, cyl_mean..3 <dbl>,
#> # disp_mean..3 <dbl>, mpg_mean..4 <dbl>, cyl_mean..4 <dbl>,
#> # disp_mean..4 <dbl>, mpg_mean..5 <dbl>, cyl_mean..5 <dbl>,
#> # disp_mean..5 <dbl>, mpg_mean..6 <dbl>, cyl_mean..6 <dbl>,
#> # disp_mean..6 <dbl>, mpg_mean..7 <dbl>, cyl_mean..7 <dbl>,
#> # disp_mean..7 <dbl>, mpg_mean..8 <dbl>, cyl_mean..8 <dbl>,
#> # disp_mean..8 <dbl>, mpg_mean..9 <dbl>, cyl_mean..9 <dbl>,
#> # disp_mean..9 <dbl>, mpg_mean..10 <dbl>, cyl_mean..10 <dbl>,
#> # disp_mean..10 <dbl>, mpg_mean..11 <dbl>, cyl_mean..11 <dbl>,
#> # disp_mean..11 <dbl>
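Some other attempts and comments: I would like a conditional sum(.) or mean(.), depending on the column name, inside summarise().
It would be nice if it accepted more than just primitive functions.
In the end, the goal is to generate conditional SQL with AVG() and SUM() using tbl()..%>% group_by() ... %>% summarise_at()...%>% collect(). T-SQL functions like ~(convert(varchar()) work with mutate_at(), and functions like ~AVG() work with summarise_at(), but I always end up at the same point: a condition based on the column name does not work in summarise_at().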
:)
Solution
One option is to group_by "carb", then create the sum of "mpg" as an additional grouping variable, and then use summarise_at on the remaining variables:
library(dplyr)
mtcars %>%
group_by(carb) %>%
group_by(mpg_sum = sum(mpg), .add = TRUE) %>%
summarise_at(vars(cyl, disp), list(mean = mean))
# A tibble: 6 x 4
# Groups: carb [6]
# carb mpg_sum cyl_mean disp_mean
# <dbl> <dbl> <dbl> <dbl>
#1 1 177. 4.57 134.
#2 2 224 5.6 208.
#3 3 48.9 8 276.
#4 4 158. 7.2 309.
#5 6 19.7 6 145
#6 8 15 8 301
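If across() is not available in the installed dplyr, a minimal sketch that still chooses columns by name is to build the summary expressions as a named list and splice them into summarise() with rlang's `!!!`. The two regular expressions are hypothetical placeholders for the real contains()/starts_with()/matches() rules. Because the spliced calls are plain sum() and mean(), dbplyr should be able to translate them to SUM() and AVG() too, but treat that as an assumption.

```r
library(dplyr)
library(rlang)

# Hypothetical name patterns standing in for the real 50-column rules.
sum_cols  <- grep("^mpg$", names(mtcars), value = TRUE)
mean_cols <- grep("^(cyl|disp)$", names(mtcars), value = TRUE)

# One named expression per column, e.g. mpg_sum = sum(mpg).
sum_exprs  <- lapply(syms(sum_cols),  function(col) expr(sum(!!col)))
mean_exprs <- lapply(syms(mean_cols), function(col) expr(mean(!!col)))
names(sum_exprs)  <- paste0(sum_cols, "_sum")
names(mean_exprs) <- paste0(mean_cols, "_mean")

result <- mtcars %>%
  group_by(carb) %>%
  summarise(!!!mean_exprs, !!!sum_exprs)  # splice the named expressions

print(result)
```

The column names are set explicitly, so they do not depend on the naming rules of summarise_at().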
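Alternatively, with the devel version of dplyr, this can be done in summarise by wrapping the block of columns in across and the single column on its own, applying a different function to each: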
mtcars %>%
group_by(carb) %>%
summarise(across(one_of(c("cyl", "disp")), list(mean = mean)),
mpg_sum = sum(mpg))
# A tibble: 6 x 4
# carb cyl_mean disp_mean mpg_sum
# <dbl> <dbl> <dbl> <dbl>
#1 1 4.57 134. 177.
#2 2 5.6 208. 224
#3 3 8 276. 48.9
#4 4 7.2 309. 158.
#5 6 6 145 19.7
#6 8 8 301 15
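For the 50-column case, across() also accepts tidyselect helpers, so each rule can be written as a pattern rather than a list of names. A sketch assuming dplyr >= 1.0.0; the two patterns are hypothetical stand-ins for the real rules, and `.names` reproduces the `{column}_{function}` naming of summarise_at():

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

result <- mtcars %>%
  group_by(carb) %>%
  summarise(
    # columns matched by this pattern are averaged
    across(matches("^(cyl|disp)$"), mean, .names = "{.col}_mean"),
    # columns matched by this pattern are summed
    across(starts_with("mpg"), sum, .names = "{.col}_sum")
  )

print(result)
```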
Note: in the upcoming release, summarise_at/summarise_if/mutate_at/mutate_if/... etc. will be superseded by across used inside the default verbs (summarise/mutate/filter/...).
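For the database pipeline in the question, dbplyr should translate the same across() call into AVG() and SUM(). A sketch against an in-memory SQLite table, assuming dbplyr >= 2.0 (the version that added across() translation) and RSQLite are installed. dbWriteTable() is used here because dbCreateTable() only creates the empty table, and `na.rm = TRUE` is passed since SQL always drops missing values anyway:

```r
library(DBI)
library(dplyr)
library(dbplyr)  # assumed >= 2.0 for across() translation

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars_table", mtcars)  # copies structure and rows

query <- tbl(con, "mtcars_table") %>%
  group_by(carb) %>%
  summarise(
    across(matches("^(cyl|disp)$"), ~ mean(.x, na.rm = TRUE), .names = "{.col}_mean"),
    across(starts_with("mpg"), ~ sum(.x, na.rm = TRUE), .names = "{.col}_sum")
  )

show_query(query)         # inspect the generated AVG()/SUM() SQL
result <- collect(query)  # run the query and pull the rows into R
dbDisconnect(con)
```

SQL does not guarantee row order without ORDER BY, so add arrange(carb) before collect() if the order matters.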