首页 > 解决方案 > 这个 SQL 查询和连接是如何在 dplyr 中完成的?

问题描述

我正在尝试学习一点 R,但不知道如何使用 dplyr 加入 2 个表。如果我在 SQL 中严格执行此操作,我会使用:

select a1.[some column], a2.[other column], a3.[last column]
from vwTable1 a1
join vwTable2 a2
    on a2.myKey = a1.mykey
join vwTable3 a3
    on a3.otherKey = a1.otherKey
where a2.myItem = '12345'
    and a1.myDate between '1/1/2021' and '3/31/2021'

在 R 中,我可以连接到 SQL 数据库:

library(odbc)
library(dplyr)

con <- dbConnect(odbc(),
    Driver = "SQL Server",
    Server = "myServer",
    Database = "myDatabase",
    Port = "1234",
    Trusted_Connection = "Yes")

我只是不确定是否需要先建立参考,例如:

a1 <- tbl(con, "vwTable1")
a2 <- tbl(con, "vwTable2") 
a3 <- tbl(con, "vwTable3")

...然后加入 2 个参考?

目标是将查询结果放入数据框中(如果这是最好的方法),该数据框将被复制到另一台服务器上的 SQL 表中。

我确实尝试了以下操作,但 R Console 只是坐在 + 处等待我做其他事情:

a4 <- tbl(con, sql("SELECT a1.[some column], a2.[other column], a2.[last column]
    FROM vwTable1 a1
    JOIN vwTable2 a2 on a2.myKey = a1.myKey
    JOIN vwTable3 a3 on a3.otherKey = a1.otherKey
    WHERE a2.myItem = '12345'
        AND a1.myDate BETWEEN '1/1/2021' AND '3/31/2021'))

编辑:

这是解决方案:

results <- tbl(con, "vwTable1") %>% filter(myDate >= "2021-01-01" & myDate <= "2021-03-31") %>%
inner_join(tbl(con, "vwTable2") %>% filter(myItem == '12345'), by = "myKey") %>%
inner_join(tbl(con, "vwTable3"), by = "otherKey")

标签: sqlrdplyr

解决方案


dplyr 语法类似于

tbl(con, "vwTable1") %>%
  filter(myDate >= "2021-01-01" & myDate <= "2021-01-02") %>%  # edit
  inner_join(tbl(con, "vwTable2") %>%
            filter(myItem == '12345'), 
            by = "myKey") %>%
  inner_join(tbl(con, "vwTable3"), by = "otherKey")

  # Edit - I think the default is inner_join, not left_join as I had originally

推荐阅读