首页 > 解决方案 > 如何在不保存结果的情况下将一个 dplyr 链的结果用于另一个

问题描述

我希望有效地使用 dplyr 中的结构等子查询。例如我们做子查询,比如

select * from table where tbl_id in (select tbl_id from table1 where name='a');

在这里,我假设我们没有保存内部查询的任何结果。我想在 dplyr 链中使用类似的结构。

我试图通过将一个 dplyr 链的结果放在括号中来使用另一个链的结果,但它不是那样工作的。我已经知道我们可以将它保存为临时 df 并使用它,但我不想保存它。

下面是两个表数据/数据框:

# Libraries

library(sqldf)
#> Loading required package: gsubfn
#> Loading required package: proto
#> Loading required package: RSQLite
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union


# Q 16 Write a query in SQL to find the name of those 
# movies where one or more actors acted in two or more movies

movie <- read.csv("q14_movie.csv")
movie_cast <- read.csv("q16_movie_cast.csv")

print(movie)
#>    mov_id                 mov_title mov_year mov_time  mov_lang
#> 1     901                   Vertigo     1958      128   English
#> 2     902             The Innocents     1961      100   English
#> 3     903        Lawrence of Arabia     1962      216   English
#> 4     904           The Deer Hunter     1978      183   English
#> 5     905                   Amadeus     1984      160   English
#> 6     906              Blade Runner     1982      117   English
#> 7     907            Eyes Wide Shut     1999      159   English
#> 8     908        The Usual Suspects     1995      106   English
#> 9     909                 Chinatown     1974      130   English
#> 10    910             Boogie Nights     1997      155   English
#> 11    911                Annie Hall     1977       93   English
#> 12    912         Princess Mononoke     1997      134  Japanese
#> 13    913  The Shawshank Redemption     1994      142   English
#> 14    914           American Beauty     1999      122   English
#> 15    915                   Titanic     1997      194   English
#> 16    916         Good Will Hunting     1997      126   English
#> 17    917               Deliverance     1972      109   English
#> 18    918             Trainspotting     1996       94   English
#> 19    919              The Prestige     2006      130   English
#> 20    920              Donnie Darko     2001      113   English
#> 21    921       Slumdog Millionaire     2008      120   English
#> 22    922                    Aliens     1986      137   English
#> 23    923            Beyond the Sea     2004      118   English
#> 24    924                    Avatar     2009      162   English
#> 25    926             Seven Samurai     1954      207  Japanese
#> 26    927             Spirited Away     2001      125  Japanese
#> 27    928        Back to the Future     1985      116   English
#> 28    925                Braveheart     1995      178   English
#>      mov_dt_rel mov_rel_country
#> 1    1958-08-24              UK
#> 2    1962-02-19              SW
#> 3    1962-12-11              UK
#> 4    1979-03-08              UK
#> 5    1985-01-07              UK
#> 6    1982-09-09              UK
#> 7                            UK
#> 8    1995-08-25              UK
#> 9    1974-08-09              UK
#> 10   1998-02-16              UK
#> 11   1977-04-20             USA
#> 12   2001-10-19              UK
#> 13   1995-02-17              UK
#> 14                           UK
#> 15   1998-01-23              UK
#> 16   1998-06-03              UK
#> 17   1982-10-05              UK
#> 18   1996-02-23              UK
#> 19   2006-11-10              UK
#> 20                           UK
#> 21   2009-01-09              UK
#> 22   1986-08-29              UK
#> 23   2004-11-26              UK
#> 24   2009-12-17              UK
#> 25   1954-04-26              JP
#> 26   2003-09-12              UK
#> 27   1985-12-04              UK
#> 28   1995-09-08              UK

print(movie_cast)
#>    act_id mov_id                   role
#> 1     101    901  John Scottie Ferguson
#> 2     102    902           Miss Giddens
#> 3     103    903          T.E. Lawrence
#> 4     104    904                Michael
#> 5     105    905        Antonio Salieri
#> 6     106    906           Rick Deckard
#> 7     107    907          Alice Harford
#> 8     108    908                McManus
#> 9     110    910            Eddie Adams
#> 10    111    911            Alvy Singer
#> 11    112    912                    San
#> 12    113    913          Andy Dufresne
#> 13    114    914         Lester Burnham
#> 14    115    915    Rose DeWitt Bukater
#> 15    116    916           Sean Maguire
#> 16    117    917                     Ed
#> 17    118    918                 Renton
#> 18    120    920        Elizabeth Darko
#> 19    121    921            Older Jamal
#> 20    122    922                 Ripley
#> 21    114    923            Bobby Darin
#> 22    109    909            J.J. Gittes
#> 23    119    919          Alfred Borden



sqldf('select * from movie m join movie_cast mc on 
m.mov_id=mc.mov_id where mc.act_id in  
      (select act_id from movie_cast group by act_id having count(mov_id)>1)')
#>   mov_id        mov_title mov_year mov_time mov_lang   mov_dt_rel
#> 1    914  American Beauty     1999      122  English             
#> 2    923   Beyond the Sea     2004      118  English   2004-11-26
#>   mov_rel_country act_id mov_id            role
#> 1              UK    114    914  Lester Burnham
#> 2              UK    114    923     Bobby Darin



tmp <- movie_cast %>% 
        group_by(act_id) %>% 
          summarise(num_movies=n_distinct(mov_id)) %>% 
                                filter(num_movies>1) 

inner_join(movie,movie_cast,by='mov_id') %>% filter(act_id %in% tmp$act_id)
#>   mov_id        mov_title mov_year mov_time mov_lang   mov_dt_rel
#> 1    914  American Beauty     1999      122  English             
#> 2    923   Beyond the Sea     2004      118  English   2004-11-26
#>   mov_rel_country act_id            role
#> 1              UK    114  Lester Burnham
#> 2              UK    114     Bobby Darin


inner_join(movie,movie_cast,by='mov_id') %>% 
  filter(act_id %in% (movie_cast %>% 
group_by(act_id) %>% 
summarise(num_movies=n_distinct(mov_id)) %>% 
filter(num_movies>1) %>% 
select(act_id)))
#> [1] mov_id          mov_title       mov_year        mov_time       
#> [5] mov_lang        mov_dt_rel      mov_rel_country act_id         
#> [9] role           
#> <0 rows> (or 0-length row.names) 

我希望在不保存为 tmp 的情况下获得相同的结果,如代码中所述!

谢谢

标签: rdplyr

解决方案


推荐阅读