How to avoid opening multiple connections to a Postgres database when accessing it from R

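Problem description

I am using the code below, but it creates multiple connections when the map function is called, and they are never closed. As a result, my RDS database gets flooded with connections. Is there a way to change this code to prevent so many connections?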

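    library(DBI)    # dbConnect, dbSendQuery, dbGetQuery, ...
    library(dplyr)  # %>%, pull, mutate, bind_rows
    library(purrr)  # map

    connect.to.database <- function(dbname, schema = "public", host, port, user, pass) {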
      con <- dbConnect(RPostgres::Postgres(),
                       dbname = dbname,
                       user = user,
                       password = pass,
                       host = host,
                       port = port)
      
      
      # this puts the schema in the search path, which means that instead of
      # having to use <schema name>.<table name> you can just write <table name>
      res <- dbSendQuery(con, paste0("SET search_path TO ",
                                     dbQuoteIdentifier(con, schema),
                                     ", public"))
      
      # check for errors
      dbFetch(res)
      dbClearResult(res)
      
      con
    }

    schemas <- dbGetQuery(connect.to.database(dbname, "public", host, port, user, password), paste0("SELECT schema_name FROM information_schema.schemata"))
    
    schema_names <- schemas %>% pull()
    
    schemas_tables <- map(.x = schema_names,~dbGetQuery(connect.to.database(dbname, "public", host, port, user, password), paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ","'",.x,"'")) %>% mutate(schema_name = .x)) %>%
                      bind_rows()

Tags: sql, r, dbi, rpostgres

Solution


Create a single global connection object and reuse it inside map. (I removed the unnecessary paste0 from your first query.)

conn <- connect.to.database(dbname, "public", host, port, user, password)
schema <- dbGetQuery(conn, "SELECT schema_name FROM information_schema.schemata")

schemas_tables <- map(
  .x = schema$schema_name,
  ~ dbGetQuery(conn, paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ","'",.x,"'")) %>%
    mutate(schema_name = .x)
) %>%
  bind_rows()

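You may want to consider parameterized queries instead of building query strings by hand. There are security concerns about malicious SQL injection (e.g., XKCD's Exploits of a Mom, aka "Little Bobby Tables"), but parameter binding also protects against malformed strings and Unicode-vs-ANSI mistakes, even when a single data analyst is the only person running the query. DBI (with odbc) and RODBC both support parameterized queries, either natively or via add-ons. RPostgres, used here, supports them through DBI as well, with numbered placeholders of the form $1, $2, and so on.

This would change it to: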

schemas_tables <- map(
  .x = schema$schema_name,
  ~ dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema = $1",
               params = list(.x)) %>%
    mutate(schema_name = .x)
) %>%
  bind_rows()

But frankly, I think it may be easier to use IN instead of =. Again, use parameter binding.

schemas_tables <- dbGetQuery(conn, "SELECT table_schema AS schema_name, table_name FROM information_schema.tables WHERE table_schema IN ($1)",
                             params = list(schema$schema_name))

(No need for map.)
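
If binding a whole vector to a single placeholder behaves unexpectedly with your driver, one alternative is to generate one placeholder per schema name and still run a single query on the single connection. This is only a minimal sketch, assuming RPostgres-style numbered placeholders; in_placeholders and tables_in_schemas are hypothetical helper names, not part of DBI.

```r
# Hypothetical helper (not part of DBI): builds "$1, $2, ..., $n".
in_placeholders <- function(n) {
  stopifnot(n > 0)
  paste0("$", seq_len(n), collapse = ", ")
}

# Hypothetical helper: one query for all schemas, reusing one open connection.
# Assumes `conn` is a DBI connection created with RPostgres.
tables_in_schemas <- function(conn, schema_names) {
  sql <- paste0(
    "SELECT table_schema AS schema_name, table_name ",
    "FROM information_schema.tables ",
    "WHERE table_schema IN (", in_placeholders(length(schema_names)), ")"
  )
  # One scalar parameter per placeholder, passed as an unnamed list.
  DBI::dbGetQuery(conn, sql, params = unname(as.list(schema_names)))
}
```

With the objects defined earlier, this would be called as tables_in_schemas(conn, schema$schema_name). Only the placeholders are pasted into the SQL text; the schema names themselves are still sent as bound parameters.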

Or, I believe you can do it in one query instead of two.

dbGetQuery(conn, "
    select table_name
    from information_schema.tables
    where table_schema in (
      select schema_name from information_schema.schemata
    )")

Remember

...to close the connection when you are done.

dbDisconnect(conn)
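
If the queries live inside a function, on.exit can make the disconnect happen even when a query fails, which avoids leaking connections on errors. The wrapper below is a sketch under that assumption; with_connection is a hypothetical name, and its connect and disconnect arguments are introduced here only for illustration.

```r
# Hypothetical wrapper: opens one connection, runs f(conn), always disconnects.
# `connect` is a zero-argument function returning a connection;
# `disconnect` defaults to DBI::dbDisconnect and is only looked up when needed.
with_connection <- function(connect, f, disconnect = DBI::dbDisconnect) {
  conn <- connect()
  # Registered immediately so it runs on normal return and on error alike.
  on.exit(disconnect(conn), add = TRUE)
  f(conn)
}
```

A call might look like with_connection(function() connect.to.database(dbname, "public", host, port, user, password), function(conn) dbGetQuery(conn, "SELECT schema_name FROM information_schema.schemata")), so that the connection is opened once and closed once.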
