R dbplyr mysql column conversion

Problem description

I have a table in MySQL that looks something like this:

tbl<-tibble(
   Result=c("0.1","<0.0001","1.1"),
   Unit=c("mg/L","ug/L","mg/L"),
   Pref_Unit=c("mg/L","mg/L","mg/L"),
   Conversion=c(1,1000,1)
)

Using dbplyr, pool, and RMariaDB, I would like to convert the Result column to the preferred unit using the conversion factor in the table, while preserving the "<". I would also like to split the Result column into a numeric part containing only the number and a censored flag indicating whether the Result contained a "<".

With regular dplyr, I would do something like this:

tbl<-tbl %>%
    mutate(numb_Result=as.numeric(gsub("<","",Result)),
           cen_Result=grepl("<",Result)) %>%
    mutate(new_Result=ifelse(cen_Result,paste0("<",numb_Result*Conversion),paste0(numb_Result*Conversion)))

But that doesn't work with the database table. Any help would be appreciated.

Tags: mysql, r, dplyr, dbplyr

Solution


The challenge is most likely that dbplyr has no translations defined for gsub and grepl. Here are a couple of possibilities you can test:

library(dplyr)
library(dbplyr)

tbl<-tibble(
  Result=c("0.1","<0.0001","1.1"),
  Unit=c("mg/L","ug/L","mg/L"),
  Pref_Unit=c("mg/L","mg/L","mg/L"),
  Conversion=c(1,1000,1)
)

remote_table = tbl_lazy(tbl, con = simulate_mssql())

remote_table %>%
  mutate(has_sign = ifelse(substr(Result, 1, 1) == "<", 1, 0)) %>%
  mutate(removed_sign = ifelse(has_sign == 1, substr(Result, 2, nchar(Result)), Result)) %>%
  mutate(num_value = as.numeric(removed_sign)) %>%
  mutate(converted = as.character(1.0 * num_value * Conversion)) %>%
  mutate(new_Result = ifelse(has_sign == 1, paste0("<", converted), converted))

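ifelse, substr, nchar, as.numeric, as.character, and paste0 all have dbplyr translations, so I expected this to work. However, I keep getting an error because the translator requires the start and stop arguments of substr to be constants, so it does not like nchar(Result) being passed as an argument. This may be fixed in a newer version of the package.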
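If the only obstacle is that substr needs constant start and stop values, one possible workaround is to pass a fixed stop value that is at least as long as any value in Result. The sketch below assumes 100 is a safe upper bound on the length of Result (a hypothetical figure, adjust it to your column definition) and uses sql_render() only to inspect the SQL that would be generated. It has not been run against a real MySQL server.

```r
library(dplyr)
library(dbplyr)

# Same example data as in the question (only the columns needed here).
example_tbl <- tibble(
  Result = c("0.1", "<0.0001", "1.1"),
  Conversion = c(1, 1000, 1)
)

# Simulated connection, as in the attempts in this answer.
remote_example <- tbl_lazy(example_tbl, con = simulate_mssql())

# ASSUMPTION: no Result value is longer than 100 characters, so a
# constant stop of 100 stands in for nchar(Result).
query <- remote_example %>%
  mutate(has_sign = ifelse(substr(Result, 1, 1) == "<", 1, 0)) %>%
  mutate(removed_sign = ifelse(has_sign == 1, substr(Result, 2, 100), Result))

# Render the generated SQL without executing it.
rendered_sql <- sql_render(query)
print(rendered_sql)
```

Because both arguments to substr are now literals, the translator no longer has to evaluate nchar(Result) while building the query.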

My second attempt:

remote_table %>%
  mutate(has_sign = ifelse(substr(Result, 1, 1) == "<", 1, 0),
         character_length = nchar(Result),
         remove_first = sql(REPLACE(Result, "<", ""))) %>%
  mutate(removed_sign = ifelse(has_sign == 1, remove_first, Result)) %>%
  mutate(num_value = as.numeric(removed_sign)) %>%
  mutate(converted = as.character(1.0 * num_value * Conversion)) %>%
  mutate(new_Result = ifelse(has_sign == 1, paste0("<", converted), converted))

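This produces the expected SQL translation. But because I am using a simulated database connection, I cannot test whether it returns the expected output. The downside of this approach is that it uses the SQL function REPLACE directly (it is passed untranslated into the SQL code), which is less elegant than a fully translated solution.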
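Since the simulated connection cannot return rows, one way to sanity-check the logic is to run the same steps locally on the example tibble with plain dplyr. In this sketch, gsub(..., fixed = TRUE) stands in for the SQL REPLACE call; that substitution is an assumption made for local testing only and is not what runs on the database. The database may also format the converted numbers differently from R's as.character.

```r
library(dplyr)

# Example data from the question.
local_tbl <- tibble(
  Result = c("0.1", "<0.0001", "1.1"),
  Unit = c("mg/L", "ug/L", "mg/L"),
  Pref_Unit = c("mg/L", "mg/L", "mg/L"),
  Conversion = c(1, 1000, 1)
)

local_check <- local_tbl %>%
  mutate(has_sign = ifelse(substr(Result, 1, 1) == "<", 1, 0),
         # Local stand-in for REPLACE(Result, '<', ''): remove every "<".
         remove_first = gsub("<", "", Result, fixed = TRUE)) %>%
  mutate(removed_sign = ifelse(has_sign == 1, remove_first, Result)) %>%
  mutate(num_value = as.numeric(removed_sign)) %>%
  mutate(converted = as.character(1.0 * num_value * Conversion)) %>%
  mutate(new_Result = ifelse(has_sign == 1, paste0("<", converted), converted))

# Inspect the censoring flag, the numeric value, and the converted result.
print(select(local_check, Result, has_sign, num_value, new_Result))
```

If the local result matches what you expect, the remaining risk lies in how the database casts and formats the numbers, which is worth checking on a few rows of the real table.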

There may be more elegant ways to do this. But hopefully one of these two gives you a workable solution.

