首页 > 解决方案 > 滚动平均值,dbplyr 中的标准偏差

问题描述

我想在 dbplyr 中设置一个带有滚动函数的新变量(滚动平均值、stdev...等)

这是一个数据库

library(odbc)
library(DBI)
library(tidyverse)
library(zoo)

con <- DBI::dbConnect(odbc::odbc(),
                      Driver    = "SQL Server", 
                      Server    = "xx.xxx.xxx.xxx",
                      Database  = "stock",
                      UID       = "userid",
                      PWD       = "userpassword")

startday = 20150101
day = tbl(con, in_schema("dbo", "LogDay")) 

在此处输入图像描述

我想计算 5 天内的滚动平均值,这是我的代码,但它不起作用

我怎么解决这个问题?

library(zoo)    
day %>% 
      mutate(ma5 = rollmean(priceClose, k = 5, fill = NA))

error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]키워드 'AS' 근처의 구문이  [Microsoft][ODBC SQL Server Driver][SQL Server]문을 준비할 수 
    <SQL> 'SELECT TOP 11 "logNo", "stockCode", "logDate", "priceOpen", "priceHigh", "priceLow", "priceClose", "adjRate", "volume", "amount", "numListed", "remark", "marketCap", "foreignRate", "personNetbuy", "foreignNetbuy", "instNetbuy", "financeNetbuy", "insuranceNetbuy", "toosinNetbuy", "bankNetbuy", "gitaFinanceNetbuy", "pensionNetbuy", "gitaInstNetbuy", "gitaForeignNetbuy", "samoNetbuy", "nationNetbuy", rollmean("priceClose", 5.0 AS "k", NULL AS "fill") AS "ma5"
    FROM "dbo"."LogDay"
    WHERE ("logDate" > 20150101.0)
    ORDER BY "stockCode"'
    Warning : 
    Named arguments ignored for SQL rollmean

标签: rmeanrolling-computationdbplyr

解决方案


发生错误是因为rollmean没有定义 dbplyr 翻译,也不是无需翻译即可使用的 SQL 命令。这并不奇怪,因为rollmean它是 data.table 库的一部分,并且 dbplyr 专注于翻译 dplyr 和 base R 命令。

您所追求的一部分是窗口功能。dplyr 有一系列窗口函数,SQL 也是如此,但这些函数之间的转换并不总是直截了当的。但是有一些方法可以使用定义了翻译的命令来做到这一点。

两种可能的考虑方法:

(1) 滞后与领先相结合

df %>%
  mutate(prev2_price = lag(priceClose, 2, order_by = date),
         prev1_price = lag(priceClose, 1, order_by = date),
         next1_price = lead(priceClose, 1, order_by = date),
         next2_price = lead(priceClose, 2, order_by = date)) %>%
  mutate(ma5 = (prev2_price + prev1_price + priceClose + next1_price + next2_price) / 5)

这种方法不能很好地扩展,但它很简单,很容易推理。如果您想在组内工作(例如,为每只股票单独移动平均线),请group_by在使用之前应用laglead

(2)加入并过滤掉不需要的记录

df2 = df %>%
  select(stockCode, date, priceClose)

df %>%
  inner_join(df2, by = "stockCode", suffix = c("","_2") %>%
  filter(abs(date - date_2) <= 2) %>% # two records either side = window of width 5
  group_by(stockCode, date, priceClose) %>%
  summarise(ma5 = mean(priceClose_2)

这种方法更通用,但可能更难推理。


推荐阅读