Run the script according to logical conditions in R

Problem description

In my data set, I work with groups (strata) defined by SKU-acnumber-year. Here is a small example:

df=structure(list(SKU = c(11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L
), stuff = c(8.85947691, 9.450108704, 10.0407405, 10.0407405, 
10.63137229, 11.22200409, 11.22200409, 11.81263588, 12.40326767, 
12.40326767, 12.40326767, 12.99389947, 13.58453126, 14.17516306, 
14.76579485, 15.94705844, 17.12832203, 17.71895382, 21.26274458, 
25.98779894, 63.19760196), action = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), 
    acnumber = c(137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("SKU", 
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA, 
-21L))

Very important:

The action column has only two values, 0 and 1. As we can see in this example, there are 3 observations of stuff in the action = 1 category and 18 observations of stuff in the action = 0 category.

I need to set a logical condition: for groups that have from 1 to 4 observations of stuff in the action = 1 category, script1.r must be run,

and for groups that have >= 5 observations of stuff in the action = 1 category, script2.r must be run.

I imagine it this way: a script3.r is created with the following content (condition), but I do not know how to set these logical conditions correctly.

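# I take the data from SQL (odbcDriverConnect and sqlQuery come from RODBC)
library(RODBC)
dbHandle <- odbcDriverConnect("driver={SQL Server};server=;database=;trusted_connection=true")
sql <- "select needed columns"   # placeholder for the actual query text
df <- sqlQuery(dbHandle, sql)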



# pseudocode for the desired logic:
# if a group has 1-4 observations of stuff with action = 1, then run C:/path to/script1.r
# if a group has >= 5 observations of stuff with action = 1, then run C:/path to/script2.r

How do I implement this? script3.r runs on a schedule, so it would in turn run the two scripts. I just do not want to create a separate schedule for each script.

Tags: r, if-statement, odbc, rodbc

Solution


Consider putting the if logic inside by(), which slices a data frame by one or more factors and applies a function to each slice. From there, run the other scripts from the command line with system() calling Rscript (assuming the R bin directory is on your PATH environment variable):

by_list <- by(df, df[,c("SKU", "acnumber", "year")], function(sub) {

  if (sum(sub$action == 1) %in% c(1:4))   system("Rscript /path/to/script1.r")
  if (sum(sub$action == 1) >= 5)          system("Rscript /path/to/script2.r")

  return(sub)
})
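Before wiring in the system() calls, it may help to check which branch each group would take. The following is a minimal sketch on made-up toy data (the second SKU, 11203, and the n_action and script column names are assumptions for illustration, not part of the original data set); it only counts and labels, it does not run any script.

```r
# Toy data: two hypothetical SKU-acnumber-year groups (not the real data set)
toy <- data.frame(
  SKU      = c(rep(11202L, 6), rep(11203L, 7)),
  acnumber = 137L,
  year     = 2018L,
  action   = c(0L, 0L, 1L, 0L, 1L, 1L,          # 3 ones: script1 range
               1L, 1L, 1L, 1L, 1L, 0L, 0L)      # 5 ones: script2 range
)

# Count the action == 1 rows in each SKU-acnumber-year group
counts <- aggregate(action ~ SKU + acnumber + year, data = toy,
                    FUN = function(a) sum(a == 1))
names(counts)[names(counts) == "action"] <- "n_action"

# Label each group with the script it would trigger (NA if it has no action == 1 rows)
counts$script <- ifelse(counts$n_action >= 5, "script2.r",
                 ifelse(counts$n_action >= 1, "script1.r", NA_character_))

print(counts)
```

A table like this is also a convenient thing to log from the scheduled script, so that each run records which groups went to which script.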

Even better, source() the external scripts in the main script, making sure to wrap the entire process of each script in a function() definition, optionally adding arguments such as the specific SKU. Otherwise, source() will execute the code in those files immediately when they are sourced. With this approach, the functions can also return output.

source("/path/to/script1.r")   # IMPORTS script1_function()
source("/path/to/script2.r")   # IMPORTS script2_function()

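by_list <- by(df, df[,c("SKU", "acnumber", "year")], function(sub) {

  current_SKU <- max(sub$SKU)        # OR min(sub$SKU) OR sub$SKU[[1]]; can be passed as an argument
  n_action <- sum(sub$action == 1)   # number of action == 1 rows in this group

  output <- NULL   # stays NULL for groups with no action == 1 rows
  if (n_action %in% c(1:4))  output <- script1_function()
  if (n_action >= 5)         output <- script2_function()

  return(output)
})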
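To make the "wrap the script in a function" idea concrete, here is a self-contained sketch. The contents of script1.r and script2.r are not shown in the question, so the function bodies below are placeholders, and the (sub, sku) arguments, the temporary files, and the toy data are all assumptions for illustration.

```r
# Hypothetical stand-ins for script1.r and script2.r, written to temporary
# files so the sketch runs on its own. The real scripts would hold the
# actual processing, wrapped in a function in the same way.
script1_path <- tempfile(fileext = ".r")
script2_path <- tempfile(fileext = ".r")
writeLines('script1_function <- function(sub, sku) {
  paste("script1 ran for SKU", sku, "on", nrow(sub), "rows")
}', script1_path)
writeLines('script2_function <- function(sub, sku) {
  paste("script2 ran for SKU", sku, "on", nrow(sub), "rows")
}', script2_path)

# Sourcing only defines the two functions; no processing happens yet
source(script1_path)
source(script2_path)

# Toy data: two hypothetical groups (not the real data set)
toy <- data.frame(
  SKU      = c(rep(11202L, 6), rep(11203L, 7)),
  acnumber = 137L,
  year     = 2018L,
  action   = c(0L, 0L, 1L, 0L, 1L, 1L,
               1L, 1L, 1L, 1L, 1L, 0L, 0L)
)

results <- by(toy, toy[, c("SKU", "acnumber", "year")], function(sub) {
  n_action <- sum(sub$action == 1)   # action == 1 rows in this group
  sku <- sub$SKU[[1]]                # the group's SKU, passed along as an argument
  if (n_action >= 5) {
    script2_function(sub, sku)
  } else if (n_action >= 1) {
    script1_function(sub, sku)
  } else {
    NULL                             # no action == 1 rows: run neither script
  }
})

print(results)
```

Passing the group's subset and SKU as arguments means each function works only on its own group rather than on the whole df, and the value it returns ends up in the by() result.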
