r - 如何在交付/接收较早的订单之前知道下一个订单的客户?在 R 中
问题描述
我有一个有两个日期的大型数据库。例如,获取超市数据(http://www.tableau.com/sites/default/files/training/global_superstore.zip)“订单”表。
一个日期是订单日期,另一个是发货/交货日期(假设它是交货日期)。我想知道那些下一个订单的客户的所有订单的详细信息,而无需等待他们之前任何一个订单的发货/交付。
例如,ID 为“ZC-21910”的客户于 2014 年 6 月 12 日下订单,ID 为 CA-2014-133928,该订单于 2014 年 6 月 18 日发货。然而,同一客户下一个 ID 为“IT-2014-3511710”的订单2014 年 6 月 13 日,即 2014 年 6 月 18 日之前(早期订单之一的发货日期)。
最好在单独的数据框/表中过滤掉所有此类订单(订单 ID)。
我怎样才能在 R 中做到这一点?
示例数据集
> dput(df)
structure(list(customer_id = c("A", "A", "A", "B", "B", "C",
"C"), order_id = structure(1:7, .Label = c("1", "2", "3", "4",
"5", "6", "7"), class = "factor"), order_date = structure(c(17897,
17901, 17912, 17901, 17902, 17903, 17905), class = "Date"), ship_date = structure(c(17926,
17906, 17914, 17904, 17904, 17904, 17906), class = "Date")), row.names = c(NA,
-7L), class = c("tbl_df", "tbl", "data.frame"))
解决方案
以下是我在 R 中构建此工作流的方式,注意:在 Tableau 中复制功能将非常困难。
# Install pacakges if they are not already installed: necessary_packages => vector
necessary_packages <- c("readxl")
# Create a vector containing the names of any packages needing installation:
# new_pacakges => vector
new_packages <- necessary_packages[!(necessary_packages %in%
installed.packages()[, "Package"])]
# If the vector has more than 0 values, install the new pacakges
# (and their) associated dependencies:
if(length(new_packages) > 0){install.packages(new_packages, dependencies = TRUE)}
# Initialise the packages in the session:
lapply(necessary_packages, require, character.only = TRUE)
# Store a scalar of the link to the data: durl => character scalar
durl <- "http://www.tableau.com/sites/default/files/training/global_superstore.zip"
# Store the path to the temporary directory: tmpdir_path => character scalar
tmpdir_path <- tempdir()
# Store a character scalar denoting the link to the zipped directory
# that is to be created: zip_path => character scalar
zip_path <- paste0(tmpdir_path, "/tableau.zip")
# Store a character scalar denoting the link to the unzipped directory
# that is to be created: unzip_path => character scalar
unzip_path <- paste0(tmpdir_path, "/global_superstore")
# Download the zip file: global_superstore.zip => stdout (zip_path)
download.file(durl, zip_path)
# Unzip the file into the unzip directory: tableau.zip => stdout (global_superstore)
unzip(zipfile = zip_path, exdir = unzip_path)
# Read in the excel file: df => data.frame
df <- read_xls(normalizePath(list.files(unzip_path, full.names = TRUE)))
# Regex the vector names to fit with R convention: names(df) => character vector
names(df) <- gsub("\\W+", "_", tolower(trimws(names(df), "both")))
# Allocate some memory by creating an empty list the same size as the number of
# customers: df_list => list
df_list <- vector("list", length(unique(df$customer_id)))
# Split the data.frame into the list by the customer_id: df_list => lis
df_list <- with(df, split(df, customer_id))
# Sort the data (by date) and test whether or not each customer waited for their
# order before ordering again: orders_prior_to_delivery => data.frame
orders_prior_to_delivery <- data.frame(do.call("rbind", Map(function(x){
# Order the data.frame: y => data.frame
y <- x[order(x$order_date),]
# Return only the observations where the customer didn't wait:
# data.frame => GlobalEnv()
with(y, y[c(FALSE,
apply(data.frame(sapply(order_date[-1], `<`, ship_date[-nrow(y)])), 2, any)),])
},
df_list)), row.names = NULL, stringsAsFactors = FALSE)
# Unique customers and orders that were ordered prior to shipping the
# previous order: cust_orders_prior_to_delivery => data.frame
cust_orders_prior_to_delivery <-
unique(orders_prior_to_delivery[,c("order_id", "customer_id")])
推荐阅读
- javascript - 在 JSP 中看不到 Json 数据结果
- c++ - 从 .tsv 文件读入
- node.js - 从 Angular 和 Nodejs 请求 API 时出错
- python - 从python中的多行Excel工作表生成项目和子项目字段
- javascript - 为什么将 console.log() 放置在呈现视图的函数中时会重复打印一个变量?
- c# - 使用 JSON JavaScriptSerializer 在序列化或反序列化期间出现错误
- reactjs - React js CRUD - 添加数据时如何自动更新表而不刷新
- javascript - 在时间段之间获得整个三个月
- javascript - 无法在 jQuery / Ajax 中使用附加成功函数
- python - 捕获 Firebase 504 网关超时