Script does not calculate the median in R when the median is zero

Question

I have this script:

library(dplyr)
newest <- mydat %>%
  filter(SaleCount > 0) %>%  # first keep only rows with SaleCount > 0, which are the ones of interest
  group_by(CustomerName, ItemRelation, DocumentNum, DocumentYear, CustomerType) %>%
  mutate(k = mean(SaleCount[IsPromo == 1]),
         m0 = median(tail(SaleCount[IsPromo == 0], 5))) %>%  # calculate k and m0 for all rows
  filter(IsPromo == 1) %>%  # now keep only rows with IsPromo == 1
  mutate(r = (k - m0) * n()) %>%
  distinct()

This script:

1. calculates the mean of SaleCount for IsPromo category 1
(excluding zero and negative values)
2. for IsPromo category 0, calculates the median of the last 5 observations of SaleCount
(excluding zero and negative values)
3. then subtracts the median from the mean and multiplies the result by the count of non-zero, non-negative values for IsPromo category 1

But sometimes the median can be equal to 0, as in this example:

mydat=structure(list(ItemRelation = c(11712L, 11712L, 11712L, 11712L, 
11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 11712L, 
11712L, 11712L, 11712L), SaleCount = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), DocumentNum = c(197L, 197L, 
197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 197L, 
197L, 197L), DocumentYear = c(2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), CustomerType = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), CustomerName = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("ItemRelation", 
"SaleCount", "DocumentNum", "DocumentYear", "IsPromo", "CustomerType", 
"CustomerName"), class = "data.frame", row.names = c(NA, -15L
))

In this case the code writes NA, so it neither subtracts the median from the mean nor multiplies the result.

A simple example of the output:

ItemRelation    SaleCount   DocumentNum k   m0  r
11712             18    197           18    NA  NA

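How can I make it take a zero median into account and work correctly?

Edit in response to AAron's answer

The mean of SaleCount must be multiplied by the count of non-zero, non-negative values for IsPromo category 1. How can I do that?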

Tags: r, dplyr, data.table, plyr, lapply

Answer


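Your logic is off, not your code. You first say you want the median of the last five values without negative and zero values, and then say the median should be zero. Because of the first requirement, your filter has already removed every zero value. In this example all SaleCount values with IsPromo == 0 are zero, so after filter(SaleCount > 0) no data is left to take the median of, and median() of an empty vector returns NA.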
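One possible way to reconcile the two requirements, as a sketch rather than the answerer's own code: drop the up-front filter and apply the SaleCount condition separately inside each calculation. This assumes that zeros should count toward the IsPromo == 0 median (only negative values are dropped there), while the mean and the count for IsPromo == 1 still use only positive values. The column name n_promo is hypothetical, introduced only for illustration.

```r
library(dplyr)

# Same values as the mydat example in the question
mydat <- data.frame(
  ItemRelation = 11712L,
  SaleCount    = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 18L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  DocumentNum  = 197L,
  DocumentYear = 2017L,
  IsPromo      = c(rep(0L, 6), rep(1L, 9)),
  CustomerType = 1L,
  CustomerName = 2L
)

# Why the original script gives NA: after filter(SaleCount > 0) the
# IsPromo == 0 subset is empty, and the median of an empty vector is NA
is.na(median(integer(0)))

newest <- mydat %>%
  group_by(CustomerName, ItemRelation, DocumentNum, DocumentYear, CustomerType) %>%
  summarise(
    # mean over positive promo sales only
    k       = mean(SaleCount[IsPromo == 1 & SaleCount > 0]),
    # ASSUMPTION: zeros are kept for the non-promo median, negatives are dropped
    m0      = median(tail(SaleCount[IsPromo == 0 & SaleCount >= 0], 5)),
    # count of positive promo sales (hypothetical helper column)
    n_promo = sum(IsPromo == 1 & SaleCount > 0),
    r       = (k - m0) * n_promo,
    .groups = "drop"
  )

newest
```

For the example data this gives k = 18, m0 = 0 and r = 18 instead of NA. If zeros should instead be excluded from the median as well, a group with no positive non-promo sales has no defined median, and you would need to decide explicitly what value to use in that case.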

