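r - Replace NA with the median based on multiple conditions
Problem description
This is my first Stack Overflow post. I have searched extensively but did not find a similar post.
I am trying to impute NA values with the median, based on two conditions.
Here is my code: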
#Create sample of original data for reproducibility
Date<-c("2009-05-01","2009-05-02","2009-05-03","2009-06-01","2009-06-02",
"2009-06-03", "2010-05-01","2010-05-02","2010-05-03","2010-06-01",
"2010-06-02","2010-06-03","2011-05-01","2011-05-02","2011-05-03",
"2011-06-01","2011-06-02","2011-06-03")
Month<- c("May","May","May","June","June","June",
"May","May","May","June","June","June",
"May","May","May","June","June","June")
DayType<- c("Monday","Tuesday","Wednesday","Monday","Tuesday","Wednesday",
"Monday","Tuesday","Wednesday","Monday","Tuesday","Wednesday",
"Monday","Tuesday","Wednesday","Monday","Tuesday","Wednesday")
Qty<- c(NA,NA,NA,NA,NA,NA,
1,2,1,10,15,13,
3,2,5,20,14,16)
#Combine into dataframe
Example<-data.frame(Date,Month,DayType,Qty)
#Test output
Example
# Make a separate dataframe holding the median Qty for each DayType/Month combination
library(plyr)  # provides ddply() and .()
test1 <- ddply(Example, .(DayType, Month), summarize, median = median(Qty, na.rm = TRUE))
This works as expected. The test1 output looks like this:
DayType Month Median
Monday June 15.0
Monday May 2.0
Tuesday June 14.5
Tuesday May 2.0
Wednesday June 14.5
Wednesday May 3.0
My second step is to replace the NA values in the original dataset with the medians computed in test1. This is where my problem lies.
Example$Qty[is.na(Example$Qty)] <- test1$median[match(Example$DayType,test1$DayType,Example$Month,test1$Month)][is.na(Example$Qty)]
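Example
match() only matches the median for each day type, rather than the median for each day type within each month. The output for the whole set is the same seven repeated values. I have not figured out how to match on two columns at once.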
Output:
Date DayType Month GSEvtQty
2009-05-01 Monday May 15.0 *should be 2.0, matching to June
2009-05-02 Tuesday May 14.5 *should be 2.0, matching to June
2009-05-03 Wednesday May 14.5 *should be 3.0, matching to June
2009-06-01 Monday June 15.0 *imputes correctly
2009-06-02 Tuesday June 14.5 *imputes correctly
2009-06-03 Wednesday June 14.5 *imputes correctly
2010-05-01 Monday May 1.0
2010-05-02 Tuesday May 2.0
2010-05-03 Wednesday May 1.0
2010-06-01 Monday June 10.0
2010-06-02 Tuesday June 15.0
2010-06-03 Wednesday June 13.0
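The behavior above is consistent with the signature match(x, table, nomatch, incomparables): the third and fourth positional arguments are not extra key columns, so the lookup uses DayType alone and returns the first row of test1 for each day type. One way to match on two columns at once is to paste them into a single composite key. The following is a minimal base R sketch on a small stand-in for the sample data; it builds test1 with aggregate() rather than ddply() so that it runs without plyr, and the names key_data, key_table, idx, and missing are illustrative only.

```r
# Small stand-in for the question's data: two day types, two months
Example <- data.frame(
  Month   = rep(c("May", "May", "June", "June"), 3),
  DayType = rep(c("Monday", "Tuesday"), 6),
  Qty     = c(NA, NA, NA, NA, 1, 2, 10, 15, 3, 2, 20, 14)
)

# Median per DayType/Month (the formula interface drops NA rows by default)
test1 <- aggregate(Qty ~ DayType + Month, data = Example, FUN = median)
names(test1)[3] <- "median"

# Build one composite key per row so match() compares both columns together
key_data  <- paste(Example$DayType, Example$Month)
key_table <- paste(test1$DayType, test1$Month)
idx <- match(key_data, key_table)

# Fill only the missing quantities with the median of their own group
missing <- is.na(Example$Qty)
Example$Qty[missing] <- test1$median[idx[missing]]
Example
```

Pasting with the default space separator is safe here because neither column contains spaces; for arbitrary values, interaction() or a separator that cannot occur in the data would be a more careful choice.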
I also tried using %in%:
Example$Qty[is.na(Example$Qty)] <- test1$median[Example$DayType %in% test1$DayType & Example$Month %in% test1$Month][is.na(Example$Qty)]
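But this does not match correctly, and it outputs only a limited number of values rather than the whole series of NAs.
As @Jaap cleverly suggested, using na.aggregate from the zoo package: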
setDT(Example)[, Value := na.aggregate("Qty", FUN = median), by = c("DayType","Month")]
for some reason does not change the NAs:
Output:
Date Month DayType Qty
2009-05-01 May Monday NA
2009-05-02 May Tuesday NA
2009-05-03 May Wednesday NA
2009-06-01 June Monday NA
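A likely reason the NAs survive: in the call above the column name is quoted, so na.aggregate receives the literal string "Qty" rather than the column, and the result is assigned to a new column Value instead of back to Qty. Independently of zoo and data.table, the same group-wise fill can be sketched with base R's ave(). This is an alternative technique, not the suggested na.aggregate call; the helper fill_median and the reduced data frame are illustrative assumptions.

```r
# Small stand-in for the question's data: two day types, two months
Example <- data.frame(
  Month   = rep(c("May", "May", "June", "June"), 3),
  DayType = rep(c("Monday", "Tuesday"), 6),
  Qty     = c(NA, NA, NA, NA, 1, 2, 10, 15, 3, 2, 20, 14)
)

# Hypothetical helper: replace the NAs in one group with that group's median
fill_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# ave() applies the helper within each DayType/Month group and
# returns the values in the original row order
Example$Qty <- ave(Example$Qty, Example$DayType, Example$Month, FUN = fill_median)
Example
```

A group that contains only NAs stays NA, because its median is itself NA.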
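Any suggestions would be greatly appreciated! Thank you for sticking with this long post; I look forward to paying the help forward in the future.
Solution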
This is what merge was made for.
info <- merge(info, test1, by = c("DayType", "Month"))  # adds each row's group median; rows come back sorted by the keys
info$GSEvtQty[is.na(info$GSEvtQty)] <- info$median[is.na(info$GSEvtQty)]
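Applied to data shaped like the question's sample, the merge approach might look like the sketch below. It assumes the lookup table has a column named median, builds that table with aggregate() so the block runs without plyr, and does the fill inside the merged data frame because merge() does not preserve the original row order. The names filled and missing are illustrative.

```r
# Small stand-in for the question's data: two day types, two months
Example <- data.frame(
  Month   = rep(c("May", "May", "June", "June"), 3),
  DayType = rep(c("Monday", "Tuesday"), 6),
  Qty     = c(NA, NA, NA, NA, 1, 2, 10, 15, 3, 2, 20, 14)
)

# Median per DayType/Month (the formula interface drops NA rows by default)
test1 <- aggregate(Qty ~ DayType + Month, data = Example, FUN = median)
names(test1)[3] <- "median"

# Left join: every row of Example gains the median of its own group
filled <- merge(Example, test1, by = c("DayType", "Month"), all.x = TRUE)

# Fill the missing quantities from the joined column, then drop it
missing <- is.na(filled$Qty)
filled$Qty[missing] <- filled$median[missing]
filled$median <- NULL
filled
```

If the original row order matters, adding a row-number column before the merge and sorting on it afterwards is one way to restore it.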