R error: Coercing 'list' RHS to 'double' to match the type of the target column

Problem description

I have a dataset DT as follows:

    index_date  date_1      date_2      res_1   res_2   taken_date   taken_res
1   2015-08-25  2013-11-13  2015-08-25  1.50    1.5     NA           NA
2   2017-09-11  2016-09-29  2017-05-12  2.70    2.4     NA           NA
3   2015-08-17  2014-08-08  2015-06-08  2.00    2.6     NA           NA
4   2017-05-14  2016-05-31  2016-12-19  1.30    1.2     NA           NA
5   2015-11-14  2014-11-11  2015-08-10  1.60    2.8     NA           NA
6   2016-08-08  NA          2016-08-08  NA      1.4     NA           NA
7   2018-12-01  2014-05-30  2017-07-24  1.70    1.8     NA           NA
8   2013-01-11  NA          2012-10-23  NA      3.7     NA           NA
9   2015-06-06  NA          2015-02-07  NA      1.3     NA           NA
10  2015-05-19  NA          2015-05-19  NA      1.4     NA           NA

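What I want:

I have a working function that, when both dates and results are present, finds the closest date and fills in the values accordingly.

However, my problem arises when there is only one date and result, for example in rows 6, 8, 9 and 10.

The code is: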

date.vars <- c("date_1", "date_2")
res.vars <- c("res_1", "res_2")
taken.vars <- c("taken_date", "taken_res")

# some more lines here to prepare DT
...

# only one date and result
DT[apply(DT[, date.vars, with=F], 1, function(x)
  sum(is.na(x))==1), 

  (taken.vars) := list(
    apply(.SD, 1, function(x)
      as.numeric(na.omit(x[res.vars]))),

    apply(.SD, 1, function(x)
      as.Date(na.omit(x[date.vars])))
  )]

R returns the following error and warning:

Error in `[.data.table`(DT, apply(DT[, date.vars, with = F], 1, function(x) sum(is.na(x)) ==  : 
  (list) object cannot be coerced to type 'double'
In addition: Warning message:
In `[.data.table`(DT, apply(DT[, date.vars, with = F], 1, function(x) sum(is.na(x)) ==  :
  Coercing 'list' RHS to 'double' to match the type of the target column (column 7 named 'taken_res').

Can you help me correct my code?

Tags: r, data.table

Solution


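Here is my attempt. In this first version I did not handle the cases where a date is NA. What you can do is compute the gap between 1) the index date and date 1 and 2) the index date and date 2. With those gaps you can run a logical check and, based on it, assign the target date and value.

I am not the right person to explain the error message well. However, I think you are facing a coercion issue. See page 16 of the data.table manual on CRAN (version 1.12.8), where you can find information about := (assignment by reference). If someone can offer a technical explanation, please do.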
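On the technical side, one plausible source of the list (an assumption, since the full data and the preparation steps are not shown) is that apply() only simplifies its result to a vector when every row returns a value of the same length. Otherwise it returns a list, which := then cannot store in a double column. apply() also converts a table with mixed Date and numeric columns to a character matrix first. A minimal base R sketch with made-up values:

```r
# Toy values, not the asker's data: row 2 has no result at all.
res <- matrix(c(1.5,  NA,
                 NA,  NA,
                 NA, 1.4),
              ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("res_1", "res_2")))

# Hypothetical helper mirroring the function used in the question.
keep_non_na <- function(x) as.numeric(na.omit(x))

# Rows 1 and 3 each return one value, so apply() simplifies to a vector.
same_len <- apply(res[c(1, 3), ], 1, keep_non_na)
is.list(same_len)   # FALSE

# Row 2 returns numeric(0); the lengths now differ, so apply() returns a list.
mixed_len <- apply(res, 1, keep_non_na)
is.list(mixed_len)  # TRUE

# Mixed Date and numeric columns become a character matrix inside apply().
df <- data.frame(d = as.Date("2016-08-08"), r = 1.4)
typeof(as.matrix(df))  # "character"
```

This is why the vectorised approach below avoids apply() altogether: column-wise operations keep each column's type.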

setDT(mydt)[, `:=` (taken_date = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
                                         yes = date_1,
                                         no = date_2),
                    taken_res = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
                                        yes = res_1,
                                        no = res_2))][]

#    index_date     date_1     date_2 res_1 res_2 taken_date taken_res
# 1: 2015-08-25 2013-11-13 2015-08-25   1.5   1.5 2015-08-25       1.5
# 2: 2017-09-11 2016-09-29 2017-05-12   2.7   2.4 2017-05-12       2.4
# 3: 2015-08-17 2014-08-08 2015-06-08   2.0   2.6 2015-06-08       2.6
# 4: 2017-05-14 2016-05-31 2016-12-19   1.3   1.2 2016-12-19       1.2
# 5: 2015-11-14 2014-11-11 2015-08-10   1.6   2.8 2015-08-10       2.8
# 6: 2016-08-08       <NA> 2016-08-08    NA   1.4       <NA>        NA
# 7: 2018-12-01 2014-05-30 2017-07-24   1.7   1.8 2017-07-24       1.8
# 8: 2013-01-11       <NA> 2012-10-23    NA   3.7       <NA>        NA
# 9: 2015-06-06       <NA> 2015-02-07    NA   1.3       <NA>        NA
#10: 2015-05-19       <NA> 2015-05-19    NA   1.4       <NA>        NA
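
Rows 6, 8, 9 and 10 come out as NA above because a comparison involving NA is itself NA, and fifelse() returns NA wherever test is NA. A small sketch using rows 2 and 6 of the data (the vector names gap_1, gap_2 and closer_1 are only for illustration):

```r
library(data.table)

# Rows 2 and 6 from the data: row 6 has no date_1.
index_date <- as.Date(c("2017-09-11", "2016-08-08"))
date_1     <- as.Date(c("2016-09-29", NA))
date_2     <- as.Date(c("2017-05-12", "2016-08-08"))

# Gaps in days between the index date and each candidate date.
gap_1 <- abs(as.numeric(index_date - date_1))  # 347, NA
gap_2 <- abs(as.numeric(index_date - date_2))  # 122, 0

# The comparison is NA as soon as one side is NA.
closer_1 <- gap_1 < gap_2                      # FALSE, NA

# fifelse() propagates that NA instead of falling back to date_2.
picked <- fifelse(closer_1, date_1, date_2)    # "2017-05-12", NA
picked
```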

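Although you did not explicitly say what you want to do with the rows that contain NA, it seems to me that you are trying to do something like this.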

setDT(mydt)[, `:=` (taken_date = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
                                         yes = date_1,
                                         no = date_2),
                    taken_res = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
                                        yes = res_1,
                                        no = res_2))][is.na(date_1),
            `:=` (taken_date = date_2, taken_res = res_2)][is.na(date_2),
            `:=` (taken_date = date_1, taken_res = res_1)][]

#    index_date     date_1     date_2 res_1 res_2 taken_date taken_res
# 1: 2015-08-25 2013-11-13 2015-08-25   1.5   1.5 2015-08-25       1.5
# 2: 2017-09-11 2016-09-29 2017-05-12   2.7   2.4 2017-05-12       2.4
# 3: 2015-08-17 2014-08-08 2015-06-08   2.0   2.6 2015-06-08       2.6
# 4: 2017-05-14 2016-05-31 2016-12-19   1.3   1.2 2016-12-19       1.2
# 5: 2015-11-14 2014-11-11 2015-08-10   1.6   2.8 2015-08-10       2.8
# 6: 2016-08-08       <NA> 2016-08-08    NA   1.4 2016-08-08       1.4
# 7: 2018-12-01 2014-05-30 2017-07-24   1.7   1.8 2017-07-24       1.8
# 8: 2013-01-11       <NA> 2012-10-23    NA   3.7 2012-10-23       3.7
# 9: 2015-06-06       <NA> 2015-02-07    NA   1.3 2015-02-07       1.3
#10: 2015-05-19       <NA> 2015-05-19    NA   1.4 2015-05-19       1.4
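
The same rule can also be written with a single test that covers the NA cases up front. This is only a sketch, not the answerer's code: the helper column use_1 and the fourth row (with date_2 missing) are made up for illustration, and ties still go to date_2 as in the code above.

```r
library(data.table)

# Rows 2, 6 and 8 from the data, plus one made-up row where date_2 is missing.
dt <- data.table(
  index_date = as.Date(c("2017-09-11", "2016-08-08", "2013-01-11", "2015-06-06")),
  date_1     = as.Date(c("2016-09-29", NA, NA, "2015-02-07")),
  date_2     = as.Date(c("2017-05-12", "2016-08-08", "2012-10-23", NA)),
  res_1      = c(2.7, NA, NA, 1.3),
  res_2      = c(2.4, 1.4, 3.7, NA)
)

# use_1 is TRUE when date_1 should be taken: date_2 is missing, or both
# dates exist and date_1 is strictly closer to index_date.
dt[, use_1 := is.na(date_2) |
     (!is.na(date_1) & abs(index_date - date_1) < abs(index_date - date_2))]

# Pick date and result with the same logical vector, then drop the helper.
dt[, `:=`(taken_date = fifelse(use_1, date_1, date_2),
          taken_res  = fifelse(use_1, res_1, res_2))]
dt[, use_1 := NULL][]
```

Because use_1 never evaluates to NA here, no follow-up subsetting on is.na(date_1) or is.na(date_2) is needed.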

Data

mydt <- structure(list(index_date = structure(c(16672, 17420, 16664, 
17300, 16753, 17021, 17866, 15716, 16592, 16574), class = "Date"), 
date_1 = structure(c(16022, 17073, 16290, 16952, 16385, NA, 
16220, NA, NA, NA), class = "Date"), date_2 = structure(c(16672, 
17298, 16594, 17154, 16657, 17021, 17371, 15636, 16473, 16574
), class = "Date"), res_1 = c(1.5, 2.7, 2, 1.3, 1.6, NA, 
1.7, NA, NA, NA), res_2 = c(1.5, 2.4, 2.6, 1.2, 2.8, 1.4, 
1.8, 3.7, 1.3, 1.4)), row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10"), class = "data.frame")
