New column in R if a value is in a vector

Problem description

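My R is rusty, and I'm struggling to find the answer to this fairly simple problem. I want to create a new column based on whether the entry in the `date` column is present in another vector.

To illustrate the problem, I count matching rows like this (this approach works):

sum(as.numeric(block$date == "2019-10-11 06:30:00"))

It correctly gives me 1.

But when I do this instead:

sum(as.numeric(block$date %in% c("2019-10-11 06:30:00")))

I get 0, which is a problem because I need to check several date-time values.

A sample of the data frame: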

                  date Efficiency    PowAC    PowDC  TempCPU TempIGBT failures
1: 2019-10-11 06:30:00   97.77433 488.0686 593.1467 32.04367 49.16300        0
2: 2019-03-18 15:25:00   97.79300 485.2857 590.2600 32.29633 50.02533        0
3: 2019-03-18 15:30:00   97.78000 484.7714 589.6767 32.02700 49.22233        0
4: 2019-03-18 15:35:00   97.78233 482.2714 586.6633 32.26733 49.56700        0
5: 2019-03-18 15:40:00   97.75700 480.3343 585.2167 32.02000 49.18667        0
6: 2019-03-18 15:45:00   97.80400 477.5114 580.5467 32.21833 49.30067        0
7: 2019-03-18 15:50:00   97.79633 474.8886 578.0433 32.02833 48.86067        0
8: 2019-03-18 15:55:00   97.79400 477.2629 581.0667 32.29933 49.45333        0

Here is `dput(head(block, 10))`:

library(data.table)
block <- setDT(structure(list(date = structure(c(1546300800, 1546301100, 1546301400, 
1546301700, 1546302000, 1546302300, 1546302600, 1546302900, 1546303200, 
1546303500), class = c("POSIXct", "POSIXt")), Efficiency = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), PowAC = c(NaN, NaN, NaN, NaN, NaN, 
NaN, NaN, NaN, NaN, NaN), PowDC = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0), TempCPU = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN), TempIGBT = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 
NaN), failures = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-10L), class = c("data.table", "data.frame"), sorted = "date"))

The vector I'm testing against:

dput(failures)
c("2019-10-11 06:30:00", "2019-10-12 06:30:00", "2019-10-12 17:45:00", 
"2019-10-13 06:30:00")

Tags: r, dataframe

Solution


Your classes must match.
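A minimal sketch of why the two calls in the question behave differently. It assumes a UTC session so the result does not depend on the local time zone, and the names `x` and `wanted` are made up for illustration. `==` between a `POSIXct` and a character string dispatches to `Ops.POSIXt`, which parses the string with `as.POSIXct()`; `%in%` is built on `match()`, which does not parse the strings as date-times, so the lookup vector has to be converted to the same class first.

```r
# Assumption: run in UTC so the example does not depend on the local time zone
Sys.setenv(TZ = "UTC")

x <- as.POSIXct(c("2019-10-11 06:30:00", "2019-10-12 07:00:00"), tz = "UTC")
wanted <- c("2019-10-11 06:30:00", "2019-10-13 06:30:00")  # character, like `failures`

class(x)       # "POSIXct" "POSIXt"
class(wanted)  # "character"

# `==` dispatches to Ops.POSIXt, which converts the string with as.POSIXct()
x == "2019-10-11 06:30:00"              # TRUE FALSE

# `%in%` calls match(), which does not parse the strings as date-times
# (the same kind of call that gave 0 in the question)
x %in% wanted

# Give the lookup vector the same class (and time zone) first
x %in% as.POSIXct(wanted, tz = "UTC")   # TRUE FALSE
```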

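I'll first assign a slightly modified `failures` so that it contains a date that actually occurs in your sample

failures <- c("2018-12-31 19:30:00", "2019-10-12 06:30:00", "2019-10-12 17:45:00", "2019-10-13 06:30:00")

(it is still `character`), and use your `block` from the `structure(.)` output.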

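# failures is character, block$date is POSIXct: nothing matches
block$date %in% failures
#  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# convert failures to POSIXct first; as.POSIXct() uses the session's local time
# zone (UTC-5 here, e.g. US Eastern), so 19:30 local = 00:30 UTC = row 7
block$date %in% as.POSIXct(failures)
#  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE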
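Since the question's actual goal is a new column, here is a minimal base-R sketch of that step, assuming the dates are in UTC. `block` is a hypothetical cut-down copy built from the ten `dput()` timestamps, and `failure_times` and `in_failures` are made-up names. The strings are parsed with the same `tz` as the date column before `%in%` is applied, so the comparison is between instants rather than between text.

```r
# Hypothetical stand-in for `block`: the ten timestamps from the dput() output,
# read as UTC (assumption: the real data may use another time zone)
block <- data.frame(
  date     = .POSIXct(1546300800 + 300 * (0:9), tz = "UTC"),
  failures = 0
)

# Lookup values as character strings, like `failures` in the question
failure_times <- c("2019-01-01 00:30:00", "2019-10-12 06:30:00")

# Parse the strings in the same time zone as block$date, then test membership
block$in_failures <- block$date %in% as.POSIXct(failure_times, tz = "UTC")

# If block is a data.table, the column can instead be added by reference:
# block[, in_failures := date %in% as.POSIXct(failure_times, tz = "UTC")]

which(block$in_failures)  # 7
sum(block$in_failures)    # 1, the same count the question computes with sum()
```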

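There may be other issues with strict equality and set membership:

  • Time zones: even if the displayed times are the same, they are different times if the time zones differ. Conversely, two times in different zones can be equal (even though they print differently on the console), but I don't think that's what you have here.

  • `POSIXct` and `Date` are actually `numeric` underneath, which means they are floating-point numbers. When times and/or dates are whole (or nearly whole) numbers, R tends to be "good enough" at determining equality, but floating-point equality can still be a problem, and it is hard to track down because it sometimes works and sometimes doesn't. A common comment I add to answers when I see this is the culprit: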

    Computers have limitations when it comes to floating-point numbers (aka double, numeric, float). This is a fundamental limitation of computers in general, in how they deal with non-integer numbers. This is not specific to any one programming language. There are some add-on libraries or packages that are much better at arbitrary-precision math, but I believe most main-stream languages (this is relative/subjective, I admit) do not use these by default. Refs: Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754

    While it does not appear to be the case here, it could be. Keep it in mind :-)
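Both caveats can be seen in a few lines. This sketch assumes the system time zone database includes `America/New_York`; the `1e-6` offset is a hypothetical stand-in for a floating-point error, and rounding to whole seconds is just one possible workaround, not the only one. The time-zone comparisons use `as.numeric()` to sidestep R's warning about mismatched `tzone` attributes.

```r
# Time zones: the same instant can be written with different clock times
utc             <- as.POSIXct("2019-10-11 06:30:00", tz = "UTC")
ny_same_instant <- as.POSIXct("2019-10-11 02:30:00", tz = "America/New_York")  # EDT, UTC-4
ny_same_clock   <- as.POSIXct("2019-10-11 06:30:00", tz = "America/New_York")

as.numeric(utc) == as.numeric(ny_same_instant)  # TRUE: equal, though printed differently
as.numeric(ny_same_clock) - as.numeric(utc)     # 14400 seconds (4 hours) apart

# Floating point: a hypothetical sub-second error breaks exact matching
t1 <- utc
t2 <- utc + 1e-6
t1 == t2                                         # FALSE
format(t2)  # by default printed without fractional seconds, so it looks like t1

# One workaround: compare whole seconds instead of the raw doubles
round(as.numeric(t2)) %in% round(as.numeric(t1)) # TRUE
```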

