Can `data.table::set()` be used with keyed data.tables?

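Question

I have an unusual use case where I frequently need to set the value of a single row in a keyed data.table object in R. I currently use the := operator, but the help page says that set() can be even faster in some cases.

Is that true for keyed data.tables? And is there a way to use set() with a keyed data.table? I am not sure what happens under the hood.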

library(data.table)
#> Warning: package 'data.table' was built under R version 4.0.2
mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)
head(mt)
#>                    rn  mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1:        AMC Javelin 15.2   8  304 150 3.15 3.435 17.30  0  0    3    2
#> 2: Cadillac Fleetwood 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> 3:         Camaro Z28 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
#> 4:  Chrysler Imperial 14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#> 5:         Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 6:   Dodge Challenger 15.5   8  318 150 2.76 3.520 16.87  0  0    3    2

mt["AMC Javelin", mpg := -10] # want to do this, but faster?
head(mt)
#>                    rn   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1:        AMC Javelin -10.0   8  304 150 3.15 3.435 17.30  0  0    3    2
#> 2: Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> 3:         Camaro Z28  13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
#> 4:  Chrysler Imperial  14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#> 5:         Datsun 710  22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 6:   Dodge Challenger  15.5   8  318 150 2.76 3.520 16.87  0  0    3    2
set(mt, "AMC Javelin", 2L, -10) # this doesn't work
#> Error in set(mt, "AMC Javelin", 2L, -10): i is type 'character'. Must be integer, or numeric is coerced with warning. If i is a logical subset, simply wrap with which(), and take the which() outside the loop if possible for efficiency.
set(mt, 1L, 2L, -10) # this would work if I could get the row number of a given key...

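Created on 2021-08-06 by the reprex package (v0.3.0)


Update: The answers and comments from Ronak Shah and sindri_baldur fit the question as I asked it (see the benchmarks below). Unfortunately, my simple example does not match my actual use case. In my case there are multiple key columns, so neither match nor chmatch works. Is there a solution that works for data.tables with multiple key columns?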

library(data.table)
#> Warning: package 'data.table' was built under R version 4.0.2
library(microbenchmark)

# Original question
mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)
key <- "AMC Javelin"

microbenchmark(
  mt[key, mpg := -10],
  set(mt, 1L, 2L, -10),
  set(mt, match(key, mt$rn), 2L, -10),
  set(mt, chmatch(key, mt$rn), 2L, -10)
)
#> Unit: microseconds
#>                                   expr     min       lq      mean   median
#>                mt[key, `:=`(mpg, -10)] 490.129 568.7480 746.67525 619.0085
#>                   set(mt, 1L, 2L, -10)   1.597   1.8980   4.17609   2.8475
#>    set(mt, match(key, mt$rn), 2L, -10)   3.104   3.7130   6.60660   4.9275
#>  set(mt, chmatch(key, mt$rn), 2L, -10)   2.740   3.3025   5.27118   4.3200
#>       uq      max neval cld
#>  701.094 8996.071   100   b
#>    4.298   87.451   100  a 
#>    7.726   45.807   100  a 
#>    7.002   11.811   100  a

My situation is closer to this one, with multiple key columns:

dt <- CJ(a = 1:10, b = 1:10, c = 1:60)
setkey(dt)
dt[, d := NA_real_] # add a numeric column by reference; a bare NA would create a logical column
key <- list(a = 2, b = 7, c = 35)


microbenchmark(
  { dt[key, d := 1] },
  { set(dt, 1L, 4L, 1)}
)
#> Unit: microseconds
#>                         expr     min       lq      mean   median       uq
#>  {     dt[key, `:=`(d, 1)] } 634.125 666.5825 768.59937 756.9030 819.7585
#>   {     set(dt, 1L, 4L, 1) }   2.019   2.5355   3.95986   3.9325   4.6590
#>       max neval cld
#>  1171.794   100   b
#>    22.945   100  a

match(key, dt[, .(a, b, c)]) # doesn't work
#> [1] NA NA NA
chmatch(key, dt[, .(a, b, c)]) # doesn't work
#> Error in chmatch(key, dt[, .(a, b, c)]): table is type 'list' (must be 'character' or NULL)

Created on 2021-08-06 by the reprex package (v0.3.0)

Tags: r, data.table

Answer


You can use match to get the row number of the key.

library(data.table)

set(mt, match("AMC Javelin", mt$rn), 2L, -10)

head(mt)
#                   rn   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#1:        AMC Javelin -10.0   8  304 150 3.15 3.435 17.30  0  0    3    2
#2: Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#3:         Camaro Z28  13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
#4:  Chrysler Imperial  14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#5:         Datsun 710  22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#6:   Dodge Challenger  15.5   8  318 150 2.76 3.520 16.87  0  0    3    2
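
If the key might be missing from the table, the row lookup returns NA, so it may be worth checking the result before calling set(). The sketch below wraps the lookup in a small helper. The name set_by_name is hypothetical (it is not part of data.table), and chmatch is used on the assumption that the key column rn is character.

```r
library(data.table)

mt <- as.data.table(mtcars, keep.rownames = TRUE)
setkey(mt, rn)

# Hypothetical helper, not a data.table function: look up the row number
# of a single character key in the rn column, then update by reference.
set_by_name <- function(dt, name, col, value) {
  row <- chmatch(name, dt$rn)  # integer row number, or NA if not found
  if (is.na(row)) stop("no row with rn == ", name)
  set(dt, i = row, j = col, value = value)
  invisible(dt)
}

set_by_name(mt, "AMC Javelin", "mpg", -10)
```

When the same row is updated many times, doing the lookup once outside the loop and reusing the row number avoids repeating the search.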

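For the case with several key columns, one possible approach is to let a keyed join return the row number through the which = TRUE argument of `[.data.table`, and then pass that number to set(). This is only a sketch: the name lookup is illustrative, and because the row lookup still goes through `[.data.table`, any speedup over := for a single update is likely to be limited. It probably pays off most when the row numbers can be found once and reused.

```r
library(data.table)

dt <- CJ(a = 1:10, b = 1:10, c = 1:60)  # CJ() returns a table keyed on a, b, c
dt[, d := NA_real_]                      # numeric column added by reference

# Illustrative lookup table; integer values match the types of the key columns.
lookup <- data.table(a = 2L, b = 7L, c = 35L)

# Join on the key columns and return matching row numbers instead of rows.
row <- dt[lookup, on = c("a", "b", "c"), which = TRUE]

# Update by reference using the row number.
set(dt, i = row, j = "d", value = 1)
```

A lookup table with several rows returns one row number per row, so many keys can be resolved in a single join before any calls to set().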