r - data.table 连接中的列名标签
问题描述
我正在尝试使用非 equi 连接将 data.table x 连接到 z。表 x 包含两列 X1 和 X2,它们用作与 z 中的列 Z1 连接的范围。当前代码成功地进行了非 equi 连接,但是某些列被删除或重命名。我想返回提供的“理想”data.table,而不是我目前拥有的那个,我必须重命名列或进一步连接数据以获得提供的“理想”数据。
> library(data.table)
>
> x <- data.table(Id = c("A", "B", "C", "C"),
+ X1 = c(1L, 3L, 5L, 7L),
+ X2 = c(8L,12L,9L,18L),
+ XY = c("x2", "x4", "x6", "x8"))
>
> z <- data.table(ID = "C", Z1 = 5:9, Z2 = paste0("z", 5:9))
>
> x
Id X1 X2 XY
1: A 1 8 x2
2: B 3 12 x4
3: C 5 9 x6
4: C 7 18 x8
> z
ID Z1 Z2
1: C 5 z5
2: C 6 z6
3: C 7 z7
4: C 8 z8
5: C 9 z9
>
> # suboptimal data return data format
> x[z, on = .(Id == ID, X1 <= Z1, X2 >= Z1)]
Id X1 X2 XY Z2
1: C 5 5 x6 z5
2: C 6 6 x6 z6
3: C 7 7 x6 z7
4: C 7 7 x8 z7
5: C 8 8 x6 z8
6: C 8 8 x8 z8
7: C 9 9 x6 z9
8: C 9 9 x8 z9
>
> # column names are Id, X1 and X2 from x which replaces ID and Z1. The contents of X1 and X2 are also changed to the original values of Z1.
> # XY and Z2 remain unchanged.
>
> # I want to create the following table where the original column names and values are retained, while still joining the table in a non-equi way.
>
> ideal <- data.table(ID = c("C", "C", "C", "C", "C", "C", "C", "C"),
+ Z1 = c(5, 6, 7, 7, 8, 8, 9, 9),
+ Z2 = c("Z5", "z6", "z7", "z7", "z8", "z8", "z9", "z9"),
+ X1 = c(5, 5, 5, 7, 5, 7, 5, 7),
+ X2 = c(9, 9, 9, 18, 9, 18, 9, 18),
+ XY = c("x6", "x6", "x6", "x8", "x6", "x8", "x6", "x8"))
>
> print(ideal)
ID Z1 Z2 X1 X2 XY
1: C 5 Z5 5 9 x6
2: C 6 z6 5 9 x6
3: C 7 z7 5 9 x6
4: C 7 z7 7 18 x8
5: C 8 z8 5 9 x6
6: C 8 z8 7 18 x8
7: C 9 z9 5 9 x6
8: C 9 z9 7 18 x8
解决方案
正如@Humpelstielzchen 所评论的,可以通过手动选择所需的列来完成。但是必须使用前缀x.
(x
inx.
指的是参数x
而[.data.table
不是 data.table 的名称)来恢复原始 data.table 中的列x
。否则会产生不正确的输出。
# desired
x[z, .(ID, Z1, Z2, X1 = x.X1, X2 = x.X2, XY), on = .(Id == ID, X1 <= Z1, X2 >= Z1)]
# ID Z1 Z2 X1 X2 XY
# 1: C 5 z5 5 9 x6
# 2: C 6 z6 5 9 x6
# 3: C 7 z7 5 9 x6
# 4: C 7 z7 7 18 x8
# 5: C 8 z8 5 9 x6
# 6: C 8 z8 7 18 x8
# 7: C 9 z9 5 9 x6
# 8: C 9 z9 7 18 x8
# undesired
x[z, on = .(Id == ID, X1 <= Z1, X2 >= Z1), .(ID, Z1, Z2, X1, X2, XY)]
# ID Z1 Z2 X1 X2 XY
# 1: C 5 z5 5 5 x6
# 2: C 6 z6 6 6 x6
# 3: C 7 z7 7 7 x6
# 4: C 7 z7 7 7 x8
# 5: C 8 z8 8 8 x6
# 6: C 8 z8 8 8 x8
# 7: C 9 z9 9 9 x6
# 8: C 9 z9 9 9 x8
packageVersion('data.table')
# '1.13.2'
如果有许多其他列,这使得手动选择不可行,这里有一个解决方法,方法是在加入过程中x
重新放置位置:z
DT <- z[x, on = .(ID=Id, Z1 >= X1, Z1 <= X2), nomatch = NULL]
#' since for non-equi conditions, the values are from RHS while
#' the column names were from LHS, we known that `Z1` and `Z1.1`
#' correspond to `X1` and `X2`.
setnames(DT, c('Z1', 'Z1.1'), c('X1', 'X2'))
DT[z, Z1 := i.Z1, on = .(ID, Z2)]
# > DT
# ID X1 Z2 X2 XY Z1
# 1: C 5 z5 9 x6 5
# 2: C 5 z6 9 x6 6
# 3: C 5 z7 9 x6 7
# 4: C 5 z8 9 x6 8
# 5: C 5 z9 9 x6 9
# 6: C 7 z7 18 x8 7
# 7: C 7 z8 18 x8 8
# 8: C 7 z9 18 x8 9
推荐阅读
- sql - 每月需要将单行转换为多行
- symfony - 交响乐。如何构建表格?
- laravel - 如何从 Laravel 中的 Storage/App 目录访问文件
- swift - 如何从谷歌自动完成中获取所选地点的正确纬度
- python - 用于从 systemd 日志返回 systemd 单元日志消息列表的 Python 脚本
- c++ - 如何在 Windows Cmake 中将 Dlib 设置为静态安装的共享库
- docker - Vagrant 和 Docker 之间的 Redis 连接被拒绝
- javascript - Chrome 扩展程序“接收端不存在。” 错误
- python - 在python中为可滚动图像添加复选框
- ruby - 用 Ruby 打印矩形