New variable in an R data.table based on multiple criteria

Problem description

I have a dataset of all the teams in the football World Cups from 1930 to 2006, with several variables, as shown below:

    Year       Team ISO GF GA Penalties Matches SOG SW FK Offside Corners Won Drawn Lost
1: 1930  Argentina  AR 18  9         0       5  18  0  0       0       0   4     0    1
2: 1930    Uruguay  UY 15  3         0       4  15  0  0       0       0   4     0    0
3: 1930        USA  US  7  6         0       3   7  0  0       0       0   2     0    1
4: 1930 Yugoslavia  YU  7  7         0       3   7  0  0       0       0   2     0    1
5: 1930     Brazil  BR  5  2         0       2   5  0  0       0       0   1     0    1
6: 1930      Chile  CL  5  3         0       3   5  0  0       0       0   2     0    1

I want to flag which team won in a given year, i.e. create a new variable that is 1 for the winner and 0 for every other team, like this:

    Year       Team ISO GF GA Penalties Matches SOG SW FK Offside Corners Won Drawn Lost Winner
1: 1930  Argentina  AR 18  9         0       5  18  0  0       0       0   4     0    1      0
2: 1930    Uruguay  UY 15  3         0       4  15  0  0       0       0   4     0    0      1
3: 1930        USA  US  7  6         0       3   7  0  0       0       0   2     0    1      0
4: 1930 Yugoslavia  YU  7  7         0       3   7  0  0       0       0   2     0    1      0
5: 1930     Brazil  BR  5  2         0       2   5  0  0       0       0   1     0    1      0
6: 1930      Chile  CL  5  3         0       3   5  0  0       0       0   2     0    1      0

I have a vector of the winners in chronological order, and I tried this:

data[, Winner := ifelse(Team == c("Uruguay", "Italy", "Italy", "Uruguay", "Germany FR", "Brazil", "Brazil", "England", "Brazil", "Germany FR", "Argentina", "Italy", "Argentina", "Germany FR", "Brazil", "France", "Brazil", "Italy"), 1, 0), by = Year]

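The idea is that in 1930 Uruguay gets 1 and every other team gets 0, in 1934 Italy gets 1 and every other team gets 0, and so on. But of course it doesn't work. Can you help me achieve this?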

Tags: r, if-statement, data.table

Solution


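The problems here seem to be:

  1. We are using == against another vector of a different length
  2. Based on the description, we probably need to match the winners vector to the corresponding 'Year' values.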
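
To illustrate the first point, here is a minimal sketch with made-up toy vectors (`teams` and `winners` are illustrative names, not from the question). It shows that == compares position by position and recycles the shorter vector, rather than picking the one winner that belongs to the year:

```r
# Toy vectors (hypothetical, for illustration only)
teams   <- c("Argentina", "Uruguay", "USA", "Yugoslavia", "Brazil", "Chile")
winners <- c("Uruguay", "Italy", "Italy")

# winners is recycled to: Uruguay, Italy, Italy, Uruguay, Italy, Italy,
# so Uruguay (position 2) is compared with "Italy" and is missed
teams == winners
# [1] FALSE FALSE FALSE FALSE FALSE FALSE

# The comparison that was intended for 1930 is against a single value
teams == winners[1]
# [1] FALSE  TRUE FALSE FALSE FALSE FALSE
```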

If that is the case, we can use rep() to repeat each winner name in the vector ('team_vec') once per row of its 'Year', and then do the == element-wise.

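library(data.table)

# Winners in tournament order (1930, 1934, ..., 2006)
team_vec <- c("Uruguay", "Italy", "Italy", "Uruguay",
              "Germany FR", "Brazil", "Brazil", "England", "Brazil",
              "Germany FR", "Argentina", "Italy", "Argentina", "Germany FR",
              "Brazil", "France", "Brazil", "Italy")

# Number of rows per Year, in order of first appearance
# (assumes the rows of each Year are contiguous and the years are in chronological order)
n1 <- data[, .(n = .N), Year]$n

# Repeat each winner once per row of its year, then compare element-wise with ==
# (%in% would flag every team that ever won, in every year); unary + turns TRUE/FALSE into 1/0
data[, Winner := +(Team == rep(team_vec, n1))]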
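
As a small worked example of the rep() idea, here is a sketch on toy data. The names `toy`, `toy_winners` and `n_per_year` are made up for illustration, and the table is assumed to be ordered by Year with one winner per year in the same order:

```r
library(data.table)

# Hypothetical toy data: two tournaments, rows grouped by Year in chronological order
toy <- data.table(
  Year = c(1930, 1930, 1930, 1934, 1934),
  Team = c("Argentina", "Uruguay", "USA", "Italy", "Brazil")
)
toy_winners <- c("Uruguay", "Italy")  # one winner per Year, same order as the data

# Rows per Year: 3 for 1930, 2 for 1934
n_per_year <- toy[, .N, by = Year]$N

# One winner per row: Uruguay three times, then Italy twice
rep(toy_winners, n_per_year)

# Element-wise comparison; unary + converts the logical result to integer
toy[, Winner := +(Team == rep(toy_winners, n_per_year))]
toy$Winner
# [1] 0 1 0 1 0
```

If the rows are not sorted by Year, the repeated vector no longer lines up with the table, so this approach depends on the row order.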

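Another option is a join that uses 'Year' as the key. For this, we need a key/value dataset that pairs each 'Year' with its winner. Make sure its 'Year' values match the tournament years in the data.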

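# One row per tournament: Year paired with its winner.
# World Cups are not held every year (none in 1942 or 1946), so the years are taken
# from the data; this assumes one entry in team_vec per distinct Year
keydat <- data.table(Year = sort(unique(data$Year)), Team = team_vec)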

Then join:

 data[, Winner := 0]
 data[keydat, Winner := 1, on = .(Team, Year)]
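
The join can be tried on toy data as well. This is only a sketch under stated assumptions: `toy` and `toy_key` are made-up names, and the rows are deliberately unsorted to show that the update join does not depend on row order. Because it matches on both Team and Year, a team that won one tournament is not flagged in another:

```r
library(data.table)

# Hypothetical toy data, deliberately not sorted by Year
toy <- data.table(
  Year = c(1950, 1938, 1950, 1938, 1950),
  Team = c("Italy", "Italy", "Uruguay", "Brazil", "Brazil")
)

# Lookup table: one row per tournament with its winner
toy_key <- data.table(Year = c(1938, 1950), Team = c("Italy", "Uruguay"))

toy[, Winner := 0]                             # default: not the winner
toy[toy_key, Winner := 1, on = .(Team, Year)]  # update only the matching rows
toy$Winner
# [1] 0 1 1 0 0
```

Italy is flagged for 1938 but not for 1950, which is the behaviour the == and rep() version only gives when the rows are in the expected order.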
      
