Creating an average variable from other rows in R

Problem description

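I have a dataset in which each row represents a country in a particular year. Each row also has two variables (Neighbor1 & Neighbor2) that list the country codes of that country's geographic neighbors.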

Example dataset code:

A=c("US", "Cuba", "France", "Germany", "Belgium", "US", "Cuba", "France", "Germany", "Belgium")  
B=c(2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001)
C=c(2, 40, 220, 255, 211, 2, 40, 220, 255, 211)
D=c(5, 10, 11, 3, 8, 1, 7, 15, 6, 9)
E=c(40, 2, 211, 211, 220, 40, 2, 211, 211, 220)
G=c(NA, NA, 255, 220, 255, NA, NA, 255, 220, 255)
Example <- data.frame(A, B, C, D, E, G)
colnames(Example) <- c("Country", "Year", "CountryCode", "TerrorismDeaths", "Neighbor1", "Neighbor2")

Dataset:

   Country Year CountryCode TerrorismDeaths Neighbor1 Neighbor2
1       US 2000           2               5        40        NA
2     Cuba 2000          40              10         2        NA
3   France 2000         220              11       211       255
4  Germany 2000         255               3       211       220
5  Belgium 2000         211               8       220       255
6       US 2001           2               1        40        NA
7     Cuba 2001          40               7         2        NA
8   France 2001         220              15       211       255
9  Germany 2001         255               6       211       220
10 Belgium 2001         211               9       220       255 

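What I want to do is create a variable that measures the average number of terrorism deaths among each country's neighbors in that particular year. So I'd like to add a column like the one below: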

Desired output:

   Country Year CountryCode TerrorismDeaths Neighbor1 Neighbor2 NeighborAvgTerror
1       US 2000           2               5        40        NA              10.0
2     Cuba 2000          40              10         2        NA               5.0
3   France 2000         220              11       211       255               5.5
4  Germany 2000         255               3       211       220               9.5
5  Belgium 2000         211               8       220       255               7.0
6       US 2001           2               1        40        NA               7.0
7     Cuba 2001          40               7         2        NA               1.0
8   France 2001         220              15       211       255               7.5
9  Germany 2001         255               6       211       220              12.0
10 Belgium 2001         211               9       220       255              10.5 
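To make the definition of the target column concrete, here is a minimal base-R sketch (not from the original question) that looks up each listed neighbor's deaths in the same year and averages whatever is present. The helper `deaths_of` is a hypothetical name introduced only for this illustration, and it assumes every non-missing neighbor code also appears as a `CountryCode` in that year.

```r
# Example data from the question (same values, written more compactly)
Example <- data.frame(
  Country         = rep(c("US", "Cuba", "France", "Germany", "Belgium"), 2),
  Year            = rep(c(2000, 2001), each = 5),
  CountryCode     = rep(c(2, 40, 220, 255, 211), 2),
  TerrorismDeaths = c(5, 10, 11, 3, 8, 1, 7, 15, 6, 9),
  Neighbor1       = rep(c(40, 2, 211, 211, 220), 2),
  Neighbor2       = rep(c(NA, NA, 255, 220, 255), 2)
)

# Hypothetical helper: deaths of country `codes` in year `years`,
# found by matching a "Year Code" key; NA codes yield NA
deaths_of <- function(codes, years) {
  key <- paste(Example$Year, Example$CountryCode)
  Example$TerrorismDeaths[match(paste(years, codes), key)]
}

n1 <- deaths_of(Example$Neighbor1, Example$Year)
n2 <- deaths_of(Example$Neighbor2, Example$Year)

# Average over whichever neighbors are present (NaN if a row has none)
Example$NeighborAvgTerror <- rowMeans(cbind(n1, n2), na.rm = TRUE)
Example
```

`rowMeans(..., na.rm = TRUE)` is what lets US and Cuba, which have only one neighbor, average over a single value instead of returning `NA`.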

Tags: r, average, spatial

Solution


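This can be done in three steps:

  1. Reshape the dataset from wide to long format
  2. Aggregate in a self-join
  3. Append the resulting column to the original dataset with an update join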
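The answer's code never prints the intermediate long table, so here is a small sketch of step 1 on its own, to show what "long format" means for this data. It uses the same data.table `melt()` call as the answer but lists the measure columns explicitly instead of using `patterns()`; with `na.rm = TRUE` the four missing Neighbor2 entries (US and Cuba, both years) are dropped, leaving 16 country-year-neighbor rows.

```r
library(data.table)

# The example data from the question, built directly as a data.table
Example <- data.table(
  Country         = rep(c("US", "Cuba", "France", "Germany", "Belgium"), 2),
  Year            = rep(c(2000, 2001), each = 5),
  CountryCode     = rep(c(2, 40, 220, 255, 211), 2),
  TerrorismDeaths = c(5, 10, 11, 3, 8, 1, 7, 15, 6, 9),
  Neighbor1       = rep(c(40, 2, 211, 211, 220), 2),
  Neighbor2       = rep(c(NA, NA, 255, 220, 255), 2)
)

# Step 1 only: stack Neighbor1/Neighbor2 into a single Neighbor column
# (the source column name goes into `variable`), dropping NA neighbors
long <- melt(Example, measure.vars = c("Neighbor1", "Neighbor2"),
             value.name = "Neighbor", na.rm = TRUE)

# One row per country-year-neighbor pair, sorted for readability
long[order(Year, CountryCode)]
```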

data.table

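library(data.table)
# Step 1: reshape Neighbor1/Neighbor2 into one Neighbor column (one row per
# country-year-neighbor pair), dropping missing neighbors
long <- melt(setDT(Example), measure.vars = patterns("^Neighbor"), 
             value.name = "Neighbor", na.rm = TRUE)
# Step 2: self-join; for each country-year, average the deaths of all countries
# that list it as a neighbor (the same set as its own neighbors here, because
# every neighbor pair in this data is listed in both directions)
agg <- long[long, on = .(Year, Neighbor = CountryCode), 
            mean(TerrorismDeaths), by = .EACHI]
# Step 3: update join; add NeighborAvgTerror to Example by reference
Example[agg, on = .(Year, CountryCode = Neighbor), NeighborAvgTerror := V1]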

Example[]
    Country Year CountryCode TerrorismDeaths Neighbor1 Neighbor2 NeighborAvgTerror
 1:      US 2000           2               5        40        NA              10.0
 2:    Cuba 2000          40              10         2        NA               5.0
 3:  France 2000         220              11       211       255               5.5
 4: Germany 2000         255               3       211       220               9.5
 5: Belgium 2000         211               8       220       255               7.0
 6:      US 2001           2               1        40        NA               7.0
 7:    Cuba 2001          40               7         2        NA               1.0
 8:  France 2001         220              15       211       255               7.5
 9: Germany 2001         255               6       211       220              12.0
10: Belgium 2001         211               9       220       255              10.5
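The self-join in step 2 averages the deaths of the countries that *list* a given country as a neighbor. That matches the neighbors' own average only because every neighbor pair in this example appears in both directions. If your neighbor lists might not be symmetric, a variant that looks up each listed neighbor's own row directly could look like the sketch below. It assumes every neighbor code also appears as a `CountryCode` for the same year; unmatched neighbors are skipped via `na.rm = TRUE`. The names `NeighborDeaths` and `agg` are illustrative choices, not part of the original answer.

```r
library(data.table)

Example <- data.table(
  Country         = rep(c("US", "Cuba", "France", "Germany", "Belgium"), 2),
  Year            = rep(c(2000, 2001), each = 5),
  CountryCode     = rep(c(2, 40, 220, 255, 211), 2),
  TerrorismDeaths = c(5, 10, 11, 3, 8, 1, 7, 15, 6, 9),
  Neighbor1       = rep(c(40, 2, 211, 211, 220), 2),
  Neighbor2       = rep(c(NA, NA, 255, 220, 255), 2)
)

# Step 1: one row per (Year, CountryCode, Neighbor), dropping missing neighbors
long <- melt(Example, id.vars = c("Year", "CountryCode"),
             measure.vars = c("Neighbor1", "Neighbor2"),
             value.name = "Neighbor", na.rm = TRUE)

# Step 2a: update join; copy each neighbor's own deaths for the same year
long[Example, on = .(Year, Neighbor = CountryCode),
     NeighborDeaths := i.TerrorismDeaths]

# Step 2b: average the neighbors' deaths per country-year
agg <- long[, .(NeighborAvgTerror = mean(NeighborDeaths, na.rm = TRUE)),
            by = .(Year, CountryCode)]

# Step 3: update join; attach the averages to the original table
Example[agg, on = .(Year, CountryCode), NeighborAvgTerror := i.NeighborAvgTerror]
Example[]
```

On this data it reproduces the same `NeighborAvgTerror` column as the self-join. The difference only shows up when A lists B as a neighbor but B does not list A.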
