How to create a list of data.frames with matching rows and columns in R

Problem description

Suppose I have two data frames, df1 and df2:

    set.seed(123)
    df1 <- data.frame(id=sample(letters[1:10], 10, replace = F), 
                      x=rnorm(10), y=rnorm(10), z=rnorm(10), u=rnorm(10))
    df1
   id           x           y           z           u
1   f -1.02642090  0.91899661 -1.02412879 -0.07130809
2   i -0.71040656 -0.57534696  0.11764660  1.44455086
3   g  0.25688371  0.60796432 -0.94747461  0.45150405
4   b -0.24669188 -1.61788271 -0.49055744  0.04123292
5   e -0.34754260 -0.05556197 -0.25609219 -0.42249683
6   c -0.95161857  0.51940720  1.84386201 -2.05324722
7   d -0.04502772  0.30115336 -0.65194990  1.13133721
8   j -0.78490447  0.10567619  0.23538657 -1.46064007
9   a -1.66794194 -0.64070601  0.07796085  0.73994751
10  h -0.38022652 -0.84970435 -0.96185663  1.90910357
    df2 <- data.frame(id=sample(letters[2:11], 10, replace = F), 
                      x=rnorm(10), y=rnorm(10), z=rnorm(10), v=rnorm(10))
    df2
   id           x           y           z           v
1   j -1.27745077 -0.08868545 -0.56426954  1.84483867
2   e  1.17719205 -1.59548490  0.97031123 -0.98191715
3   c  0.90250583  0.85170932 -0.01863398  2.19600376
4   h -1.26130418 -0.71356081  0.36237035 -0.20466767
5   b  0.83745515  1.06643034  2.01130559  0.97514294
6   i -2.34829031 -0.53624259 -1.17796750 -0.86756612
7   k  0.61097114  0.53591706 -0.75517048 -0.50118759
8   g -0.04786774 -1.82862663 -0.33128448  0.78559116
9   f -2.39919771 -1.81353336 -0.28370270 -2.10224732
10  d -0.01931896  1.37261371  0.31415290 -0.04220493

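I would like to create a list (preferred) or some other object that holds df1, df2, ... restricted to their common rows (matched by id) and common column names, for example: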

df_lst
df1
  id          x          y          z
1  b -0.4456620 -0.4727914  1.2538149
2  c -1.2650612 -1.9666172  0.1533731
3  d  0.4978505  0.8377870  0.5539177
4  e  1.7869131 -1.6866933  0.6886403
5  f  0.3598138 -0.2179749 -0.2950715
6  g -0.5558411 -0.6250393  0.8215811
7  h  1.2240818 -1.0678237  0.4264642
8  i  0.4007715 -1.0260044  0.8951257
9  j -0.6868529  0.7013559 -1.1381369

df2
  id          x          y            z
1  b -1.0700682  0.4120223 -0.279333528
2  c -0.2416898 -0.1524106 -0.778997240
3  d  1.6232025  0.6343621 -0.685706846
4  e  1.2283928  2.1499193 -0.735026156
5  f  0.2760235 -1.3343536 -1.427685784
6  g -1.0489755  0.4958705  0.619283535
7  h -0.5208693  1.2339762 -0.006198262
8  i -0.7729782 -0.9007918 -0.319393809
9  j -0.4682005 -0.2288958 -0.374800093

Tags: r

Solution

We can use intersect to get the common names and the common 'id' values from the datasets. Then subset the rows with %in% and select the intersecting columns:

nm1 <- intersect(names(df1), names(df2))
nm2 <- intersect(df1$id, df2$id)
df1new <- subset(df1, id %in% nm2, select =nm1)
df1new <- df1new[order(df1new$id),]
df2new <- subset(df2, id %in% nm2, select = nm1)
df2new <- df2new[order(df2new$id),]
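
A quick way to confirm that the two results line up is to compare their ids and column names. The sketch below uses two small made-up data frames (a and b are hypothetical stand-ins, not the df1 and df2 from the question) so that it runs on its own:

```r
# Made-up inputs for illustration only
a <- data.frame(id = c("c", "a", "b"), x = 1:3, u = 4:6, stringsAsFactors = FALSE)
b <- data.frame(id = c("b", "d", "c"), x = 7:9, v = 10:12, stringsAsFactors = FALSE)

common_cols <- intersect(names(a), names(b))  # columns present in both
common_ids  <- intersect(a$id, b$id)          # ids present in both

# Same steps as above: keep shared rows and columns, then sort by id
a_new <- subset(a, id %in% common_ids, select = common_cols)
a_new <- a_new[order(a_new$id), ]
b_new <- subset(b, id %in% common_ids, select = common_cols)
b_new <- b_new[order(b_new$id), ]

# Both results should now have the same ids in the same order and the same columns
stopifnot(identical(a_new$id, b_new$id),
          identical(names(a_new), names(b_new)))
```

If stopifnot raises no error, the rows and columns of the two subsets match.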

If there are many datasets, place them in a list and use Reduce to get the intersecting column names and 'id' values:

lst1 <- list(df1, df2)
nm1 <- Reduce(intersect, lapply(lst1, names))
nm2 <- Reduce(intersect, lapply(lst1, `[[`, "id"))

lst2 <- lapply(lst1, subset, subset = id %in% nm2, select = nm1)
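
The help page for subset describes it as a convenience function for interactive use and recommends standard subsetting such as [ for programming, since its non-standard evaluation can be fragile once the call is wrapped in a function. As a rough sketch of the same idea with plain indexing (the helper name match_frames and its key argument are hypothetical, not part of the original answer):

```r
# Hypothetical helper: keep only the rows and columns shared by every
# data frame in `dfs`, matching rows on the column named by `key`.
match_frames <- function(dfs, key = "id") {
  common_cols <- Reduce(intersect, lapply(dfs, names))
  common_keys <- Reduce(intersect, lapply(dfs, `[[`, key))
  lapply(dfs, function(d) {
    out <- d[d[[key]] %in% common_keys, common_cols, drop = FALSE]
    out <- out[order(out[[key]]), , drop = FALSE]  # sort rows by the key
    row.names(out) <- NULL                         # reset row names to 1..n
    out
  })
}

# Made-up inputs for illustration only
a <- data.frame(id = c("c", "a", "b"), x = 1:3, u = 4:6, stringsAsFactors = FALSE)
b <- data.frame(id = c("b", "d", "c"), x = 7:9, v = 10:12, stringsAsFactors = FALSE)
res <- match_frames(list(a, b))
```

With the data from the question, the call would be match_frames(list(df1, df2)).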

If the rows also need to be ordered:

lst2 <- lapply(lst1, function(x) {
            x1 <- subset(x, id %in% nm2, select = nm1)
            x1 <- x1[order(x1$id),]
            row.names(x1) <- NULL
            x1
         })

Output:

lst2
[[1]]
  id          x          y          z
1  b -0.4456620 -0.4727914  1.2538149
2  c -1.2650612 -1.9666172  0.1533731
3  d  0.4978505  0.8377870  0.5539177
4  e  1.7869131 -1.6866933  0.6886403
5  f  0.3598138 -0.2179749 -0.2950715
6  g -0.5558411 -0.6250393  0.8215811
7  h  1.2240818 -1.0678237  0.4264642
8  i  0.4007715 -1.0260044  0.8951257
9  j -0.6868529  0.7013559 -1.1381369

[[2]]
  id          x          y            z
1  b -1.0700682  0.4120223 -0.279333528
2  c -0.2416898 -0.1524106 -0.778997240
3  d  1.6232025  0.6343621 -0.685706846
4  e  1.2283928  2.1499193 -0.735026156
5  f  0.2760235 -1.3343536 -1.427685784
6  g -1.0489755  0.4958705  0.619283535
7  h -0.5208693  1.2339762 -0.006198262
8  i -0.7729782 -0.9007918 -0.319393809
9  j -0.4682005 -0.2288958 -0.374800093
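
The desired df_lst in the question labels its elements df1 and df2, whereas the list above is unnamed. Since lapply keeps the names of its input, one option is to start from a named list. A minimal sketch, using small made-up stand-ins for df1 and df2 so that it runs on its own (skip those two lines if the real data frames already exist):

```r
# Made-up stand-ins for the question's df1 and df2
df1 <- data.frame(id = c("c", "a", "b"), x = 1:3, u = 4:6, stringsAsFactors = FALSE)
df2 <- data.frame(id = c("b", "d", "c"), x = 7:9, v = 10:12, stringsAsFactors = FALSE)

lst1 <- list(df1 = df1, df2 = df2)   # named list; the names carry through lapply
nm1 <- Reduce(intersect, lapply(lst1, names))
nm2 <- Reduce(intersect, lapply(lst1, `[[`, "id"))

df_lst <- lapply(lst1, function(x) {
  x1 <- x[x$id %in% nm2, nm1]   # shared rows and columns
  x1 <- x1[order(x1$id), ]      # sort by id
  row.names(x1) <- NULL
  x1
})

names(df_lst)   # "df1" "df2"
df_lst$df1      # access an element by name
```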
