r - How to create a list of data.frames with matching rows and columns in R
Problem description
Suppose I have two data frames, df1 and df2:
set.seed(123)
df1 <- data.frame(id=sample(letters[1:10], 10, replace = F),
x=rnorm(10), y=rnorm(10), z=rnorm(10), u=rnorm(10))
df1
id x y z u
1 f -1.02642090 0.91899661 -1.02412879 -0.07130809
2 i -0.71040656 -0.57534696 0.11764660 1.44455086
3 g 0.25688371 0.60796432 -0.94747461 0.45150405
4 b -0.24669188 -1.61788271 -0.49055744 0.04123292
5 e -0.34754260 -0.05556197 -0.25609219 -0.42249683
6 c -0.95161857 0.51940720 1.84386201 -2.05324722
7 d -0.04502772 0.30115336 -0.65194990 1.13133721
8 j -0.78490447 0.10567619 0.23538657 -1.46064007
9 a -1.66794194 -0.64070601 0.07796085 0.73994751
10 h -0.38022652 -0.84970435 -0.96185663 1.90910357
df2 <- data.frame(id=sample(letters[2:11], 10, replace = F),
x=rnorm(10), y=rnorm(10), z=rnorm(10), v=rnorm(10))
df2
id x y z v
1 j -1.27745077 -0.08868545 -0.56426954 1.84483867
2 e 1.17719205 -1.59548490 0.97031123 -0.98191715
3 c 0.90250583 0.85170932 -0.01863398 2.19600376
4 h -1.26130418 -0.71356081 0.36237035 -0.20466767
5 b 0.83745515 1.06643034 2.01130559 0.97514294
6 i -2.34829031 -0.53624259 -1.17796750 -0.86756612
7 k 0.61097114 0.53591706 -0.75517048 -0.50118759
8 g -0.04786774 -1.82862663 -0.33128448 0.78559116
9 f -2.39919771 -1.81353336 -0.28370270 -2.10224732
10 d -0.01931896 1.37261371 0.31415290 -0.04220493
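I would like to create a list or object (preferred) that holds, for df1, df2, ..., only the rows (matched by id) and the columns that all of them have in common.
For example: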
df_lst
df1
id x y z
1 b -0.4456620 -0.4727914 1.2538149
2 c -1.2650612 -1.9666172 0.1533731
3 d 0.4978505 0.8377870 0.5539177
4 e 1.7869131 -1.6866933 0.6886403
5 f 0.3598138 -0.2179749 -0.2950715
6 g -0.5558411 -0.6250393 0.8215811
7 h 1.2240818 -1.0678237 0.4264642
8 i 0.4007715 -1.0260044 0.8951257
9 j -0.6868529 0.7013559 -1.1381369
df2
id x y z
1 b -1.0700682 0.4120223 -0.279333528
2 c -0.2416898 -0.1524106 -0.778997240
3 d 1.6232025 0.6343621 -0.685706846
4 e 1.2283928 2.1499193 -0.735026156
5 f 0.2760235 -1.3343536 -1.427685784
6 g -1.0489755 0.4958705 0.619283535
7 h -0.5208693 1.2339762 -0.006198262
8 i -0.7729782 -0.9007918 -0.319393809
9 j -0.4682005 -0.2288958 -0.374800093
Solution
We can use intersect to get the common names and the common 'id' values from the datasets. Then subset the rows with %in% and select the intersecting columns:
nm1 <- intersect(names(df1), names(df2))
nm2 <- intersect(df1$id, df2$id)
df1new <- subset(df1, id %in% nm2, select =nm1)
df1new <- df1new[order(df1new$id),]
df2new <- subset(df2, id %in% nm2, select = nm1)
df2new <- df2new[order(df2new$id),]
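To confirm that the two results line up, a quick check can be run. The following is only a sketch: the data frames a and b and their values are made up for illustration and are not the data from the question.

```r
# Illustrative data (hypothetical, not df1/df2 from the question)
a <- data.frame(id = c("c", "a", "b"), x = 1:3, y = 4:6, u = 7:9)
b <- data.frame(id = c("b", "d", "c"), x = 11:13, y = 14:16, v = 17:19)

cols <- intersect(names(a), names(b))  # column names present in both
ids  <- intersect(a$id, b$id)          # id values present in both

# Same steps as above: keep common rows and columns, then sort by id
a_new <- subset(a, id %in% ids, select = cols)
a_new <- a_new[order(a_new$id), ]
b_new <- subset(b, id %in% ids, select = cols)
b_new <- b_new[order(b_new$id), ]

# Both results should have the same columns and the same ids in the same order
stopifnot(identical(names(a_new), names(b_new)))
stopifnot(identical(as.character(a_new$id), as.character(b_new$id)))
```

If either stopifnot call fails, the two data frames do not share the same columns or ids after filtering.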
If there are many datasets, place them in a list and use Reduce to get the intersecting column names and 'id' values:
lst1 <- list(df1, df2)
nm1 <- Reduce(intersect, lapply(lst1, names))
nm2 <- Reduce(intersect, lapply(lst1, `[[`, "id"))
lst2 <- lapply(lst1, subset, subset = id %in% nm2, select = nm1)
If the result needs to be ordered by id:
lst2 <- lapply(lst1, function(x) {
x1 <- subset(x, id %in% nm2, select = nm1)
x1 <- x1[order(x1$id),]
row.names(x1) <- NULL
x1
})
-output
lst2
[[1]]
id x y z
1 b -0.4456620 -0.4727914 1.2538149
2 c -1.2650612 -1.9666172 0.1533731
3 d 0.4978505 0.8377870 0.5539177
4 e 1.7869131 -1.6866933 0.6886403
5 f 0.3598138 -0.2179749 -0.2950715
6 g -0.5558411 -0.6250393 0.8215811
7 h 1.2240818 -1.0678237 0.4264642
8 i 0.4007715 -1.0260044 0.8951257
9 j -0.6868529 0.7013559 -1.1381369
[[2]]
id x y z
1 b -1.0700682 0.4120223 -0.279333528
2 c -0.2416898 -0.1524106 -0.778997240
3 d 1.6232025 0.6343621 -0.685706846
4 e 1.2283928 2.1499193 -0.735026156
5 f 0.2760235 -1.3343536 -1.427685784
6 g -1.0489755 0.4958705 0.619283535
7 h -0.5208693 1.2339762 -0.006198262
8 i -0.7729782 -0.9007918 -0.319393809
9 j -0.4682005 -0.2288958 -0.374800093
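The question asks for a list whose elements are named df1, df2, and so on. One possible way to get that is to start from a named list, since lapply keeps the names. The sketch below wraps the steps in a helper; match_common and its key argument are hypothetical names introduced here, and the data is made up. It uses plain [ indexing instead of subset, because R's documentation describes subset as a convenience function intended for interactive use.

```r
# Hypothetical helper (not part of the original answer): keep only the
# columns and key values shared by every data frame in a list.
match_common <- function(dfs, key = "id") {
  cols <- Reduce(intersect, lapply(dfs, names))
  ids  <- Reduce(intersect, lapply(dfs, function(d) as.character(d[[key]])))
  lapply(dfs, function(d) {
    out <- d[as.character(d[[key]]) %in% ids, cols, drop = FALSE]
    out <- out[order(out[[key]]), , drop = FALSE]  # sort rows by key
    row.names(out) <- NULL                         # reset row names
    out
  })
}

# Illustrative data (not df1/df2 from the question)
a <- data.frame(id = c("c", "a", "b"), x = 1:3, y = 4:6, u = 7:9)
b <- data.frame(id = c("b", "d", "c"), x = 11:13, y = 14:16, v = 17:19)

# A named input list gives a named result, so elements are reachable by name
df_lst <- match_common(list(df1 = a, df2 = b))
names(df_lst)
df_lst$df1
```

Because the helper receives the common names and ids as ordinary local variables, it should also behave the same when called from inside another function.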