Melting data frames from two different lists into one list of data frames in R

Question

I have two lists of data frames: list1 and list2. Below are sample data frames from list1 (df1) and list2 (df2):

> print(df1)

         Moment.ext_multi.lane   Moment.ext_single.lane   Moment.int_multi.lane  Moment.int_single.lane
Baseline   0.7109148                  0.5367121               0.5874249               0.3718993
Sample1    0.7109148                  0.5367121               0.5874249               0.3718993
Sample2    0.7109148                  0.5367121               0.5874249               0.3718993
Sample3    0.7109148                  0.5367121               0.5874249               0.3718993
Sample4    0.7109148                  0.5367121               0.5874249               0.3718993
Sample5    0.7109148                  0.5367121               0.5874249               0.3718993
Sample6    0.7109148                  0.5367121               0.5874249               0.3718993
Sample7    0.7109148                  0.5367121               0.5874249               0.3718993
Sample8    0.7109148                  0.5367121               0.5874249               0.3718993
Sample9    0.7109148                  0.5367121               0.5874249               0.3718993
Sample10   0.7109148                  0.5367121               0.5874249               0.3718993
AASHTO     0.7550000                  NA                      0.6640000               0.4310000
Mean       0.7109148                  0.5367121               0.5874249               0.3718993

> print(df2)

         Shear.ext_multi.lane   Shear.ext_single.lane   Shear.int_multi.lane  Shear.int_single.lane
Baseline   0.7109148                  0.5367121               0.5874249               0.3718993
Sample1    0.7109148                  0.5367121               0.5874249               0.3718993
Sample2    0.7109148                  0.5367121               0.5874249               0.3718993
Sample3    0.7109148                  0.5367121               0.5874249               0.3718993
Sample4    0.7109148                  0.5367121               0.5874249               0.3718993
Sample5    0.7109148                  0.5367121               0.5874249               0.3718993
Sample6    0.7109148                  0.5367121               0.5874249               0.3718993
Sample7    0.7109148                  0.5367121               0.5874249               0.3718993
Sample8    0.7109148                  0.5367121               0.5874249               0.3718993
Sample9    0.7109148                  0.5367121               0.5874249               0.3718993
Sample10   0.7109148                  0.5367121               0.5874249               0.3718993
AASHTO     0.7550000                  NA                      0.6640000               0.4310000
Mean       0.7109148                  0.5367121               0.5874249               0.3718993

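I want to combine these two lists into a new list of data frames, list3, and drop the row named "Mean" from every data frame.

Then I want to melt the data so that each data frame in the new list has 4 columns.

The first column is Source. If the row name in the original list1 and list2 is "Sample1" through "Sample10", Source says Samples; if the row name is "Baseline", Source says Baseline; and if the row name is "AASHTO", Source says AASHTO.

The second column is Type, taken from the middle of the column name (remove "Moment." or "Shear." from the start and ".lane" from the end).

The third column is Moment, holding the values from list1.

The fourth column is Shear, holding the values from list2.

The expected sample data frame (df3) in the final list, list3, is: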

> print(df3)
     Source        Type           Shear          Moment
1   Baseline     ext_multi      0.5367121      0.5874249
2   Baseline     ext_single     0.5367121      0.5874249    
3   Baseline     int_multi      0.5367121      0.5874249
4   Baseline     int_single     0.5367121      0.5874249
5   AASHTO       ext_multi      0.5367121      0.5874249
6   AASHTO       ext_single     0.5367121      0.5874249    
7   AASHTO       int_multi      0.5367121      0.5874249
8   AASHTO       int_single     0.5367121      0.5874249
9   AASHTO       int_single     0.5367121      0.5874249
5   Sample       ext_multi      0.5367121      0.5874249
6   Sample       ext_single     0.5367121      0.5874249    
7   Sample       int_multi      0.5367121      0.5874249
8   Sample       int_single     0.5367121      0.5874249
9   Sample       int_single     0.5367121      0.5874249
... continues 

Tags: r, dataframe

Answer


We can use pivot_longer to reshape each element of the two lists into "long" format, then use map2 to loop over the corresponding elements of the two lists and join them.

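library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(tibble)

# Reshape each Moment data frame to long format: row names become Source,
# and column names are shortened to Type (e.g. "ext_multi")
lst1new <- map(lst1, ~
              .x %>%
                 rownames_to_column("Source") %>%
                 pivot_longer(cols = -Source, names_to = 'Type',
                   values_to = 'Moment') %>%
                 mutate(Type = str_replace(Type, '^\\w+\\.([^.]+)\\..*', '\\1')))

# Same reshaping for the Shear data frames
lst2new <- map(lst2, ~
       .x %>%
          rownames_to_column("Source") %>%
          pivot_longer(cols = -Source, names_to = 'Type',
                values_to = 'Shear') %>%
          mutate(Type = str_replace(Type, '^\\w+\\.([^.]+)\\..*', '\\1')))

# Join corresponding elements on their shared columns (Source and Type)
map2(lst1new, lst2new, full_join)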
#[[1]]
# A tibble: 52 x 4
#   Source   Type       Moment Shear
# * <chr>    <chr>       <dbl> <dbl>
# 1 Baseline ext_multi   0.711 0.711
# 2 Baseline ext_single  0.537 0.537
# 3 Baseline int_multi   0.587 0.587
# 4 Baseline int_single  0.372 0.372
# 5 Sample1  ext_multi   0.711 0.711
# 6 Sample1  ext_single  0.537 0.537
# 7 Sample1  int_multi   0.587 0.587
# 8 Sample1  int_single  0.372 0.372
# 9 Sample2  ext_multi   0.711 0.711
#10 Sample2  ext_single  0.537 0.537
# … with 42 more rows

#[[2]]
# A tibble: 52 x 4
#   Source   Type       Moment Shear
# * <chr>    <chr>       <dbl> <dbl>
# 1 Baseline ext_multi   0.711 0.711
# 2 Baseline ext_single  0.537 0.537
# 3 Baseline int_multi   0.587 0.587
# 4 Baseline int_single  0.372 0.372
# 5 Sample1  ext_multi   0.711 0.711
# 6 Sample1  ext_single  0.537 0.537
# 7 Sample1  int_multi   0.587 0.587
# 8 Sample1  int_single  0.372 0.372
# 9 Sample2  ext_multi   0.711 0.711
#10 Sample2  ext_single  0.537 0.537
# … with 42 more rows
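
The regular expression in the mutate() step keeps only the middle part of each column name. A minimal sketch of that step on its own, assuming column names shaped like those in the question (the vector `cols` is made up here for illustration):

```r
library(stringr)

# Column names as they appear in the question's data frames
cols <- c("Moment.ext_multi.lane", "Shear.int_single.lane")

# '^\\w+\\.'  matches the "Moment." / "Shear." prefix,
# '([^.]+)'   captures everything up to the next dot,
# '\\..*'     matches the trailing ".lane"
types <- str_replace(cols, "^\\w+\\.([^.]+)\\..*", "\\1")
types
```

Here `types` holds "ext_multi" and "int_single". Note that a name without two dots would not match the pattern and would be returned unchanged.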

If we also need to remove the digits from "Sample1", "Sample2", and so on:

map2(lst1new, lst2new, ~ full_join(.x, .y) %>%
                         mutate(Source = str_remove(Source, "\\d+$")))
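
The code above keeps the "Mean" rows, which the question asks to drop. One possible way to handle that is to filter them out before joining. The sketch below uses small made-up data frames; `d1`, `d2`, and the helper `to_long` are hypothetical names, not part of the original answer. The digits are stripped only after the join, so that Source and Type still identify each row uniquely while joining.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(tibble)

# Tiny stand-ins for one element of lst1 and lst2 (values are made up)
rn <- c("Baseline", "Sample1", "Sample2", "Mean")
d1 <- data.frame(Moment.ext_multi.lane = c(0.71, 0.72, 0.73, 0.72),
                 Moment.int_multi.lane = c(0.58, 0.59, 0.60, 0.59),
                 row.names = rn)
d2 <- data.frame(Shear.ext_multi.lane = c(0.81, 0.82, 0.83, 0.82),
                 Shear.int_multi.lane = c(0.68, 0.69, 0.70, 0.69),
                 row.names = rn)

# Hypothetical helper: reshape one data frame to long format,
# dropping the "Mean" row and shortening the column names to Type
to_long <- function(df, value_name) {
  df %>%
    rownames_to_column("Source") %>%
    filter(Source != "Mean") %>%
    pivot_longer(cols = -Source, names_to = "Type",
                 values_to = value_name) %>%
    mutate(Type = str_replace(Type, "^\\w+\\.([^.]+)\\..*", "\\1"))
}

# Join each pair on Source and Type, then strip the sample numbers
res <- map2(list(d1), list(d2),
            ~ full_join(to_long(.x, "Moment"), to_long(.y, "Shear"),
                        by = c("Source", "Type")) %>%
                mutate(Source = str_remove(Source, "\\d+$")))
res[[1]]
```

With the real data, `list(d1)` and `list(d2)` would be replaced by lst1 and lst2.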

Data

lst1 <- list(structure(list(Moment.ext_multi.lane = c(0.7109148, 0.7109148, 
0.7109148, 0.7109148, 0.7109148, 0.7109148, 0.7109148, 0.7109148, 
0.7109148, 0.7109148, 0.7109148, 0.755, 0.7109148), Moment.ext_single.lane = c(0.5367121, 
0.5367121, 0.5367121, 0.5367121, 0.5367121, 0.5367121, 0.5367121, 
0.5367121, 0.5367121, 0.5367121, 0.5367121, NA, 0.5367121), Moment.int_multi.lane = c(0.5874249, 
0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.5874249, 
0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.664, 0.5874249), 
    Moment.int_single.lane = c(0.3718993, 0.3718993, 0.3718993, 
    0.3718993, 0.3718993, 0.3718993, 0.3718993, 0.3718993, 0.3718993, 
    0.3718993, 0.3718993, 0.431, 0.3718993)), class = "data.frame", row.names = c("Baseline", 
"Sample1", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6", 
"Sample7", "Sample8", "Sample9", "Sample10", "AASHTO", "Mean"
)), structure(list(Moment.ext_multi.lane = c(0.7109148, 0.7109148, 
0.7109148, 0.7109148, 0.7109148, 0.7109148, 0.7109148, 0.7109148, 
0.7109148, 0.7109148, 0.7109148, 0.755, 0.7109148), Moment.ext_single.lane = c(0.5367121, 
0.5367121, 0.5367121, 0.5367121, 0.5367121, 0.5367121, 0.5367121, 
0.5367121, 0.5367121, 0.5367121, 0.5367121, NA, 0.5367121), Moment.int_multi.lane = c(0.5874249, 
0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.5874249, 
0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.664, 0.5874249), 
    Moment.int_single.lane = c(0.3718993, 0.3718993, 0.3718993, 
    0.3718993, 0.3718993, 0.3718993, 0.3718993, 0.3718993, 0.3718993, 
    0.3718993, 0.3718993, 0.431, 0.3718993)), class = "data.frame", row.names = c("Baseline", 
"Sample1", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6", 
"Sample7", "Sample8", "Sample9", "Sample10", "AASHTO", "Mean"
)))

lst2 <- list(structure(list(Shear.ext_multi.lane = c(0.7109148, 0.7109148, 
0.7109148, 0.7109148, 0.7109148, 0.7109148, 0.7109148, 0.7109148, 
0.7109148, 0.7109148, 0.7109148, 0.755, 0.7109148), Shear.ext_single.lane = c(0.5367121, 
0.5367121, 0.5367121, 0.5367121, 0.5367121, 0.5367121, 0.5367121, 
0.5367121, 0.5367121, 0.5367121, 0.5367121, NA, 0.5367121), Shear.int_multi.lane = c(0.5874249, 
0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.5874249, 
0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.664, 0.5874249), 
    Shear.int_single.lane = c(0.3718993, 0.3718993, 0.3718993, 
    0.3718993, 0.3718993, 0.3718993, 0.3718993, 0.3718993, 0.3718993, 
    0.3718993, 0.3718993, 0.431, 0.3718993)), class = "data.frame", row.names = c("Baseline", 
"Sample1", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6", 
"Sample7", "Sample8", "Sample9", "Sample10", "AASHTO", "Mean"
)), structure(list(Shear.ext_multi.lane = c(0.7109148, 0.7109148, 
0.7109148, 0.7109148, 0.7109148, 0.7109148, 0.7109148, 0.7109148, 
0.7109148, 0.7109148, 0.7109148, 0.755, 0.7109148), Shear.ext_single.lane = c(0.5367121, 
0.5367121, 0.5367121, 0.5367121, 0.5367121, 0.5367121, 0.5367121, 
0.5367121, 0.5367121, 0.5367121, 0.5367121, NA, 0.5367121), Shear.int_multi.lane = c(0.5874249, 
0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.5874249, 
0.5874249, 0.5874249, 0.5874249, 0.5874249, 0.664, 0.5874249), 
    Shear.int_single.lane = c(0.3718993, 0.3718993, 0.3718993, 
    0.3718993, 0.3718993, 0.3718993, 0.3718993, 0.3718993, 0.3718993, 
    0.3718993, 0.3718993, 0.431, 0.3718993)), class = "data.frame", row.names = c("Baseline", 
"Sample1", "Sample2", "Sample3", "Sample4", "Sample5", "Sample6", 
"Sample7", "Sample8", "Sample9", "Sample10", "AASHTO", "Mean"
)))
