首页 > 解决方案 > 如何折叠键 id 上的行但将唯一测量值保留在新变量中?

问题描述

我有一个如下所示的数据集:

                                Federal.Area State Total_Miles
1                          Allentown, PA--NJ    NJ     1094508
2                          Allentown, PA--NJ    PA     9957805
3            Augusta-Richmond County, GA--SC    GA     6221747
4            Augusta-Richmond County, GA--SC    SC     2101823
5                             Beloit, WI--IL    IL      324238
6                             Beloit, WI--IL    WI      542491

我想折叠行,Federal.Area但创建并保留包含唯一性State和唯一性的新变量Total_Miles,使其看起来像这样:

Federal.Area          State    Total_Miles State1 State2 Total_Miles_state1 Total_Miles_state2
   <fct>                 <fct>       <dbl> <fct>  <fct>              <dbl>            <dbl>
 1 Allentown, PA--NJ     NJ        1094508 NJ     PA               1094508          9957805
 2 Augusta-Richmond Cou… GA        6221747 GA     SC               6221747          2101823
 3 Beloit, WI--IL        IL         324238 IL     WI                324238           542491

我不知道如何将变量折叠StateTotal_Miles同一行,但作为新变量键入Federal.Area.

标签: rdplyrtidyverse

解决方案


也许您可以使用pivot_widerfromtidyverse将数据转换为宽格式。

首先将每个行中的行编号Federal.Area为 1 和 2。然后调用pivot_widerwhich 将追加

library(tidyverse)

df %>%
  group_by(Federal.Area) %>%
  mutate(rn = row_number()) %>%
  pivot_wider(id_cols = Federal.Area, values_from = c(State, Total_Miles), names_from = rn) 

输出

# A tibble: 3 x 5
# Groups:   Federal.Area [3]
  Federal.Area                  State_1 State_2 Total_Miles_1 Total_Miles_2
  <chr>                         <chr>   <chr>           <int>         <int>
1 Allentown,PA--NJ              NJ      PA            1094508       9957805
2 Augusta-RichmondCounty,GA--SC GA      SC            6221747       2101823
3 Beloit,WI--IL                 IL      WI             324238        542491

数据

df <- structure(list(Federal.Area = c("Allentown,PA--NJ", "Allentown,PA--NJ", 
"Augusta-RichmondCounty,GA--SC", "Augusta-RichmondCounty,GA--SC", 
"Beloit,WI--IL", "Beloit,WI--IL"), State = c("NJ", "PA", "GA", 
"SC", "IL", "WI"), Total_Miles = c(1094508L, 9957805L, 6221747L, 
2101823L, 324238L, 542491L)), class = "data.frame", row.names = c(NA, 
-6L))

推荐阅读