How do I add separate row-number columns for different groups in the data?

Problem description

Data

I am using the palmer penguins data in this reproducible example. To keep it simple, I kept only the rows for the year 2007 and the species Adelie:

install.packages("palmerpenguins")
library(palmerpenguins)
library(tidyverse)

ad <- penguins %>% 
  filter(species == "Adelie",
         year == 2007) %>% 
  select(species, island)

Goal and what I have tried

I want to create a separate row-number column for each island in the data. There are 3 islands:

> unique(ad$island)
[1] Torgersen Biscoe    Dream    
Levels: Biscoe Dream Torgersen

So there should be 3 row id columns.

This is what I have done so far:

ad %>% 
  add_count(island, name = "island_counts") %>% 
  group_by(island) %>% 
  mutate(row_id = 0:(unique(island_counts)-1),
         row_id_Torgersen =  if_else(island == "Torgersen", row_id, NA_integer_),
         row_id_Biscoe =  if_else(island == "Biscoe", row_id, NA_integer_),
         row_id_Dream =  if_else(island == "Dream" , row_id, NA_integer_)
    
  ) %>% 
  ungroup()  

# A tibble: 50 x 7
   species island    island_counts row_id row_id_Torgersen row_id_Biscoe row_id_Dream
   <fct>   <fct>             <int>  <int>            <int>         <int>        <int>
 1 Adelie  Torgersen            20      0                0            NA           NA
 2 Adelie  Torgersen            20      1                1            NA           NA
 3 Adelie  Torgersen            20      2                2            NA           NA
 4 Adelie  Torgersen            20      3                3            NA           NA
 5 Adelie  Torgersen            20      4                4            NA           NA
 6 Adelie  Torgersen            20      5                5            NA           NA
 7 Adelie  Torgersen            20      6                6            NA           NA
 8 Adelie  Torgersen            20      7                7            NA           NA
 9 Adelie  Torgersen            20      8                8            NA           NA
10 Adelie  Torgersen            20      9                9            NA           NA
# ... with 40 more rows  

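But I want to do this programmatically, so that the number of new columns is not hard-coded but is determined by the islands present in the data. Is there a better solution? Thank you.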

Tags: r, dplyr

Solution


Try this. You can build a column name from the island variable and then use pivot_wider(). This way you avoid writing a separate condition for each island:

library(palmerpenguins)
library(tidyverse)

ad <- penguins %>% 
  filter(species == "Adelie",
         year == 2007) %>% 
  select(species, island)
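# Code
ad %>% 
  add_count(island, name = "island_counts") %>% 
  group_by(island) %>% 
  mutate(row_id = 0:(unique(island_counts) - 1),  # zero-based index within each island
         name = paste0("row_id_", island),         # target column name, e.g. row_id_Biscoe
         id = row_number()) %>%                    # per-island key that keeps rows distinct when pivoting
  ungroup() %>%
  pivot_wider(names_from = name, values_from = row_id) %>%
  select(-id)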

Output:

# A tibble: 50 x 6
   species island    island_counts row_id_Torgersen row_id_Biscoe row_id_Dream
   <fct>   <fct>             <int>            <int>         <int>        <int>
 1 Adelie  Torgersen            20                0            NA           NA
 2 Adelie  Torgersen            20                1            NA           NA
 3 Adelie  Torgersen            20                2            NA           NA
 4 Adelie  Torgersen            20                3            NA           NA
 5 Adelie  Torgersen            20                4            NA           NA
 6 Adelie  Torgersen            20                5            NA           NA
 7 Adelie  Torgersen            20                6            NA           NA
 8 Adelie  Torgersen            20                7            NA           NA
 9 Adelie  Torgersen            20                8            NA           NA
10 Adelie  Torgersen            20                9            NA           NA
# ... with 40 more rows
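
As a rough sketch of the same idea on data that is small enough to check by eye, the snippet below uses a made-up six-row tibble called toy (a hypothetical stand-in for ad, not the penguins data). It also swaps in row_number() - 1L for the add_count() step, which is my own variation and drops the island_counts column; treat it as one possible way to write this, not the only one.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for `ad`; the values are invented.
toy <- tibble(
  species = "Adelie",
  island  = factor(c("Torgersen", "Torgersen", "Biscoe",
                     "Dream", "Dream", "Dream"))
)

result <- toy %>%
  group_by(island) %>%
  mutate(
    row_id = row_number() - 1L,         # zero-based index within each island
    name   = paste0("row_id_", island)  # column each row's index should land in
  ) %>%
  ungroup() %>%
  mutate(id = row_number()) %>%         # unique key so every input row survives the pivot
  pivot_wider(names_from = name, values_from = row_id) %>%
  select(-id)

result
```

Because the new column names are generated from the values of island, the number of row_id_* columns follows whatever islands are present, with NA wherever a row belongs to a different island.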
