Data frame: reshaping asterisk-separated blocks from long format to wide format

Problem description

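I am working with a data frame that consists of a single column describing streets. Each street is a block of rows of variable length: the first row holds the street name, and the remaining rows hold various details. Consecutive streets are separated by a cell containing four asterisks ("****"). How can I reshape the data so that each street becomes a single row?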

dataset <- c("Rosa street", "London", "From to", "Description : lorem ipsum", "****", "Main street", "Bristol", "From to", "Description : dolor sit amet", "coordinates", "****")

dataset <- as.data.frame(dataset)

This is what I currently have:

                        dataset
1                   Rosa street
2                        London
3                       From to
4     Description : lorem ipsum
5                          ****
6                   Main street
7                       Bristol
8                       From to
9  Description : dolor sit amet
10                  coordinates
11                         ****

Expected output

     var1        | var2    | var3    | var4                         | var5
-----------------------------------------------------------------------------------
1    Rosa street | London  | From to | Description : lorem ipsum    | NA
2    Main street | Bristol | From to | Description : dolor sit amet | coordinates

Tags: r

Solution


Here is one way to do it with the tidyverse:

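library(dplyr)
library(tidyr)

dataset %>%
  # block id: number of '****' separators seen before each row
  group_by(grp = lag(cumsum(dataset == '****'), default = 0)) %>%
  # position of each row within its block
  mutate(row = row_number()) %>%
  ungroup() %>%
  # drop the separator rows
  filter(dataset != '****') %>%
  # one column per position: var1, var2, ...
  pivot_wider(names_from = row, values_from = dataset, names_prefix = 'var') %>%
  select(-grp)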

#  var1        var2    var3    var4                         var5       
#  <chr>       <chr>   <chr>   <chr>                        <chr>      
#1 Rosa street London  From to Description : lorem ipsum    NA         
#2 Main street Bristol From to Description : dolor sit amet coordinates
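
For comparison, the same idea (number the blocks by counting the separators seen so far, then spread each block across a row) can probably be sketched in base R as well. This is only an illustrative alternative, not part of the original answer; the helper names is_sep, grp, blocks, width, padded and wide are made up for this sketch, and it assumes the separator is always exactly "****".

```r
# Same data as in the question, as a one-column data frame
dataset <- data.frame(dataset = c(
  "Rosa street", "London", "From to", "Description : lorem ipsum", "****",
  "Main street", "Bristol", "From to", "Description : dolor sit amet",
  "coordinates", "****"
), stringsAsFactors = FALSE)

# TRUE on the separator rows (assumption: separator is exactly "****")
is_sep <- dataset$dataset == "****"

# Block id: number of separators seen strictly before each row,
# the base R equivalent of lag(cumsum(...), default = 0)
grp <- c(0, head(cumsum(is_sep), -1))

# Split the non-separator values into one character vector per block
blocks <- split(dataset$dataset[!is_sep], grp[!is_sep])

# Pad every block with NA up to the length of the longest block
width <- max(lengths(blocks))
padded <- lapply(blocks, function(x) {
  length(x) <- width
  x
})

# Bind the padded blocks as rows and name the columns var1, var2, ...
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("var", seq_len(width))
rownames(wide) <- NULL

print(wide)
```

Shifting the running count by one row keeps each "****" row in the block it terminates, so removing the separators afterwards does not disturb the block numbering.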
