How to split strings for a tibble

Problem description

I am working on a data project about sports teams, and I was wondering whether there is a way to take plain text and turn it into a tibble with the data separated into city and mascot.

tibble("City, Mascot,
  Arizona Diamondbacks
  Atlanta Braves
  Baltimore Orioles
  Boston Red Sox
  Chicago White Sox
  Chicago Cubs
  Cincinnati Reds
  Cleveland Indians
  Colorado Rockies
  Detroit Tigers
  Houston Astros
  Kansas City Royals
  Los Angeles Angels
  Los Angeles Dodgers
  Miami Marlins
  Milwaukee Brewers
  Minnesota Twins
  New York Yankees
  New York Mets
  Oakland Athletics
  Philadelphia Phillies
  Pittsburgh Pirates
  San Diego Padres
  San Francisco Giants
  Seattle Mariners
  St. Louis Cardinals
  Tampa Bay Rays
  Texas Rangers
  Toronto Blue Jays
  Washington Nationals
  "
)

Basically, I want to be able to edit the code so that I don't have to change each entry by hand, while still being able to make small adjustments if necessary. I am doing this so that I can join it with other data by city.

Tags: r

Solution


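Some regex black magic: the pattern [[:alpha:]]+$ matches the last run of letters in each string, so the last word becomes the mascot and everything before it becomes the city. Note that this splits two-word mascots (Red Sox, White Sox, Blue Jays) after their first word, as rows 4 and 5 of the output show.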

library(tidyverse)


example_data <- tibble::tribble(
                    ~data,
   "Arizona Diamondbacks",
         "Atlanta Braves",
      "Baltimore Orioles",
         "Boston Red Sox",
      "Chicago White Sox",
           "Chicago Cubs",
        "Cincinnati Reds",
      "Cleveland Indians",
       "Colorado Rockies",
         "Detroit Tigers",
         "Houston Astros",
     "Kansas City Royals",
     "Los Angeles Angels",
    "Los Angeles Dodgers",
          "Miami Marlins",
      "Milwaukee Brewers",
        "Minnesota Twins",
       "New York Yankees",
          "New York Mets",
      "Oakland Athletics",
  "Philadelphia Phillies",
     "Pittsburgh Pirates",
       "San Diego Padres",
   "San Francisco Giants",
       "Seattle Mariners",
    "St. Louis Cardinals",
         "Tampa Bay Rays",
          "Texas Rangers",
      "Toronto Blue Jays",
   "Washington Nationals"
  )


example_data |>
  # city: drop the last word, then trim the space left behind
  mutate(city = str_remove(data, '[[:alpha:]]+$') |> str_trim(),
         # mascot: keep only the last word
         mascot = str_extract(data, '[[:alpha:]]+$'))
#> # A tibble: 30 x 3
#>    data                 city          mascot      
#>    <chr>                <chr>         <chr>       
#>  1 Arizona Diamondbacks Arizona       Diamondbacks
#>  2 Atlanta Braves       Atlanta       Braves      
#>  3 Baltimore Orioles    Baltimore     Orioles     
#>  4 Boston Red Sox       Boston Red    Sox         
#>  5 Chicago White Sox    Chicago White Sox         
#>  6 Chicago Cubs         Chicago       Cubs        
#>  7 Cincinnati Reds      Cincinnati    Reds        
#>  8 Cleveland Indians    Cleveland     Indians     
#>  9 Colorado Rockies     Colorado      Rockies     
#> 10 Detroit Tigers       Detroit       Tigers      
#> # ... with 20 more rows

Created on 2021-10-18 by the reprex package (v2.0.1)
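
The following is a minimal sketch, not part of the original answer, of one way to start from pasted plain text and to keep two-word mascots together. It assumes the multi-word mascots are listed by hand; the names raw_text, multi_word_mascots, mascot_pattern and teams are made up for illustration, and only a subset of the teams is used to keep it short.

```r
library(tibble)
library(dplyr)
library(stringr)

# Raw text as it might be pasted in (a short subset of the teams above).
raw_text <- "
  Arizona Diamondbacks
  Boston Red Sox
  Chicago White Sox
  Chicago Cubs
  Kansas City Royals
  St. Louis Cardinals
  Toronto Blue Jays
"

# Assumption: mascots with more than one word are maintained by hand.
multi_word_mascots <- c("Red Sox", "White Sox", "Blue Jays")

# Try a listed multi-word mascot first, otherwise fall back to the last word.
mascot_pattern <- paste0(
  "(", paste(multi_word_mascots, collapse = "|"), "|[[:alpha:]]+)$"
)

teams <- tibble(data = str_split(raw_text, "\n")[[1]]) |>
  mutate(data = str_trim(data)) |>   # drop the indentation around each line
  filter(data != "") |>              # drop the empty first and last lines
  mutate(mascot = str_extract(data, mascot_pattern),
         city   = str_remove(data, mascot_pattern) |> str_trim())

print(teams)
```

From here, something like left_join(teams, other_data, by = "city") could attach other data, where other_data is a hypothetical table with a city column. Keep in mind that Chicago, Los Angeles and New York each appear twice, so joining on city alone will match both teams in those cities.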

