首页 > 解决方案 > how to separate column into two columns based on a condition

问题描述

I have a dataset that consists of measurements of corals. Along with each measurement, additional metadata were collected including the position or "Location" of the colony on experimental modules. I am attempting to separate the Location column in my data frame into both horizontal and vertical components. Each Location code is an alphanumeric entry in which the letter represents the column (A-D) and the numeric portion represents the row (1-4).

There are a number of instances where the coral was at the margin of a next row (eg. A1_2) or a next column (eg. A_B1) such that the format of the entry changes from a letter and a number to one letter and two numbers or two letters and a number.

d <- structure(list(`Module #` = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("111", "112", "113", "114", "115", 
"116", "211", "212", "213", "214", "215", "216"), class = "factor"), 
    Side = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = c("N", "S", "T"), class = "factor"), TimeStep = c(4L, 
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), Location = c("A1", "A1_2", 
    "A2", "A3", "A3_4", "A4", "B_C3", "B1", "B1_2", "B2"), Date = structure(c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), class = "Date"), Year = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("17", "18"
    ), class = "factor"), Site = structure(c(NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = c("HAN", 
    "WAI"), class = "factor"), Treatment = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), recruits = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Site_long = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Hanauma Bay", 
    "Waikiki"), class = "factor"), Shelter = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("High", "Low"
    ), class = "factor")), row.names = c(NA, 10L), class = "data.frame")

head(d)

I would like to end up with a data frame with 2 new columns: one column named "Column" and one named "Row". "Column" refers to the lettered portion of the code from Location and "Row" refers to the numbered portion. Note that each column value should be either 1 or 3 characters (eg. Column = A for A1_2 or Column = A_B for A_B1).

标签: rfilterdplyr

解决方案


We can use str_extract to extract values individually

library(tidyverse)
d %>%
  mutate(Column = str_extract(Location, "[A-Z]_?[A-Z]?"), 
         Row = str_extract(Location, "[0-9]_?[0-9]?")) %>%
  select(Location, Column, Row)

#   Location Column Row
#1        A1      A   1
#2      A1_2      A 1_2
#3        A2      A   2
#4        A3      A   3
#5      A3_4      A 3_4
#6        A4      A   4
#7      B_C3    B_C   3
#8        B1      B   1
#9      B1_2      B 1_2
#10       B2      B   2

Or use tidyr::extract to separate the columns together in one regex

d %>%
   extract(Location, into = c("Column", "Row"), 
           regex = "([A-Z]_?[A-Z]?)([0-9]_?[0-9]?)")

We can use base R sub to extract the values using similar regex

d$Column <- sub("([A-Z]_?[A-Z]?).*", "\\1", d$Location)
d$Row <- sub("[A-Z]_?[A-Z]?([0-9]_?[0-9]?)", "\\1", d$Location)

推荐阅读