r - how to separate column into two columns based on a condition
问题描述
I have a dataset that consists of measurements of corals. Along with each measurement, additional metadata were collected including the position or "Location" of the colony on experimental modules. I am attempting to separate the Location column in my data frame into both horizontal and vertical components. Each Location code is an alphanumeric entry in which the letter represents the column (A-D) and the numeric portion represents the row (1-4).
There are a number of instances where the coral was at the margin of a next row (eg. A1_2) or a next column (eg. A_B1) such that the format of the entry changes from a letter and a number to one letter and two numbers or two letters and a number.
d <- structure(list(`Module #` = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("111", "112", "113", "114", "115",
"116", "211", "212", "213", "214", "215", "216"), class = "factor"),
Side = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("N", "S", "T"), class = "factor"), TimeStep = c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), Location = c("A1", "A1_2",
"A2", "A3", "A3_4", "A4", "B_C3", "B1", "B1_2", "B2"), Date = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), class = "Date"), Year = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("17", "18"
), class = "factor"), Site = structure(c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = c("HAN",
"WAI"), class = "factor"), Treatment = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), recruits = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Site_long = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Hanauma Bay",
"Waikiki"), class = "factor"), Shelter = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("High", "Low"
), class = "factor")), row.names = c(NA, 10L), class = "data.frame")
head(d)
I would like to end up with a data frame with 2 new columns: one column named "Column" and one named "Row". "Column" refers to the lettered portion of the code from Location and "Row" refers to the numbered portion. Note that each column value should be either 1 or 3 characters (eg. Column = A for A1_2 or Column = A_B for A_B1).
解决方案
We can use str_extract
to extract values individually
library(tidyverse)
d %>%
mutate(Column = str_extract(Location, "[A-Z]_?[A-Z]?"),
Row = str_extract(Location, "[0-9]_?[0-9]?")) %>%
select(Location, Column, Row)
# Location Column Row
#1 A1 A 1
#2 A1_2 A 1_2
#3 A2 A 2
#4 A3 A 3
#5 A3_4 A 3_4
#6 A4 A 4
#7 B_C3 B_C 3
#8 B1 B 1
#9 B1_2 B 1_2
#10 B2 B 2
Or use tidyr::extract
to separate the columns together in one regex
d %>%
extract(Location, into = c("Column", "Row"),
regex = "([A-Z]_?[A-Z]?)([0-9]_?[0-9]?)")
We can use base R sub
to extract the values using similar regex
d$Column <- sub("([A-Z]_?[A-Z]?).*", "\\1", d$Location)
d$Row <- sub("[A-Z]_?[A-Z]?([0-9]_?[0-9]?)", "\\1", d$Location)
推荐阅读
- java - 如何将大文件下载到字节数组中
- c# - 使用值和引用类型的泛型协方差解决方法
- spring-mvc - Struts 迁移到 Spring MVC - 通过 Spring Boot
- node.js - 在不安装的情况下取消链接 npm 链接
- java - '没有这样的表' SqliteDatabase
- gunicorn - Gunicorn + Web2py HTTPS
- javascript - 一种从不在 Provider 内部的组件修改 redux 存储的方法?
- php - PHP file_get_contents 和其他返回 404 但 URL 可通过浏览器访问
- javascript - Date.setDate() 每次在函数中增加日期
- ant - Ant Junit5:org.junit.platform.commons.util.PreconditionViolationException:无法加载具有名称的类