r - String matching over multiple columns with specific string names, and value comparison in the associated columns
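Problem description
I want to do string detection and value comparison across a series of columns. If a string (in this case ZSD) is found in one of the columns, the corresponding values in another column need to be compared.
Input
My input looks like this: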
a.zsd a.test b.zsd b.test c.zsd c.test d.zsd d.test
'ZSD' 0.0 'ZAD' 1.0 NA 0.5 'ZAD' 1.0
'ZAD' 1.0 NA 0.0 NA 0.5 'ZSD' 0.0
NA 0.5 NA 0.5 'ZAD' 0.5 NA 0.5
'Not Achieved ZSD' 0.0 NA 0.5 'ZAD' 0.5 NA 0.5
'ZSD' 1.0 'ZSD' 0.5 NA 0.5 'ZSD' 0.0
NA 0.0 NA 0.0 NA 0.5 NA 0.0
NA 1.0 'ZSD' 0.0 'ZSD' 0.5 'ZSD' 1.0
Output
In my output, I want two extra columns, smallest.test and zsd.level:
a.zsd a.test b.zsd b.test c.zsd c.test d.zsd d.test smallest.test zsd.level
'ZSD' 0.0 'ZAD' 1.0 NA 0.5 'ZAD' 1.0 0.0 a
'ZAD' 1.0 NA 0.0 NA 0.5 'ZSD' 0.0 0.0 d
NA 0.5 NA 0.5 'ZAD' 0.5 NA 0.5 0.0 NA
'Not Achieved ZSD' 0.0 NA 0.5 'ZAD' 0.5 NA 0.5 0.0 a
'ZSD' 1.0 'ZSD' 0.5 NA 0.5 'ZSD' 0.0 0.0 d
NA 0.0 NA 0.0 NA 0.5 NA 0.0 0.0 NA
NA 1.0 'ZSD' 0.0 'ZSD' 0.5 'ZSD' 1.0 0.0 b
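Info:
My data frame has over a hundred columns. I am only interested in the columns whose names end with the string .zsd. These columns can contain NA or one of the following string values: ZAD, ZSD, or Not Achieved ZSD. Every column whose name ends in .zsd has an associated .test column.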
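The name-based pairing described above can be sketched in base R as follows. This is only a minimal illustration on a toy data frame: the id and zsd.note columns and the variable names zsd_cols, test_cols and level_names are assumptions made for the example, not part of the original data.

```r
# Toy data frame; "id" and "zsd.note" are hypothetical extra columns
df <- data.frame(
  id       = 1:2,
  a.zsd    = c("ZSD", NA),
  a.test   = c(0, 1),
  zsd.note = c("x", "y"),
  b.zsd    = c(NA, "ZAD"),
  b.test   = c(0.5, 0.5),
  stringsAsFactors = FALSE
)

# Keep only names that END in ".zsd" (the dot is escaped, "$" anchors the end)
zsd_cols <- grep("\\.zsd$", names(df), value = TRUE)

# Derive the name of the paired .test column for each .zsd column
test_cols <- sub("\\.zsd$", ".test", zsd_cols)

# Fail early if a .zsd column has no .test partner
stopifnot(all(test_cols %in% names(df)))

# Level labels ("a", "b", ...) are the column names without the suffix
level_names <- sub("\\.zsd$", "", zsd_cols)
```

Anchoring the pattern keeps columns such as zsd.note out of the selection, which matters when the data frame has many unrelated columns.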
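Requirements
I want two new columns in the output, smallest.test and zsd.level. The requirements are:
- Iterate over the column names that end with the string .zsd.
- Detect the string ZSD in those columns.
- If the string ZSD is found in only one of the columns, return that column's name in the output column zsd.level, and return the corresponding value from its .test column in the output column smallest.test.
- If none of the columns contains the string ZSD, return NA in the output column zsd.level and 0.0 in the output column smallest.test.
- If several columns contain the string ZSD, pick the one with the smallest value in its corresponding .test column and return it in the output.
- If several columns contain the string ZSD and they all have the same value in their corresponding .test columns, pick the last column name for the output, together with the corresponding .test value.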
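Read for a single row, the rules above can be sketched as a small helper. This is a hedged illustration, not the accepted approach: the function name pick_zsd and its arguments are hypothetical, and it assumes the .test values contain no NA.

```r
# Hypothetical helper: apply the rules to ONE row.
# zsd:         character vector of the row's .zsd cells
# test:        numeric vector of the paired .test cells (assumed NA-free)
# level_names: labels such as "a", "b", ... in the same order
pick_zsd <- function(zsd, test, level_names) {
  # TRUE where the cell contains "ZSD" (this also matches "Not Achieved ZSD")
  hit <- !is.na(zsd) & grepl("ZSD", zsd, fixed = TRUE)
  # No match at all: NA level and 0 as the smallest test value
  if (!any(hit)) {
    return(list(smallest.test = 0, zsd.level = NA_character_))
  }
  # Smallest .test value among matching columns
  smallest <- min(test[hit])
  # On ties, take the LAST matching column that holds the minimum
  last <- max(which(hit & test == smallest))
  list(smallest.test = smallest, zsd.level = level_names[last])
}

lv   <- c("a", "b", "c", "d")
row5 <- pick_zsd(c("ZSD", "ZSD", NA, "ZSD"), c(1, 0.5, 0.5, 0), lv)
row3 <- pick_zsd(c(NA, NA, "ZAD", NA), c(0.5, 0.5, 0.5, 0.5), lv)
tied <- pick_zsd(c("ZSD", "ZSD", NA, NA), c(0, 0, 0.5, 0.5), lv)
```

Here row5 and row3 reuse rows 5 and 3 of the sample input, while tied is a made-up row that exercises the tie rule.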
dput()
dput(df)
structure(list(a.zsd = c("ZSD", "ZAD", NA, "Not Achieved ZSD", "ZSD", NA, NA),
a.test = c(0, 1, 0.5, 0, 1, 0, 1),
b.zsd = c("ZAD", NA, NA, NA, "ZSD", NA, "ZSD"),
b.test = c(1, 0, 0.5, 0.5, 0.5, 0, 0),
c.zsd = c(NA, NA, "ZAD", "ZAD", NA, NA, "ZSD"),
c.test = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
d.zsd = c("ZAD", "ZSD", NA, NA, "ZSD", NA, "ZSD"),
d.test = c(1, 0, 0.5, 0.5, 0, 0, 1)),
class = "data.frame", row.names = c(NA, -7L))
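Partial solution
Based on the post "String matching over multiple columns with specific string names", this code iterates over the .zsd columns and returns the name of the highest matching column in the output. However, it does not take the corresponding values of the .test columns into account. Any help with this would be greatly appreciated.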
library(dplyr)
library(tidyr)
library(stringr)
df %>%
mutate(across(contains("zsd"), ~case_when(str_detect(., "ZSD") ~ cur_column()), .names = 'new_{col}')) %>%
unite(zsd_level, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(zsd_level = str_remove_all(zsd_level, ".zsd"),
zsd_level = str_sub(zsd_level, -1))
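Solution
Here is a base R solution built on which.min; it assumes that the corresponding ".test" and ".zsd" columns are adjacent.
The first code block does the job, but when a row has ties for the minimum it returns the first instance. Further down is a function that can select either the first or the last instance of the minimum.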
# sample data
df <- structure(list(a.zsd = c("ZSD", "ZAD", NA, "Not Achieved ZSD", "ZSD", NA, NA),
a.test = c(0, 1, 0.5, 0, 1, 0, 1),
b.zsd = c("ZAD", NA, NA, NA, "ZSD", NA, "ZSD"),
b.test = c(1, 0, 0.5, 0.5, 0.5, 0, 0),
c.zsd = c(NA, NA, "ZAD", "ZAD", NA, NA, "ZSD"),
c.test = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
d.zsd = c("ZAD", "ZSD", NA, NA, "ZSD", NA, "ZSD"),
d.test = c(1, 0, 0.5, 0.5, 0, 0, 1)),
class = "data.frame", row.names = c(NA, -7L))
# select .zsd columns
zsd_cols <- grep("\\.zsd$", names(df), value = TRUE)
zsd_df <- df[, zsd_cols]
# select .test columns
test_cols <- sub("\\.zsd$", ".test", zsd_cols)
test_df <- df[, test_cols]
# convert "Not Achieved ZSD" to "ZSD"
zsd_df[zsd_df == "Not Achieved ZSD" ] <- "ZSD"
# assign NA to non "ZSD" cells
zsd_df[zsd_df != "ZSD"] <- NA
# assign 999 test_df values whose corresponding zsd_df is NA
test_df[is.na(zsd_df)] <- 999
# return cols which hold minimum
nams <- names(test_df)[apply(test_df, 1 ,which.min)]
# scrub .test suffix
nams <- sub("\\.test$", "", nams)
# return mins
mins <- apply(test_df, 1 ,min)
# assign values less than 999 as smallest test, or zero
df$smallest.test <- ifelse(mins < 999, mins, 0)
# assign name if corresponding min less than 999 or NA
df$zsd_level <- ifelse(mins < 999, nams, NA)
> df
             a.zsd a.test b.zsd b.test c.zsd c.test d.zsd d.test smallest.test zsd_level
1              ZSD    0.0   ZAD    1.0  <NA>    0.5   ZAD    1.0             0         a
2              ZAD    1.0  <NA>    0.0  <NA>    0.5   ZSD    0.0             0         d
3             <NA>    0.5  <NA>    0.5   ZAD    0.5  <NA>    0.5             0      <NA>
4 Not Achieved ZSD    0.0  <NA>    0.5   ZAD    0.5  <NA>    0.5             0         a
5              ZSD    1.0   ZSD    0.5  <NA>    0.5   ZSD    0.0             0         d
6             <NA>    0.0  <NA>    0.0  <NA>    0.5  <NA>    0.0             0      <NA>
7             <NA>    1.0   ZSD    0.0   ZSD    0.5   ZSD    1.0             0         b
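The tie behaviour comes from which.min itself, which returns the index of the first minimum. A minimal sketch of how reversing the vector yields the last minimum instead (the vector x and the names first_min and last_min are made up for the example):

```r
# Two columns ("b" and "d") share the minimum value 0
x <- c(b = 0, c = 0.5, d = 0)

# which.min() reports the FIRST position of the minimum
first_min <- which.min(x)

# Reversing the vector and mapping the index back gives the LAST position
last_min <- length(x) + 1L - which.min(rev(x))

names(x)[first_min]
names(x)[last_min]
```

Reversing the column order before calling which.min is the same trick that a last-minimum variant of the code above can rely on.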
Edit
Same idea, but wrapped in a function that can return either the first or the last minimum when ties occur within a row:
# adjusted sample data
df <- data.frame(a.zsd = c("ZSD", "ZAD", NA,
"Not Achieved ZSD", "ZSD", NA, NA),
a.test = c(0, 1, 0.5, 0, 1, 0, 1),
b.zsd = c("ZAD", NA, NA, NA, "ZSD", NA, "ZSD"),
b.test = c(1, 0, 0.5, 0.5, 0.0, 0, 0), # adjusted
c.zsd = c(NA, NA, "ZAD", "ZAD", NA, NA, "ZSD"),
c.test = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
d.zsd = c("ZAD", "ZSD", NA, NA, "ZSD", NA, "ZSD"),
d.test = c(1, 0, 0.5, 0.5, 0, 0, 1))
appendMin <- function(df, last_min = TRUE){
  # select .zsd columns
  zsd_cols <- grep("\\.zsd$", names(df), value = TRUE)
  zsd_df <- df[, zsd_cols]
  if(last_min) { zsd_df <- rev(zsd_df) } # for last min
  # select .test columns
  test_cols <- sub("\\.zsd$", ".test", zsd_cols)
  test_df <- df[, test_cols]
  if(last_min) { test_df <- rev(test_df) } # for last min
  # convert "Not Achieved ZSD" to "ZSD"
  zsd_df[zsd_df == "Not Achieved ZSD" ] <- "ZSD"
  # assign NA to non "ZSD" cells
  zsd_df[zsd_df != "ZSD" ] <- NA
  # assign 999 test_df values whose corresponding zsd_df is NA
  test_df[is.na(zsd_df)] <- 999
  # return cols which hold the first minimum (the last one if columns were reversed)
  nams <- names(test_df)[apply(test_df, 1, which.min)]
  # scrub .test suffix
  nams <- sub("\\.test$", "", nams)
  # return mins
  mins <- apply(test_df, 1, min)
  # assign values less than 999 as smallest test, or zero
  df$smallest.test <- ifelse(mins < 999, mins, 0)
  # assign name if corresponding min less than 999 or NA
  df$zsd_level <- ifelse(mins < 999, nams, NA)
  return(df)
}
ties_first <- appendMin(df, last_min = FALSE)
ties_last <- appendMin(df, last_min = TRUE)
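As an alternative sketch that avoids the 999 sentinel and the row-wise apply, the same selection can be attempted with base R's max.col and its ties.method argument. This swaps in a different technique from the function above; the toy data frame and the names hit, tests, idx and any_hit are assumptions made for the illustration, and the .test columns are assumed to be NA-free.

```r
# Toy data: row 1 has a tie, row 2 has no ZSD, row 3 has two candidates
df <- data.frame(
  a.zsd = c("ZSD", NA, "ZSD"),               a.test = c(0, 0.5, 1),
  b.zsd = c("ZSD", "ZAD", "Not Achieved ZSD"), b.test = c(0, 0.5, 0.5),
  stringsAsFactors = FALSE
)

zsd_cols  <- grep("\\.zsd$", names(df), value = TRUE)
test_cols <- sub("\\.zsd$", ".test", zsd_cols)

# Logical matrix: TRUE where a .zsd cell contains "ZSD" (NA cells give FALSE)
hit <- matrix(grepl("ZSD", as.matrix(df[zsd_cols]), fixed = TRUE),
              nrow = nrow(df))

# Mask non-matching cells with Inf so they can never be the minimum
tests <- as.matrix(df[test_cols])
tests[!hit] <- Inf

# The row-wise maximum of the negated matrix is the row-wise minimum;
# ties.method = "last" picks the last column among tied minima
idx     <- max.col(-tests, ties.method = "last")
any_hit <- rowSums(hit) > 0

df$zsd.level     <- ifelse(any_hit, sub("\\.zsd$", "", zsd_cols)[idx], NA_character_)
df$smallest.test <- ifelse(any_hit, tests[cbind(seq_len(nrow(df)), idx)], 0)
```

Switching ties.method to "first" would give the first-minimum behaviour instead.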