Create a dummy indicating the presence of a string fragment in any of multiple variables

Problem description

df <- data.frame (address.1.line = c("apartment 5", "25 spring street", "nice house"), address.2.line = c("london", "new york", "apartment 2"), address.3.line = c("", "", "paris"))

I'm trying to make a function that returns a new column in a data frame. The column should be a dummy variable attached to the original data frame indicating whether any of 3 address-line variables contain a string (or selection of strings).

E.g., in the example above, I want df to have a new variable called "Apartment_dummy" indicating the presence of the string fragment "apartment" in any of the three address lines, so it will take 1 in rows 1 and 3, and 0 in row 2. The function therefore needs to take 2 arguments: the name of the new dummy variable to be created, and the corresponding string fragment that needs to be detected in the address variables.

I tried the following. It returns a dummy, but doesn't give the new variable the right name. Also, I feel like there must be a way to do it in a single step. Any ideas? Many thanks!

library(tidyverse)
library(magrittr)  # needed for %<>%, which library(tidyverse) does not attach
premises_dummy <- function(varname = NULL, strings = NULL) {
df %<>%    mutate_at(.funs = funs(flagA = str_detect(., strings)), .vars = vars(ends_with(".line"))) %>% 
       mutate(varname = ifelse(rowSums(select(., contains("flagA"))) > 0, 1, 0))
return(df)
}

df <- premises_dummy(varname = 'Apartment_dummy', strings = 'apartment')

Tags: r, tidyverse, stringr

Solution


A quick data.table solution:

library(data.table)
dt <- data.table(df)
search_string <- "apartment"
dt[like(address.1.line, search_string)| 
   like(address.2.line, search_string)| 
   like(address.3.line, search_string), paste0(search_string,".Dummy") := 1]

dt[is.na(get(paste0(search_string,".Dummy"))), paste0(search_string,".Dummy") := 0]
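The same idea can be wrapped in a function that takes the two arguments the question asks for. This is only a sketch, not part of the original answer: the function name `add_presence_dummy` and its `data` argument are hypothetical, and it assumes the address columns are exactly those whose names end in ".line". Several fragments are joined into one regular expression, so `strings` is treated as a pattern rather than as literal text.

library(data.table)

# Hypothetical helper: adds a 0/1 column named `varname` that is 1 when any
# column ending in ".line" matches any of the fragments in `strings`.
add_presence_dummy <- function(data, varname, strings) {
  # as.data.table() copies a plain data.frame; if `data` is already a
  # data.table, the := below modifies it by reference.
  dt <- as.data.table(data)
  # Assumption: the address variables are the columns ending in ".line".
  cols <- grep("\\.line$", names(dt), value = TRUE)
  # Join several fragments into one alternation, e.g. "apartment|flat".
  pattern <- paste(strings, collapse = "|")
  # One logical vector per column, combined row-wise with OR.
  hits <- Reduce(`|`, lapply(cols, function(col) like(dt[[col]], pattern)))
  # (varname) makes := use the value of varname as the column name.
  dt[, (varname) := as.integer(hits)]
  dt[]
}

df <- data.frame(address.1.line = c("apartment 5", "25 spring street", "nice house"),
                 address.2.line = c("london", "new york", "apartment 2"),
                 address.3.line = c("", "", "paris"))
result <- add_presence_dummy(df, varname = "Apartment_dummy", strings = "apartment")

Wrapping `varname` in parentheses on the left of `:=` is what lets the caller choose the column name, which is the step the attempt in the question was missing.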
