Running a specific command on multiple variables and datasets using lists and user-defined functions

Problem description

I want to run a specific command on multiple variables and datasets using lists and a user-defined function.

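For example, I want to use the as.factor(as.character()) command in R to convert the table, cut, and color variables to factors in 3 different datasets (diamonds, diamonds_bottom300, and diamonds_top300), and put the results into 3 new, user-specified datasets called diamonds_post, diamonds_bottom300_post, and diamonds_top300_post.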

I can do this the long way:

## long way to turn data into factors

### individually

#### for diamonds dataset
diamonds_post$table <- as.factor(as.character(diamonds$table))
diamonds_post$cut <- as.factor(as.character(diamonds$cut))
diamonds_post$color <- as.factor(as.character(diamonds$color))

#### for diamonds_bottom300 dataset
diamonds_bottom300_post$table <- as.factor(as.character(diamonds_bottom300$table))
diamonds_bottom300_post$cut <- as.factor(as.character(diamonds_bottom300$cut))
diamonds_bottom300_post$color <- as.factor(as.character(diamonds_bottom300$color))

#### for diamonds_top300 dataset
diamonds_top300_post$table <- as.factor(as.character(diamonds_top300$table))
diamonds_top300_post$cut <- as.factor(as.character(diamonds_top300$cut))
diamonds_top300_post$color <- as.factor(as.character(diamonds_top300$color))

## gives str of datasets
str(diamonds_post)
str(diamonds_top300_post)
str(diamonds_top300_post)

> ## gives str of datasets
> str(diamonds_post)
'data.frame':   53940 obs. of  10 variables:
 $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Factor w/ 5 levels "Fair","Good",..: 3 4 2 4 2 5 5 5 1 5 ...
 $ color  : Factor w/ 7 levels "D","E","F","G",..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : Factor w/ 127 levels "43","44","49",..: 31 91 116 61 61 51 51 31 91 91 ...
 $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
> str(diamonds_top300_post)
'data.frame':   327 obs. of  10 variables:
 $ carat  : num  0.23 0.86 0.84 0.7 0.76 0.57 0.74 0.91 0.98 0.71 ...
 $ cut    : Factor w/ 3 levels "Fair","Good",..: 2 1 1 1 1 1 1 1 1 1 ...
 $ color  : Factor w/ 7 levels "D","E","F","G",..: 2 2 4 4 4 2 3 5 2 1 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 5 2 3 7 5 7 4 2 2 4 ...
 $ depth  : num  56.9 55.1 55.1 58.8 59 58.7 61.1 61.3 53.3 56.9 ...
 $ table  : Factor w/ 12 levels "65","65.4","66",..: 1 6 4 3 7 3 5 4 4 1 ...
 $ price  : int  327 2757 2782 2797 2800 2805 2805 2825 2855 2858 ...
 $ x      : num  4.05 6.45 6.39 5.81 5.89 5.34 5.82 6.24 6.82 5.89 ...
 $ y      : num  4.07 6.33 6.2 5.9 5.8 5.43 5.75 6.19 6.74 5.84 ...
 $ z      : num  2.31 3.52 3.47 3.44 3.46 3.16 3.53 3.81 3.61 3.34 ...
> str(diamonds_top300_post)
'data.frame':   327 obs. of  10 variables:
 $ carat  : num  0.23 0.86 0.84 0.7 0.76 0.57 0.74 0.91 0.98 0.71 ...
 $ cut    : Factor w/ 3 levels "Fair","Good",..: 2 1 1 1 1 1 1 1 1 1 ...
 $ color  : Factor w/ 7 levels "D","E","F","G",..: 2 2 4 4 4 2 3 5 2 1 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 5 2 3 7 5 7 4 2 2 4 ...
 $ depth  : num  56.9 55.1 55.1 58.8 59 58.7 61.1 61.3 53.3 56.9 ...
 $ table  : Factor w/ 12 levels "65","65.4","66",..: 1 6 4 3 7 3 5 4 4 1 ...
 $ price  : int  327 2757 2782 2797 2800 2805 2805 2825 2855 2858 ...
 $ x      : num  4.05 6.45 6.39 5.81 5.89 5.34 5.82 6.24 6.82 5.89 ...
 $ y      : num  4.07 6.33 6.2 5.9 5.8 5.43 5.75 6.19 6.74 5.84 ...
 $ z      : num  2.31 3.52 3.47 3.44 3.46 3.16 3.53 3.81 3.61 3.34 ...

I tried creating a user-defined function to perform this task, along with a corresponding list:

### creates function to turn a variable into factor form
function_turn_dataset_variable_into_factor_form <- 
  # ---- NOTE: turns variable into factor version of variable
  # ---- NOTE: variable_name == name (character string) of variable to be turned into a factor
  # ---- NOTE: dataset_name == name (character string) of dataset that contains variable_name
  # ---- NOTE: returns a copy of the whole dataset with that one variable converted
  function(variable_name, dataset_name)
  {
    # ---- NOTE: # changes variable_name and dataset_name to object
    colmn1 <- variable_name
    nm1 <- dataset_name
    # ---- NOTE: inserts dataset into function
    dataset_funct_object_A <- 
      data.frame(
        get(nm1)
      )
    # ---- NOTE: transforms data into factor form
    dataset_funct_object_A[[colmn1]] <- as.factor(as.character(dataset_funct_object_A[[colmn1]]))
    # ---- NOTE: returns appropriate object
    return(dataset_funct_object_A)
  }


# ---- NOTE: dataset with lists of corresponding variables/dfs
variable_to_become_factors_in_specific_datasets
# A tibble: 9 x 3
  variable_to_become_factors datasets_to_become_factors datasets_post          
  <chr>                      <chr>                      <chr>                  
1 color                      diamonds                   diamonds_post          
2 color                      diamonds_bottom300         diamonds_bottom300_post
3 color                      diamonds_top300            diamonds_top300_post   
4 cut                        diamonds                   diamonds_post          
5 cut                        diamonds_bottom300         diamonds_bottom300_post
6 cut                        diamonds_top300            diamonds_top300_post   
7 table                      diamonds                   diamonds_post          
8 table                      diamonds_bottom300         diamonds_bottom300_post
9 table                      diamonds_top300            diamonds_top300_post   

It does work when I use it on a single variable and dataset, although using the function is not any faster than the long way.

### runs user generated function on 1 variable/dataset
# ---- NOTE: gives structure of data
str(diamonds_post$color)
# ---- NOTE: runs function
diamonds_post <- function_turn_dataset_variable_into_factor_form(variable_to_become_factors_in_specific_datasets$variable_to_become_factors[1],variable_to_become_factors_in_specific_datasets$datasets_to_become_factors[1])
# ---- NOTE: gives structure of data
str(diamonds_post$color)
# ---- NOTE: works
# ---- NOTE: not really much faster than the long way

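When I apply it to the list with mapply(), I can't get it to work the way I want. Is there a way to make this task work with a user-defined function that returns the converted variables to a corresponding user-specified dataset that differs from the starting dataset?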

Thanks in advance for your help.



Here is the code used for the example:


# Loads packages
# ---- NOTE: making plots and diamonds dataset
if(!require(ggplot2)){install.packages("ggplot2")}
# ---- NOTE: run mixed effects models
if(!require(lme4)){install.packages("lme4")}
# ---- NOTE: for data wrangling
if(!require(dplyr)){install.packages("dplyr")}




# dataset creation

## for dataset with top 300 rows
# ---- NOTE: selects only the top 300 rows of the dataset
diamonds_top300 <- data.frame(dplyr::top_n(diamonds, 300, table))
# ---- NOTE: gives dataset info
head(diamonds_top300)
str(diamonds_top300)
colnames(diamonds_top300)
nrow(diamonds_top300)
# ---- NOTE: gives unique values of Fixed and Random effects, and dvs
unique(diamonds_top300$price)
unique(diamonds_top300$y)
unique(diamonds_top300$cut)
unique(diamonds_top300$color)
unique(diamonds_top300$carat)
unique(diamonds_top300$clarity)
unique(diamonds_top300$depth)
unique(diamonds_top300$table)


## for dataset with bottom 300 rows
### dataset
# ---- NOTE: selects only the bottom 300 rows of the dataset
diamonds_bottom300 <- data.frame(dplyr::top_n(diamonds, -300, table))
# ---- NOTE: gives dataset info
head(diamonds_bottom300)
str(diamonds_bottom300)
colnames(diamonds_bottom300)
nrow(diamonds_bottom300)
# ---- NOTE: gives unique values of Fixed and Random effects, and dvs
unique(diamonds_bottom300$price)
unique(diamonds_bottom300$y)
unique(diamonds_bottom300$cut)
unique(diamonds_bottom300$color)
unique(diamonds_bottom300$carat)
unique(diamonds_bottom300$clarity)
unique(diamonds_bottom300$depth)
unique(diamonds_bottom300$table)

### creates end result variables
# ---- NOTE: diamonds_post starts as a copy of the full diamonds dataset (53940 rows)
diamonds_post <- data.frame(diamonds)
diamonds_top300_post <- data.frame(diamonds_top300)
diamonds_bottom300_post <- data.frame(diamonds_bottom300)


# turns variables into factor for using as.factor(as.character()) command

## data frame with transformation info

### creates list of variable names to turn into factors
variable_to_become_factors <- 
  data.frame(
    variable_to_become_factors = c("table", "cut", "color")
  )

### creates list of data frames for transformation
datasets_to_become_factors <- 
  data.frame(
    datasets_to_become_factors = c("diamonds", "diamonds_bottom300", "diamonds_top300"),
    datasets_post = c("diamonds_post", "diamonds_bottom300_post", "diamonds_top300_post")
  )

### creates dataframe with all possible combinations of data
variable_to_become_factors_in_specific_datasets <- 
  tidyr::crossing(variable_to_become_factors, datasets_to_become_factors)

### splits variable_to_become_factors_in_specific_datasets data frame by data frame name
# ---- NOTE: creates list
variable_to_become_factors_in_specific_datasets_list <- split(variable_to_become_factors_in_specific_datasets, variable_to_become_factors_in_specific_datasets$datasets_to_become_factors)
# ---- NOTE: changes list object name
variable_to_become_factors_in_specific_datasets_list <- 
  setNames(variable_to_become_factors_in_specific_datasets_list, paste("variable_to_become_factors_in_specific_dataset", 
                                                                       datasets_to_become_factors$datasets_to_become_factors,
                                           sep = "__")
  )
# ---- NOTE: creates unique objects for each part list object
list2env(variable_to_become_factors_in_specific_datasets_list, .GlobalEnv)
# ---- NOTE: gathers objects with prefix
apropos("variable_to_become_factors_in_specific_dataset")

## long way to turn data into factors

### individually

#### for diamonds dataset
diamonds_post$table <- as.factor(as.character(diamonds$table))
diamonds_post$cut <- as.factor(as.character(diamonds$cut))
diamonds_post$color <- as.factor(as.character(diamonds$color))

#### for diamonds_bottom300 dataset
diamonds_bottom300_post$table <- as.factor(as.character(diamonds_bottom300$table))
diamonds_bottom300_post$cut <- as.factor(as.character(diamonds_bottom300$cut))
diamonds_bottom300_post$color <- as.factor(as.character(diamonds_bottom300$color))

#### for diamonds_top300 dataset
diamonds_top300_post$table <- as.factor(as.character(diamonds_top300$table))
diamonds_top300_post$cut <- as.factor(as.character(diamonds_top300$cut))
diamonds_top300_post$color <- as.factor(as.character(diamonds_top300$color))

## gives str of datasets
str(diamonds_post)
str(diamonds_top300_post)
str(diamonds_top300_post)

## medium way

### creates function to turn a variable into factor form
function_turn_dataset_variable_into_factor_form <- 
  # ---- NOTE: turns variable into factor version of variable
  # ---- NOTE: variable_name == name (character string) of variable to be turned into a factor
  # ---- NOTE: dataset_name == name (character string) of dataset that contains variable_name
  # ---- NOTE: returns a copy of the whole dataset with that one variable converted
  function(variable_name, dataset_name)
  {
    # ---- NOTE: # changes variable_name and dataset_name to object
    colmn1 <- variable_name
    nm1 <- dataset_name
    # ---- NOTE: inserts dataset into function
    dataset_funct_object_A <- 
      data.frame(
        get(nm1)
      )
    # ---- NOTE: transforms data into factor form
    dataset_funct_object_A[[colmn1]] <- as.factor(as.character(dataset_funct_object_A[[colmn1]]))
    # ---- NOTE: returns appropriate object
    return(dataset_funct_object_A)
  }

### runs user generated function on 1 variable/dataset
# ---- NOTE: gives structure of data
str(diamonds_post$color)
# ---- NOTE: runs function
diamonds_post <- function_turn_dataset_variable_into_factor_form(variable_to_become_factors_in_specific_datasets$variable_to_become_factors[1],variable_to_become_factors_in_specific_datasets$datasets_to_become_factors[1])
# ---- NOTE: gives structure of data
str(diamonds_post$color)
# ---- NOTE: works
# ---- NOTE: not really much faster than the long way

### use mapply on individual lists
# ---- NOTE: applies functions to appropriate variables
function_test_object <- 
  mapply(function_turn_dataset_variable_into_factor_form, 
         variable_to_become_factors_in_specific_datasets$variable_to_become_factors, variable_to_become_factors_in_specific_datasets$datasets_to_become_factors, SIMPLIFY = FALSE)
# ---- NOTE: does not work as desired


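Edit 1:

Results from commenter "Ronak Shah":

This doesn't seem to work. That may be down to my own inexperience with R.

Here are the steps:

  1. Run all of the code from the "Here is the code used for the example:" section of the original post (not shown).
  2. Run the commenter's script (it did not work for me):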
> #Define the columns to change
> cols <- c('table', 'cut',  'color')
> cols
[1] "table" "cut"   "color"
> #Define the names of the dataframe to change
> original_names <- c('diamonds', 'diamonds_bottom300', 'diamonds_top300')
> original_names
[1] "diamonds"           "diamonds_bottom300" "diamonds_top300"   
> #New names of the changed dataframe
> new_names <- paste0(original_names, '_post')
> new_names
[1] "diamonds_post"           "diamonds_bottom300_post" "diamonds_top300_post"   
> #apply function to each column in each dataframe
> lapply(mget(original), function(x) {
+   x[cols] <- lapply(x[cols], function(y) as.factor(as.character(y)))
+   x
+ }) -> result
Error in mget(original) : object 'original' not found
> result
Error: object 'result' not found
> #Write to global environment. 
> names(result) <- new_names
Error in names(result) <- new_names : object 'result' not found
> list2env(result, .GlobalEnv)
Error in list2env(result, .GlobalEnv) : object 'result' not found

On closer inspection, it may not work because one of the calls is written as "original" rather than "original_names". Here is the result after making that change:

> #Define the columns to change
> cols <- c('table', 'cut',  'color')
> cols
[1] "table" "cut"   "color"
> #Define the names of the dataframe to change
> original_names <- c('diamonds', 'diamonds_bottom300', 'diamonds_top300')
> original_names
[1] "diamonds"           "diamonds_bottom300" "diamonds_top300"   
> #New names of the changed dataframe
> new_names <- paste0(original_names, '_post')
> new_names
[1] "diamonds_post"           "diamonds_bottom300_post" "diamonds_top300_post"   
> #apply function to each column in each dataframe
> lapply(mget(original_names), function(x) {
+   x[cols] <- lapply(x[cols], function(y) as.factor(as.character(y)))
+   x
+ }) -> result
Error: value for ‘diamonds’ not found
> result
Error: object 'result' not found
> #Write to global environment. 
> names(result) <- new_names
Error in names(result) <- new_names : object 'result' not found
> list2env(result, .GlobalEnv)
Error in list2env(result, .GlobalEnv) : object 'result' not found

I'm not sure what to do. Any suggestions for a fix would be helpful. This may well be my own fault, and I'm just not seeing the mistake.

Tags: r, list, variables, dataset, iteration

Solution


#Define the columns to change
cols <- c('table', 'cut',  'color')
#Define the names of the dataframe to change
original_names <- c('diamonds', 'diamonds_bottom300', 'diamonds_top300')
#New names of the changed dataframe
new_names <- paste0(original_names, '_post')
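#apply function to each column in each dataframe
#inherits = TRUE lets mget() also search attached packages (e.g. ggplot2, which provides diamonds)
lapply(mget(original_names, inherits = TRUE), function(x) {
  x[cols] <- lapply(x[cols], function(y) as.factor(as.character(y)))
  x
}) -> result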

#Write to global environment. 
names(result) <- new_names
list2env(result, .GlobalEnv)
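
The error reported in the question's edit, "value for 'diamonds' not found", is consistent with how mget() looks names up: by default (inherits = FALSE) it searches only the environment it is called from, while a dataset that ships with an attached package sits further along the search path. The sketch below illustrates this with base R only. It uses iris from the datasets package as a stand-in for diamonds (an assumption made so the example runs without ggplot2), and the object names default_lookup and inherited_lookup are hypothetical.

```r
# iris ships with the 'datasets' package, so it is found via the search path
# rather than in the environment this code is called from.
default_lookup <- tryCatch(
  mget("iris"),                            # inherits = FALSE is the default
  error = function(e) conditionMessage(e)  # keep the message instead of stopping
)

# With inherits = TRUE the enclosing environments are searched as well,
# which includes attached packages.
inherited_lookup <- mget("iris", inherits = TRUE)

print(default_lookup)
print(is.data.frame(inherited_lookup$iris))
```

Another option that should behave the same way is to copy the package dataset into the global environment before calling mget(), for example with diamonds <- ggplot2::diamonds.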

Checking the output for one data frame:

str(diamonds_post)

#tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
# $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
# $ cut    : Factor w/ 5 levels "Fair","Good",..: 3 4 2 4 2 5 5 5 1 5 ...
# $ color  : Factor w/ 7 levels "D","E","F","G",..: 2 2 2 6 7 7 6 5 2 5 ...
# $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
# $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
# $ table  : Factor w/ 127 levels "43","44","49",..: 31 91 116 61 61 51 51 31 91 91 ...
# $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
# $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
# $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
# $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
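
On the mapply() part of the question: a function of the shape shown there returns a full copy of one dataset with a single column converted, so calling it once per row of the 9-row table yields nine data frames rather than three. A minimal sketch of one way to keep a user-defined function, using base R only, is to group the variables by dataset and fold the function over each group. Everything named here (turn_into_factor, datasets, plan, vars_by_dataset, converted) is hypothetical, and the two small data frames are made-up stand-ins for diamonds and its subsets.

```r
# Hypothetical helper: takes the data frame itself (not its name) and
# returns a copy with one column converted to a factor.
turn_into_factor <- function(data, variable_name) {
  data[[variable_name]] <- as.factor(as.character(data[[variable_name]]))
  data
}

# Toy stand-ins for diamonds and its subsets (made-up values).
datasets <- list(
  small = data.frame(table = c(55, 61, 55),
                     cut = c("Good", "Fair", "Good"),
                     price = c(326L, 327L, 334L),
                     stringsAsFactors = FALSE),
  large = data.frame(table = c(58, 58, 65, 57),
                     cut = c("Ideal", "Good", "Fair", "Ideal"),
                     price = c(335L, 336L, 337L, 338L),
                     stringsAsFactors = FALSE)
)

# One row per (variable, dataset) pair, like the 9-row tibble in the question.
plan <- expand.grid(variable = c("table", "cut"),
                    dataset = names(datasets),
                    stringsAsFactors = FALSE)

# Group the variables by dataset, then fold the helper over each group so
# that every conversion is applied to the same copy of that dataset.
vars_by_dataset <- split(plan$variable, plan$dataset)
converted <- Map(
  function(dataset_name, vars) {
    Reduce(turn_into_factor, vars, datasets[[dataset_name]])
  },
  names(vars_by_dataset),
  vars_by_dataset
)

# Name the results after the user-specified "_post" datasets.
names(converted) <- paste0(names(converted), "_post")

str(converted$small_post)
```

The originals in datasets are left untouched. If separate objects are wanted instead of a list, list2env(converted, .GlobalEnv) would create them, as in the solution above.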
