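r - Run a specific command on multiple variables and datasets using a list and a user-defined function
Problem description
I want to run a specific command on multiple variables and datasets using a list and a user-defined function.
For example, I want to use the as.factor(as.character()) command in R to turn the table, cut, and color variables into factors in 3 different datasets (diamonds, diamonds_bottom300, and diamonds_top300), and put the results into 3 new, user-specified datasets called diamonds_post, diamonds_bottom300_post, and diamonds_top300_post.
I can do this the long way: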
## long way to turn data into factors
### individually
#### for diamonds dataset
diamonds_post$table <- as.factor(as.character(diamonds$table))
diamonds_post$cut <- as.factor(as.character(diamonds$cut))
diamonds_post$color <- as.factor(as.character(diamonds$color))
#### for diamonds_bottom300 dataset
diamonds_bottom300_post$table <- as.factor(as.character(diamonds_bottom300$table))
diamonds_bottom300_post$cut <- as.factor(as.character(diamonds_bottom300$cut))
diamonds_bottom300_post$color <- as.factor(as.character(diamonds_bottom300$color))
#### for diamonds_top300 dataset
diamonds_top300_post$table <- as.factor(as.character(diamonds_top300$table))
diamonds_top300_post$cut <- as.factor(as.character(diamonds_top300$cut))
diamonds_top300_post$color <- as.factor(as.character(diamonds_top300$color))
## gives str of datasets
str(diamonds_post)
str(diamonds_top300_post)
str(diamonds_top300_post)
> ## gives str of datasets
> str(diamonds_post)
'data.frame': 53940 obs. of 10 variables:
$ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
$ cut : Factor w/ 5 levels "Fair","Good",..: 3 4 2 4 2 5 5 5 1 5 ...
$ color : Factor w/ 7 levels "D","E","F","G",..: 2 2 2 6 7 7 6 5 2 5 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
$ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
$ table : Factor w/ 127 levels "43","44","49",..: 31 91 116 61 61 51 51 31 91 91 ...
$ price : int 326 326 327 334 335 336 336 337 337 338 ...
$ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
$ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
$ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
> str(diamonds_top300_post)
'data.frame': 327 obs. of 10 variables:
$ carat : num 0.23 0.86 0.84 0.7 0.76 0.57 0.74 0.91 0.98 0.71 ...
$ cut : Factor w/ 3 levels "Fair","Good",..: 2 1 1 1 1 1 1 1 1 1 ...
$ color : Factor w/ 7 levels "D","E","F","G",..: 2 2 4 4 4 2 3 5 2 1 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 5 2 3 7 5 7 4 2 2 4 ...
$ depth : num 56.9 55.1 55.1 58.8 59 58.7 61.1 61.3 53.3 56.9 ...
$ table : Factor w/ 12 levels "65","65.4","66",..: 1 6 4 3 7 3 5 4 4 1 ...
$ price : int 327 2757 2782 2797 2800 2805 2805 2825 2855 2858 ...
$ x : num 4.05 6.45 6.39 5.81 5.89 5.34 5.82 6.24 6.82 5.89 ...
$ y : num 4.07 6.33 6.2 5.9 5.8 5.43 5.75 6.19 6.74 5.84 ...
$ z : num 2.31 3.52 3.47 3.44 3.46 3.16 3.53 3.81 3.61 3.34 ...
> str(diamonds_top300_post)
'data.frame': 327 obs. of 10 variables:
$ carat : num 0.23 0.86 0.84 0.7 0.76 0.57 0.74 0.91 0.98 0.71 ...
$ cut : Factor w/ 3 levels "Fair","Good",..: 2 1 1 1 1 1 1 1 1 1 ...
$ color : Factor w/ 7 levels "D","E","F","G",..: 2 2 4 4 4 2 3 5 2 1 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 5 2 3 7 5 7 4 2 2 4 ...
$ depth : num 56.9 55.1 55.1 58.8 59 58.7 61.1 61.3 53.3 56.9 ...
$ table : Factor w/ 12 levels "65","65.4","66",..: 1 6 4 3 7 3 5 4 4 1 ...
$ price : int 327 2757 2782 2797 2800 2805 2805 2825 2855 2858 ...
$ x : num 4.05 6.45 6.39 5.81 5.89 5.34 5.82 6.24 6.82 5.89 ...
$ y : num 4.07 6.33 6.2 5.9 5.8 5.43 5.75 6.19 6.74 5.84 ...
$ z : num 2.31 3.52 3.47 3.44 3.46 3.16 3.53 3.81 3.61 3.34 ...
I tried creating a user-defined function to perform this task, along with a corresponding list:
### creates function to turn a variable into factor form
function_turn_dataset_variable_into_factor_form <-
# ---- NOTE: turns variable into a factor version of the variable
# ---- NOTE: variable_name == name of the variable to be turned into a factor
# ---- NOTE: dataset_name == name of the dataset that contains the variable
# ---- NOTE: generally speaking, procedure is to return a copy of the dataset with the variable converted
function(variable_name, dataset_name)
{
# ---- NOTE: # changes variable_name and dataset_name to object
colmn1 <- variable_name
nm1 <- dataset_name
# ---- NOTE: inserts dataset into function
dataset_funct_object_A <-
data.frame(
get(nm1)
)
# ---- NOTE: transforms data into factor form
dataset_funct_object_A[[colmn1]] <- as.factor(as.character(dataset_funct_object_A[[colmn1]]))
# ---- NOTE: returns appropriate object
return(dataset_funct_object_A)
}
# ---- NOTE: dataset with lists of corresponding variables/dfs
variable_to_become_factors_in_specific_datasets
# A tibble: 9 x 3
  variable_to_become_factors datasets_to_become_factors datasets_post
  <chr>                      <chr>                      <chr>
1 color                      diamonds                   diamonds_post
2 color                      diamonds_bottom300         diamonds_bottom300_post
3 color                      diamonds_top300            diamonds_top300_post
4 cut                        diamonds                   diamonds_post
5 cut                        diamonds_bottom300         diamonds_bottom300_post
6 cut                        diamonds_top300            diamonds_top300_post
7 table                      diamonds                   diamonds_post
8 table                      diamonds_bottom300         diamonds_bottom300_post
9 table                      diamonds_top300            diamonds_top300_post
It does work when I use it on a single variable and dataset, although using the function is no faster than the long way.
### runs user generated function on 1 variable/dataset
# ---- NOTE: gives structure of data
str(diamonds_post$color)
# ---- NOTE: runs function
diamonds_post <- function_turn_dataset_variable_into_factor_form(variable_to_become_factors_in_specific_datasets$variable_to_become_factors[1],variable_to_become_factors_in_specific_datasets$datasets_to_become_factors[1])
# ---- NOTE: gives structure of data
str(diamonds_post$color)
# ---- NOTE: works
# ---- NOTE: not really much faster than the long way
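When I apply it to the list using mapply(), I can't get it to work the way I want. Is there a way to make this task work with a user-defined function that writes the transformed variables into a corresponding user-specified dataset, different from the starting dataset?
Thanks in advance for your help.
Here is the code used for the example: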
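# Loads packages
# ---- NOTE: making plots and diamonds dataset
if(!require(ggplot2)){install.packages("ggplot2"); library(ggplot2)}
# ---- NOTE: run mixed effects models
if(!require(lme4)){install.packages("lme4"); library(lme4)}
# ---- NOTE: for data wrangling
if(!require(dplyr)){install.packages("dplyr"); library(dplyr)}
# ---- NOTE: for tidyr::crossing(), used below
if(!requireNamespace("tidyr", quietly = TRUE)){install.packages("tidyr")}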
# dataset creation
## for dataset with top 300 rows
# ---- NOTE: selects only the top 300 rows of the dataset
diamonds_top300 <- data.frame(dplyr::top_n(diamonds, 300, table))
# ---- NOTE: gives dataset info
head(diamonds_top300)
str(diamonds_top300)
colnames(diamonds_top300)
nrow(diamonds_top300)
# ---- NOTE: gives unique values of Fixed and Random effects, and dvs
unique(diamonds_top300$price)
unique(diamonds_top300$y)
unique(diamonds_top300$cut)
unique(diamonds_top300$color)
unique(diamonds_top300$carat)
unique(diamonds_top300$clarity)
unique(diamonds_top300$depth)
unique(diamonds_top300$table)
## for dataset with bottom 300 rows
### dataset
# ---- NOTE: selects only the bottom 300 rows of the dataset
diamonds_bottom300 <- data.frame(dplyr::top_n(diamonds, -300, table))
# ---- NOTE: gives dataset info
head(diamonds_bottom300)
str(diamonds_bottom300)
colnames(diamonds_bottom300)
nrow(diamonds_bottom300)
# ---- NOTE: gives unique values of Fixed and Random effects, and dvs
unique(diamonds_bottom300$price)
unique(diamonds_bottom300$y)
unique(diamonds_bottom300$cut)
unique(diamonds_bottom300$color)
unique(diamonds_bottom300$carat)
unique(diamonds_bottom300$clarity)
unique(diamonds_bottom300$depth)
unique(diamonds_bottom300$table)
### creates end result variables
diamonds_post <- data.frame(diamonds)
diamonds_top300_post <- data.frame(diamonds_top300)
diamonds_bottom300_post <- data.frame(diamonds_bottom300)
# turns variables into factors using the as.factor(as.character()) command
## data frame with transformation info
### creates list of variable names to turn into factors
variable_to_become_factors <-
data.frame(
variable_to_become_factors = c("table", "cut", "color")
)
### creates list of data frames for transformation
datasets_to_become_factors <-
data.frame(
datasets_to_become_factors = c("diamonds", "diamonds_bottom300", "diamonds_top300"),
datasets_post = c("diamonds_post", "diamonds_bottom300_post", "diamonds_top300_post")
)
### creates dataframe with all possible combinations of data
variable_to_become_factors_in_specific_datasets <-
tidyr::crossing(variable_to_become_factors, datasets_to_become_factors)
### splits variable_to_become_factors_in_specific_datasets data frame by data frame name
# ---- NOTE: creates list
variable_to_become_factors_in_specific_datasets_list <- split(variable_to_become_factors_in_specific_datasets, variable_to_become_factors_in_specific_datasets$datasets_to_become_factors)
# ---- NOTE: changes list object name
variable_to_become_factors_in_specific_datasets_list <-
setNames(variable_to_become_factors_in_specific_datasets_list, paste("variable_to_become_factors_in_specific_dataset",
datasets_to_become_factors$datasets_to_become_factors,
sep = "__")
)
# ---- NOTE: creates unique objects for each part list object
list2env(variable_to_become_factors_in_specific_datasets_list, .GlobalEnv)
# ---- NOTE: gathers objects with prefix
apropos("variable_to_become_factors_in_specific_dataset")
## long way to turn data into factors
### individually
#### for diamonds dataset
diamonds_post$table <- as.factor(as.character(diamonds$table))
diamonds_post$cut <- as.factor(as.character(diamonds$cut))
diamonds_post$color <- as.factor(as.character(diamonds$color))
#### for diamonds_bottom300 dataset
diamonds_bottom300_post$table <- as.factor(as.character(diamonds_bottom300$table))
diamonds_bottom300_post$cut <- as.factor(as.character(diamonds_bottom300$cut))
diamonds_bottom300_post$color <- as.factor(as.character(diamonds_bottom300$color))
#### for diamonds_top300 dataset
diamonds_top300_post$table <- as.factor(as.character(diamonds_top300$table))
diamonds_top300_post$cut <- as.factor(as.character(diamonds_top300$cut))
diamonds_top300_post$color <- as.factor(as.character(diamonds_top300$color))
## gives str of datasets
str(diamonds_post)
str(diamonds_top300_post)
str(diamonds_top300_post)
## medium way
### creates function to turn a variable into factor form
function_turn_dataset_variable_into_factor_form <-
# ---- NOTE: turns variable into a factor version of the variable
# ---- NOTE: variable_name == name of the variable to be turned into a factor
# ---- NOTE: dataset_name == name of the dataset that contains the variable
# ---- NOTE: generally speaking, procedure is to return a copy of the dataset with the variable converted
function(variable_name, dataset_name)
{
# ---- NOTE: # changes variable_name and dataset_name to object
colmn1 <- variable_name
nm1 <- dataset_name
# ---- NOTE: inserts dataset into function
dataset_funct_object_A <-
data.frame(
get(nm1)
)
# ---- NOTE: transforms data into factor form
dataset_funct_object_A[[colmn1]] <- as.factor(as.character(dataset_funct_object_A[[colmn1]]))
# ---- NOTE: returns appropriate object
return(dataset_funct_object_A)
}
### runs user generated function on 1 variable/dataset
# ---- NOTE: gives structure of data
str(diamonds_post$color)
# ---- NOTE: runs function
diamonds_post <- function_turn_dataset_variable_into_factor_form(variable_to_become_factors_in_specific_datasets$variable_to_become_factors[1],variable_to_become_factors_in_specific_datasets$datasets_to_become_factors[1])
# ---- NOTE: gives structure of data
str(diamonds_post$color)
# ---- NOTE: works
# ---- NOTE: not really much faster than the long way
### use mapply on individual lists
# ---- NOTE: applies functions to appropriate variables
function_test_object <-
mapply(function_turn_dataset_variable_into_factor_form,
variable_to_become_factors_in_specific_datasets$variable_to_become_factors, variable_to_become_factors_in_specific_datasets$datasets_to_become_factors, SIMPLIFY = FALSE)
# ---- NOTE: does not work as desired
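A note on why the mapply() call above returns something other than three finished data frames: the function is called once per variable/dataset pair, and each call returns a full copy of the dataset in which only that one column has been converted. The sketch below reproduces this with small toy data frames. The names toy_a, toy_b, and to_factor_one_column are hypothetical, and the function takes the data frame itself rather than looking it up by name with get(), so treat it as an illustration under those assumptions rather than the code from the post.

```r
# Toy stand-ins for the real datasets (hypothetical names and values)
toy_a <- data.frame(cut = c("Fair", "Good"), color = c("D", "E"),
                    stringsAsFactors = FALSE)
toy_b <- data.frame(cut = c("Good", "Good"), color = c("F", "D"),
                    stringsAsFactors = FALSE)
datasets <- list(toy_a = toy_a, toy_b = toy_b)

# Same idea as the function in the question: convert ONE column, return a copy
to_factor_one_column <- function(variable_name, dataset) {
  dataset[[variable_name]] <- as.factor(as.character(dataset[[variable_name]]))
  dataset
}

# One row per variable/dataset pair, like the 9-row lookup table
pairs <- expand.grid(variable = c("cut", "color"),
                     dataset = names(datasets),
                     stringsAsFactors = FALSE)

per_pair <- mapply(
  function(v, d) to_factor_one_column(v, datasets[[d]]),
  pairs$variable, pairs$dataset,
  SIMPLIFY = FALSE
)

# One data frame per pair (4 here, 9 in the question), not one per dataset
length(per_pair)

# In each copy only the single requested column has become a factor
sapply(per_pair[[1]], is.factor)
```

To end up with one result per dataset, the conversions for a given dataset have to be applied to the same copy, for example by looping over the columns inside a single call per dataset.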
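Edit 1:
Results of trying the suggestion from commenter "Ronak Shah":
This doesn't seem to work. That may be down to my own ignorance of R.
Here are the steps:
- Ran all the code from the "Here is the code used for the example:" section of the original post (not shown).
- Ran the commenter's script (did not work for me):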
> #Define the columns to change
> cols <- c('table', 'cut', 'color')
> cols
[1] "table" "cut" "color"
> #Define the names of the dataframe to change
> original_names <- c('diamonds', 'diamonds_bottom300', 'diamonds_top300')
> original_names
[1] "diamonds" "diamonds_bottom300" "diamonds_top300"
> #New names of the changed dataframe
> new_names <- paste0(original_names, '_post')
> new_names
[1] "diamonds_post" "diamonds_bottom300_post" "diamonds_top300_post"
> #apply function to each column in each dataframe
> lapply(mget(original), function(x) {
+ x[cols] <- lapply(x[cols], function(y) as.factor(as.character(y)))
+ x
+ }) -> result
Error in mget(original) : object 'original' not found
> result
Error: object 'result' not found
> #Write to global environment.
> names(result) <- new_names
Error in names(result) <- new_names : object 'result' not found
> list2env(result, .GlobalEnv)
Error in list2env(result, .GlobalEnv) : object 'result' not found
On closer inspection, it may not have worked because one of the calls was written as "original" rather than "original_names". Here is the result with that change:
> #Define the columns to change
> cols <- c('table', 'cut', 'color')
> cols
[1] "table" "cut" "color"
> #Define the names of the dataframe to change
> original_names <- c('diamonds', 'diamonds_bottom300', 'diamonds_top300')
> original_names
[1] "diamonds" "diamonds_bottom300" "diamonds_top300"
> #New names of the changed dataframe
> new_names <- paste0(original_names, '_post')
> new_names
[1] "diamonds_post" "diamonds_bottom300_post" "diamonds_top300_post"
> #apply function to each column in each dataframe
> lapply(mget(original_names), function(x) {
+ x[cols] <- lapply(x[cols], function(y) as.factor(as.character(y)))
+ x
+ }) -> result
Error: value for ‘diamonds’ not found
> result
Error: object 'result' not found
> #Write to global environment.
> names(result) <- new_names
Error in names(result) <- new_names : object 'result' not found
> list2env(result, .GlobalEnv)
Error in list2env(result, .GlobalEnv) : object 'result' not found
I'm not sure what to do. Any suggestions for a fix would be helpful. This is probably my own mistake and I'm just not seeing the error.
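The second error, value for 'diamonds' not found, is consistent with how mget() searches: by default it uses inherits = FALSE, so it looks only in the environment it is given and not in any enclosing ones. In the example code, diamonds_bottom300 and diamonds_top300 are created in the global environment, whereas diamonds is presumably only available through the attached ggplot2 package. The sketch below imitates that situation with two hypothetical environments (pkg_env and work_env) and hypothetical data, so it runs without ggplot2.

```r
# pkg_env plays the role of an attached package that provides a dataset
pkg_env <- new.env()
assign("big_data", data.frame(g = c(1, 2, 2)), envir = pkg_env)

# work_env plays the role of the global environment; pkg_env encloses it
work_env <- new.env(parent = pkg_env)
assign("small_data", data.frame(g = c(3, 3)), envir = work_env)

nms <- c("big_data", "small_data")

# Default inherits = FALSE: only work_env is searched, so big_data is missing
strict <- tryCatch(
  mget(nms, envir = work_env),
  error = function(e) conditionMessage(e)
)
strict  # an error message naming big_data

# inherits = TRUE: enclosing environments are searched as well
found <- mget(nms, envir = work_env, inherits = TRUE)
names(found)
```

Under that assumption, two possible fixes for the script in this edit would be calling mget(original_names, inherits = TRUE), or first copying the dataset into the global environment with diamonds <- ggplot2::diamonds.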
Solution
#Define the columns to change
cols <- c('table', 'cut', 'color')
#Define the names of the dataframe to change
original_names <- c('diamonds', 'diamonds_bottom300', 'diamonds_top300')
#New names of the changed dataframe
new_names <- paste0(original_names, '_post')
#apply function to each column in each dataframe
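# inherits = TRUE lets mget() also find diamonds, which comes from the attached ggplot2 package rather than the global environment
lapply(mget(original_names, inherits = TRUE), function(x) {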
x[cols] <- lapply(x[cols], function(y) as.factor(as.character(y)))
x
}) -> result
#Write to global environment.
names(result) <- new_names
list2env(result, .GlobalEnv)
Checking the output of one data frame -
str(diamonds_post)
#tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
# $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
# $ cut : Factor w/ 5 levels "Fair","Good",..: 3 4 2 4 2 5 5 5 1 5 ...
# $ color : Factor w/ 7 levels "D","E","F","G",..: 2 2 2 6 7 7 6 5 2 5 ...
# $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
# $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
# $ table : Factor w/ 127 levels "43","44","49",..: 31 91 116 61 61 51 51 31 91 91 ...
# $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
# $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
# $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
# $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
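If the conversion should be driven by a lookup table like variable_to_become_factors_in_specific_datasets from the question, one possible approach is to split that table by output dataset name and convert all listed columns of each dataset in one pass. The following is a minimal sketch under stated assumptions: the toy data frames, the environment toy_env, the lookup column names, and the helper convert_listed_columns are all hypothetical, and the results are written into toy_env rather than the global environment to keep the example self-contained.

```r
# Toy stand-ins (hypothetical) for diamonds and one of its subsets
toy_env <- new.env()
toy_env$toy_full <- data.frame(table = c(55, 61, 58),
                               cut = c("Fair", "Good", "Ideal"),
                               stringsAsFactors = FALSE)
toy_env$toy_small <- data.frame(table = c(55, 55),
                                cut = c("Fair", "Fair"),
                                stringsAsFactors = FALSE)

# Lookup table: which variable, in which dataset, goes to which output name
lookup <- data.frame(
  variable     = c("table", "cut", "table", "cut"),
  dataset      = c("toy_full", "toy_full", "toy_small", "toy_small"),
  dataset_post = c("toy_full_post", "toy_full_post",
                   "toy_small_post", "toy_small_post"),
  stringsAsFactors = FALSE
)

# Convert every listed column of ONE dataset and return the modified copy
convert_listed_columns <- function(rows, envir) {
  dataset <- get(rows$dataset[1], envir = envir)
  cols <- rows$variable
  dataset[cols] <- lapply(dataset[cols], function(y) as.factor(as.character(y)))
  dataset
}

# One group of lookup rows per output dataset, one converted copy per group
groups <- split(lookup, lookup$dataset_post)
results <- lapply(groups, convert_listed_columns, envir = toy_env)

# Store each result under its "_post" name; the originals stay untouched
list2env(results, envir = toy_env)

sapply(toy_env$toy_full_post, is.factor)
```

With the real data, the same pattern would use the question's column names and .GlobalEnv (or whichever environment holds the datasets) in place of toy_env.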