How to automate recoding using information from a codes data frame in R

Question

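I recoded the variables v1 and v2 of data frame df into df2. However, I have several variables to recode, and it would be easier if I could put the recoding information into another data frame (df3) and use some sort of loop to recode df using the info from df3. I tried several solutions with no success.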

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- data.frame(
  v1 = c(1, 1, 2, 3),
  v2 = c(2, 3, 1, 1)
)
df <- df %>%
  mutate(v1 = factor(v1, levels = c(1,2,3), labels = c("red", "blue", "green"))) %>%
  mutate(v2 = factor(v2, levels = c(1,2,3), labels = c("white", "pale", "black")))


# recoding df using levels
df2 <- df %>%
  mutate(v1=case_when(v1 %in% levels(v1)[1:2] ~ "Bad",
                               v1 %in% levels(v1)[3] ~ "Good")) %>%
  mutate(v2=case_when(v2 %in% levels(v2)[1] ~ "Low",
                               v2 %in% levels(v2)[2:3] ~ "High"))

# df3 contains transformation codes for df
# I want to use this info to automate recoding of df
df3 <- data.frame(
  vs = c("v1", "v1", "v2", "v2"), # variables
  ls = c("1:2", "3", "1", "2:3"), # levels
  lb = c("Bad", "Good", "Low", "High") # new labels
)

Created on 2020-10-02 by the reprex package (v0.3.0)

Tags: r, dplyr

Answer


You can expand the sequences in the ls column of df3 and create a separate row for each number.

library(dplyr)
library(tidyr)

df4 <- df3 %>%  
  rowwise() %>%
  mutate(new_ls = list(eval(parse(text = ls)))) %>%
  unnest(new_ls) %>%
  select(-ls)
df4

#   vs    lb    new_ls
#  <chr> <chr>  <dbl>
#1 v1    Bad        1
#2 v1    Bad        2
#3 v1    Good       3
#4 v2    Low        1
#5 v2    High       2
#6 v2    High       3
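
The expansion step above relies on eval(parse()), which evaluates whatever text is stored in ls. As a minimal sketch, assuming every entry is either a single integer or a "from:to" range, the same table can be built in base R without evaluating text. The helper expand_levels and the name df4_base are hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): turn "1:2" or "3"
# into an integer vector without evaluating arbitrary code.
# Assumes each spec is a single integer or a "from:to" range.
expand_levels <- function(spec) {
  bounds <- as.integer(strsplit(spec, ":", fixed = TRUE)[[1]])
  seq(bounds[1], bounds[length(bounds)])
}

# Same recoding table as in the question
df3 <- data.frame(
  vs = c("v1", "v1", "v2", "v2"),
  ls = c("1:2", "3", "1", "2:3"),
  lb = c("Bad", "Good", "Low", "High"),
  stringsAsFactors = FALSE
)

# One integer vector per row of df3
expanded <- lapply(df3$ls, expand_levels)

# Repeat vs and lb once for every expanded level
df4_base <- data.frame(
  vs = rep(df3$vs, lengths(expanded)),
  lb = rep(df3$lb, lengths(expanded)),
  new_ls = unlist(expanded),
  stringsAsFactors = FALSE
)
```

Here new_ls comes out as an integer column rather than a double, so the column you join it against may need a matching type.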

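Get df into long format, join it with df4, and get the data back in wide format. The join matches on the numeric codes, so df here is assumed to hold the original numeric values (as first defined, before the conversion to factors); the output below corresponds to that input.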

df %>%
  pivot_longer(cols = everything(), names_to = 'vs', values_to = 'new_ls') %>%
  left_join(df4, by = c('vs', 'new_ls')) %>%
  group_by(vs) %>%
  mutate(new_ls = row_number()) %>%
  pivot_wider(names_from = vs, values_from = lb) %>%
  select(-new_ls)

#   v1    v2   
#  <chr> <chr>
#1 Bad   High 
#2 Bad   High 
#3 Bad   Low  
#4 Good  Low  
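
If df holds factors, as in the question, the numbers in ls can be read as level positions, which as.integer() returns for a factor. The following is a minimal base R sketch of the loop the question asks for, under that assumption. It is not the original answer's method, and the names recoded, rules and lookup are illustrative.

```r
# Data as defined in the question: factors with three levels each
df <- data.frame(v1 = c(1, 1, 2, 3), v2 = c(2, 3, 1, 1))
df$v1 <- factor(df$v1, levels = 1:3, labels = c("red", "blue", "green"))
df$v2 <- factor(df$v2, levels = 1:3, labels = c("white", "pale", "black"))

# Recoding table from the question
df3 <- data.frame(
  vs = c("v1", "v1", "v2", "v2"),
  ls = c("1:2", "3", "1", "2:3"),
  lb = c("Bad", "Good", "Low", "High"),
  stringsAsFactors = FALSE
)

recoded <- df
for (v in unique(df3$vs)) {
  rules <- df3[df3$vs == v, ]

  # lookup[i] will hold the new label for the i-th level of df[[v]];
  # levels not covered by any rule stay NA
  lookup <- rep(NA_character_, nlevels(df[[v]]))
  for (i in seq_len(nrow(rules))) {
    # Assumes each spec is a single integer or a "from:to" range
    bounds <- as.integer(strsplit(rules$ls[i], ":", fixed = TRUE)[[1]])
    lookup[seq(bounds[1], bounds[length(bounds)])] <- rules$lb[i]
  }

  # as.integer() on a factor gives the level position of each value
  recoded[[v]] <- lookup[as.integer(df[[v]])]
}
```

With the data above, recoded should contain the same labels as df2 in the question, stored as character columns. Variables that do not appear in df3 are left untouched.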
