(Amateur) data manipulation in R: variable levels as new columns, with values taken from another column

Question

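I want to create new columns by pulling out the "b" and "c" levels of type and putting their value into those columns. There are other arbitrary columns in between that should be kept; here I have simply named one random.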


id <- c("1", "1", "1", "1","2", "2", "2", "2", "3", "3", "3", "3")
type <- c("a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c")
random <- c("random")
value <- c("1", "2", "50", "100", "4", "5", "55", "110", "2.5", "3", "53", "105")

df <- data.frame(id, type, random, value)

  id type random value
1   1    a random     1
2   1    a random     2
3   1    b random    50
4   1    c random   100
5   2    a random     4
6   2    a random     5
7   2    b random    55
8   2    c random   110
9   3    a random   2.5
10  3    a random     3
11  3    b random    53
12  3    c random   105

What I want is:

  id2 type2 random value2  b   c
1   1     a random      1 50 100
2   1     a random      2 50 100
3   2     a random      4 55 110
4   2     a random      5 55 110
5   3     a random    2.5 53 105
6   3     a random      3 53 105

I would really appreciate some ideas! Best,

Tags: r, dataframe, data-manipulation

Answer


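We can group by 'id', create the 'b' column by extracting the 'value' where 'type' is 'b' (assuming there is a single 'b' value per group), ungroup, and then remove the rows where 'type' is 'b'.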

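library(dplyr)
df %>%
    group_by(id) %>%
    # within each id, copy the single 'b' value onto every row
    mutate(b = value[type == 'b']) %>% 
    ungroup() %>% 
    # drop the 'b' rows now that their value has its own column
    filter(type != 'b')
# The output below appears to come from an earlier version of the
# question's data (note the 'keepthis' column):
# A tibble: 4 x 5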
#  id    type  keepthis value b    
#  <fct> <fct> <fct>    <fct> <fct>
#1 1     a     keep     4     95   
#2 1     a     keep     5     95   
#3 2     a     keep     3     94   
#4 2     a     keep     5     94  
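
The grouped approach can also be applied to the data frame shown in the question, which has both 'b' and 'c' rows. What follows is only a minimal sketch: it assumes every id has exactly one 'b' row and exactly one 'c' row, and `result` is just an illustrative name.

```r
library(dplyr)

# Data frame from the question (values kept as character strings)
df <- data.frame(
  id     = c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3"),
  type   = c("a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c"),
  random = "random",
  value  = c("1", "2", "50", "100", "4", "5", "55", "110", "2.5", "3", "53", "105"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(id) %>%
  # assumption: exactly one 'b' and one 'c' row per id, so each
  # subset has length 1 and is recycled across the group
  mutate(b = value[type == "b"],
         c = value[type == "c"]) %>%
  ungroup() %>%
  # keep only the 'a' rows; 'b' and 'c' now live in their own columns
  filter(type == "a")

result  # 6 rows: the two 'a' rows of each id, with b and c alongside
```

If a group has no 'b' or 'c' row, or more than one, the `mutate()` step will fail, so this form is only suitable when that assumption holds.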

Update

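Based on the updated dataset, we can filter the rows where 'type' is 'b' or 'c', reshape them to 'wide' format with pivot_wider, and then left_join the result with the original dataset filtered to only the rows where 'type' is 'a'.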

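library(tidyr)
df %>%
     # keep only the rows whose values should become new columns
     filter(type %in% c('b', 'c'))  %>% 
     # one row per id, with 'b' and 'c' as columns
     pivot_wider(names_from = type, values_from = value) %>%
     # attach the 'a' rows; the join uses the shared columns 'id' and 'random'
     left_join(df %>% 
             filter(type  == 'a'))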
# A tibble: 6 x 6
#  id    random b     c     type  value
#* <fct> <fct>  <fct> <fct> <fct> <fct>
#1 1     random 50    100   a     1    
#2 1     random 50    100   a     2    
#3 2     random 55    110   a     4    
#4 2     random 55    110   a     5    
#5 3     random 53    105   a     2.5  
#6 3     random 53    105   a     3    
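
The joined result has its columns in a different order from the layout requested in the question (id, type, random, value, b, c). A possible sketch of the same pivot-and-join with the join keys written out and the columns reordered is given here. The names `wide` and `result` are illustrative, and `by = c("id", "random")` simply spells out the columns the two tables share.

```r
library(dplyr)
library(tidyr)

# Data frame from the question (values kept as character strings)
df <- data.frame(
  id     = c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3"),
  type   = c("a", "a", "b", "c", "a", "a", "b", "c", "a", "a", "b", "c"),
  random = "random",
  value  = c("1", "2", "50", "100", "4", "5", "55", "110", "2.5", "3", "53", "105"),
  stringsAsFactors = FALSE
)

# Step 1: spread the 'b' and 'c' rows into columns, one row per id
wide <- df %>%
  filter(type %in% c("b", "c")) %>%
  pivot_wider(names_from = type, values_from = value)

# Step 2: join the 'a' rows back on and put the columns in the requested order
result <- wide %>%
  left_join(filter(df, type == "a"), by = c("id", "random")) %>%
  select(all_of(c("id", "type", "random", "value", "b", "c")))

result
```

Because 'value' is stored as character in the question, 'b' and 'c' are character too; converting with `as.numeric()` before reshaping would be one option if numeric columns are needed.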

With the updated dataset (columns Title, Task, task_id, scale_id and data_value), the code would be:

df %>%
    filter(scale_id %in% c('IM', 'RT')) %>% 
    select(Title, Task, task_id, scale_id, data_value) %>% 
    pivot_wider(names_from = scale_id, values_from  = data_value) %>% 
    left_join(df %>% 
                  filter(! scale_id %in% c('IM', 'RT')) %>% 
                  group_by(task_id) %>%
                  slice(which.max(data_value)))
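
The updated dataset itself is not shown on this page, so the pipeline above cannot be run as is. To illustrate what it does, here is a sketch on a small invented data frame: `toy`, its values, and the scale names "X1" and "X2" are all hypothetical, and `data_value` is assumed to be numeric so that `which.max()` works. The join keys are written out under the assumption that Title, Task and task_id are the shared columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the updated dataset (all values invented)
toy <- data.frame(
  Title      = "Job A",
  Task       = rep(c("task one", "task two"), each = 4),
  task_id    = rep(c(1, 2), each = 4),
  scale_id   = rep(c("IM", "RT", "X1", "X2"), times = 2),
  data_value = c(3.5, 2.0, 4.1, 1.2, 4.0, 3.1, 0.9, 2.7),
  stringsAsFactors = FALSE
)

# 'IM' and 'RT' become columns, one row per task
wide <- toy %>%
  filter(scale_id %in% c("IM", "RT")) %>%
  select(Title, Task, task_id, scale_id, data_value) %>%
  pivot_wider(names_from = scale_id, values_from = data_value)

# Among the remaining scales, keep the row with the largest data_value per task
others <- toy %>%
  filter(!scale_id %in% c("IM", "RT")) %>%
  group_by(task_id) %>%
  slice(which.max(data_value)) %>%
  ungroup()

result <- left_join(wide, others, by = c("Title", "Task", "task_id"))
result
```

With this toy input, each task ends up as a single row carrying its IM and RT values plus the scale_id and data_value of its highest-valued other scale.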
