Update a single column in a data frame based on the counts of individual segments in my data

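Problem description

1. I want to update a single column of the data frame with 'misc' wherever the column's value is not one of 'marketing' | 'OLD_Data' | 'business owner' | 'team communication' | 'sales' | 'YouTube' | 'internal team' | 'GAMING' | 'VIDEOS' | 'EDUCATION' | 'marketing agency'. For this I used the code below: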

data4$individual_segment<-apply(data4$individual_segment,1,function(x){ if (!data4$individual_segment == 'marketing' | 'OLD_Data' | 'business owner' | 'team communication'
              | 'sales' | 'YOUTUBE' | 'internal team' |'GAMING' | 'VIDEOS' | 'EDUCATION' |'marketing agency')  'misc'})

    data4$segment<-ifelse(data4$individual_segment=='marketing' | 'OLD_Data' | 'business owner' | 'team communication'
| 'sales' | 'YOUTUBE' | 'internal team' |'GAMING' | 'VIDEOS' | 'EDUCATION' |'marketing agency',1,0)

2. I also want to update based on counts: table(data4$individual_segment) gives me the following information: [image]

So, if freq <= 9, every such row in data4$individual_segment must be replaced with 'misc'.

Tags: r

Solution

The following should work:

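library(dplyr)
library(stringr)

# Segments to keep; any value that matches none of these becomes 'misc'
pattern <- 'marketing|OLD_Data|business owner|team communication|sales|YOUTUBE|internal team|GAMING|VIDEOS|EDUCATION|marketing agency'

# Assign the result back, otherwise the new column is only printed
data4 <- data4 %>% 
  group_by(individual_segment) %>% 
  mutate(count = n()) %>%   # number of rows in each segment
  ungroup() %>% 
  # 'misc' when no listed segment matches, or when the segment has 9 or fewer rows
  mutate(segment = ifelse(!str_detect(individual_segment, pattern) | count <= 9, 
                          'misc', as.character(individual_segment)))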
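The pattern approach matches substrings, which may or may not be what you want. As a rough illustration, here is a small base R sketch; grepl() stands in for str_detect() so that no packages are needed, and both the shortened pattern and the values in x are made up for the example:

```r
# Shortened pattern and hypothetical values, for illustration only
pattern <- 'marketing|OLD_Data|business owner'
x <- c('marketing', 'email marketing', 'marketing agency', 'finance')

# Substring match: TRUE wherever any alternative occurs anywhere in the string
substring_hit <- grepl(pattern, x)

# Exact match via anchors: TRUE only when the whole string equals one alternative
exact_hit <- grepl(paste0('^(', pattern, ')$'), x)

# Exact match via %in%, splitting the pattern back into its alternatives
in_hit <- x %in% strsplit(pattern, '|', fixed = TRUE)[[1]]

print(data.frame(x, substring_hit, exact_hit, in_hit))
```

Here 'email marketing' and 'marketing agency' both count as substring hits for 'marketing', while only the first element is an exact hit.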

Alternatively, if you want exact matches on individual_segment rather than substring matches, you can do the following:

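# Segments to keep; any value outside this vector becomes 'misc'
keep.vect <- c('marketing', 'OLD_Data', 'business owner', 'team communication', 
               'sales', 'YOUTUBE', 'internal team', 'GAMING', 'VIDEOS', 
               'EDUCATION', 'marketing agency')

# Assign the result back, otherwise the new column is only printed
data4 <- data4 %>% 
  group_by(individual_segment) %>% 
  mutate(count = n()) %>%   # number of rows in each segment
  ungroup() %>% 
  # 'misc' when the value is not in the list, or when the segment has 9 or fewer rows
  mutate(segment = ifelse(!(individual_segment %in% keep.vect) | count <= 9, 
                          'misc', as.character(individual_segment)))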
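If you would rather stay in base R, the same idea can be sketched with table(), which the question already uses to get the frequencies. This is only a minimal sketch under the goal stated in the question (values outside the list, or with freq <= 9, become 'misc'); the toy data4, the shortened keep vector, and the min_count name are assumptions made for the example:

```r
# Hypothetical toy data standing in for the real data4
data4 <- data.frame(
  individual_segment = c(rep('marketing', 12), rep('sales', 3), rep('finance', 10)),
  stringsAsFactors = FALSE
)

keep <- c('marketing', 'sales')  # shortened version of the list in the question
min_count <- 10                  # segments with fewer rows (freq <= 9) become 'misc'

seg <- as.character(data4$individual_segment)  # guard against factor columns

# Frequency of each segment, then looked up by name for every row
freq <- table(seg)
row_freq <- as.integer(freq[seg])

# 'misc' when the value is not in the list, or when its segment is too rare
is_misc <- !(seg %in% keep) | row_freq < min_count
data4$segment <- ifelse(is_misc, 'misc', seg)

print(table(data4$segment))
```

With this toy data, 'sales' is in the list but too rare, and 'finance' is frequent but not in the list, so both end up as 'misc'.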
