A better way to conditionally add several new columns, filled with entries that depend on other columns

Problem description

I have the following data frame:

    CustomerID  Department  Price  SportswearDemand  HomeDemand  KidswearDemand  WomenswearDemand
-------------------------------------------------------------------------------------------
    1050091     Sportswear  497.6  0                 0           0               0                          
    1555018     Womenswear  336.0  0                 0           0               0                       
    210239      Womenswear  698.0  0                 0           0               0                       
    507556      Sportswear  209.0  0                 0           0               0                        
    1708193     Sportswear  209.0  0                 0           0               0                        
    1295733     Menswear    209.0  0                 0           0               0                        
    1213373     Sportswear  298.0  0                 0           0               0                       
    753471      Sportswear  209.0  0                 0           0               0                        
    82739       Menswear    349.0  0                 0           0               0                        
    1660995     Kidswear    424.6  0                 0           0               0
      .
      .
      .                

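From here on, SportswearDemand and every column to its right are called "demand columns". I want to fill them based on the information in the Department and Price columns.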

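If the Department entry for a given CustomerID is Sportswear, I want that row's Price to be entered into SportswearDemand in place of the current zero. The same applies to the other demand columns. The final result should look like this: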

   CustomerID  Department  Price  SportswearDemand  HomeDemand  KidswearDemand  WomenswearDemand
-------------------------------------------------------------------------------------------
   1050091    Sportswear   497.6  497.6             0           0               0
   1555018    Womenswear   336.0  0                 0           0               336.0
   210239     Womenswear   698.0  0                 0           0               698.0
   507556     Sportswear   209.0  209.0             0           0               0
   1708193    Sportswear   209.0  209.0             0           0               0
   1295733    Menswear     209.0  0                 0           0               0
   1213373    Sportswear   298.0  298.0             0           0               0
   753471     Sportswear   209.0  209.0             0           0               0
   82739      Menswear     349.0  0                 0           0               0
   1660995    Kidswear     424.6  0                 0           424.6           0
     .
     .
     .

I managed to solve it like this:

df$SportswearDemand <- with(df, ifelse(df$Department == "Sportswear", df$Price, 0))
df$HomeDemand <- with(df, ifelse(df$Department == "Home", df$Price, 0))
df$KidswearDemand <- with(df, ifelse(df$Department == "Kidswear", df$Price, 0))
df$WomenswearDemand <- with(df, ifelse(df$Department == "Womenswear", df$Price, 0))

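However, I have more than 30 of these demand columns, and I would like to know whether there is a better way than hard-coding 30 lines like this.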

My first idea was to wrap the line in a for loop, as follows:

DemandColumns # array of all the 30 different demand columns stored as strings

for (i in DemandColumns){
 df$i <- with(df, ifelse(df$Department == substr(i,1,nchar(i)-6), df$Price, 0))
}

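But it just adds a single column named "i" filled with zeros. substr is used to get every character of the string except the trailing "Demand". Any help is appreciated.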

Tags: r

Solution


There is no need to initialize the "demand columns", so remove them first.

df[grep('Demand', names(df))] <- NULL

Create copies of the Price and Department columns, then reshape the data to wide format.

library(dplyr)
library(tidyr)

df %>%
  mutate(value = Price, 
         name = Department) %>%
  pivot_wider(names_from = name, values_from = value, 
              names_glue = '{name}_Demand', values_fill = 0)

#   CustomerID Department Price Sportswear_Demand Womenswear_Demand Menswear_Demand Kidswear_Demand
#        <int> <chr>      <dbl>             <dbl>             <dbl>           <dbl>           <dbl>
# 1    1050091 Sportswear  498.              498.                 0               0              0 
# 2    1555018 Womenswear  336                 0                336               0              0 
# 3     210239 Womenswear  698                 0                698               0              0 
# 4     507556 Sportswear  209               209                  0               0              0 
# 5    1708193 Sportswear  209               209                  0               0              0 
# 6    1295733 Menswear    209                 0                  0             209              0 
# 7    1213373 Sportswear  298               298                  0               0              0 
# 8     753471 Sportswear  209               209                  0               0              0 
# 9      82739 Menswear    349                 0                  0             349              0 
#10    1660995 Kidswear    425.                0                  0               0            425.
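
For comparison, the loop from the question can also be made to work in base R. The sketch below assumes the cause of the stray "i" column is that `df$i` always refers to a column literally named `i`, whereas `df[[col]]` uses the value stored in the loop variable. The sample rows are a subset of the question's data, and the names `col` and `dept` are introduced here for illustration only.

```r
# A few rows from the question's data frame (subset, for illustration)
df <- data.frame(
  CustomerID = c(1050091, 1555018, 1295733, 1660995),
  Department = c("Sportswear", "Womenswear", "Menswear", "Kidswear"),
  Price      = c(497.6, 336.0, 209.0, 424.6),
  stringsAsFactors = FALSE
)

# Names of the demand columns to create (only four shown here)
DemandColumns <- c("SportswearDemand", "HomeDemand",
                   "KidswearDemand", "WomenswearDemand")

for (col in DemandColumns) {
  # Strip the trailing "Demand" to recover the department name
  dept <- sub("Demand$", "", col)
  # df[[col]] assigns to the column whose name is the value of `col`;
  # df$col would instead create a column literally called "col"
  df[[col]] <- ifelse(df$Department == dept, df$Price, 0)
}

print(df)
```

Unlike the pivot_wider() result above, which only contains columns for departments that occur in the data, this loop creates every column listed in DemandColumns, including all-zero ones such as HomeDemand, and it keeps the original column names without an underscore.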
