R: generate a column with values based on a condition

Problem description

I have a dataset with 2 columns, as shown below:

   w     p
   0.5   0.5267
   0.5   0.5239
   1.0   0.5267
   1.0   0.5267
   1.0   0.5267
   0.5   0.3870
   0.5   0.3566
   1.0   0.4914
   1.0   0.4914  
   0.125 0.5267 
   0.125 0.5239 
   0.125 0.3870 
   0.125 0.3844 
   0.125 0.4942 
   0.125 0.4914 
   0.125 0.3566 
   0.125 0.3540 

I am trying to create a third column based on the following criteria:

Step1 : Start with Row 1 and check the value in Column w. 
        Row 1 column w is not 1  
Step2 : if the value in column w is not 1, then read the next value in column w. 
        Read the next column w value (Row 2)
Step3 : repeat step 2 until the sum of values from column w is 1.
        Column w row1 and row2 , 0.5 + 0.5 = 1
Step4 : Then read the corresponding values in column p.
        0.5267,  0.5239
Step5 : Multiply the values in column p with corresponding values in column w.
        0.5267*0.5 , 0.5239*0.5
Step6 : Add the values from Step 5
        0.5267*0.5 +  0.5239*0.5 
Step7 : Divide each value in column p by the sum from Step 6.
        0.5267/(0.5267*0.5 +  0.5239*0.5) 
        0.5239/(0.5267*0.5 +  0.5239*0.5) 

The expected output is as follows:

   w     p        Result
   0.5   0.5267   0.5267/(0.5267*0.5 +  0.5239*0.5) 
   0.5   0.5239   0.5239/(0.5267*0.5 +  0.5239*0.5)
   1.0   0.5267   1
   1.0   0.5267   1
   1.0   0.5267   1 
   0.5   0.3870   0.3870/(0.3870*0.5 +  0.3566*0.5)
   0.5   0.3566   0.3566/(0.3870*0.5 +  0.3566*0.5)
   1.0   0.4914   1
   1.0   0.4914   1
   0.125 0.5267   0.5267/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.5239   0.5239/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.3870   0.3870/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.3844   0.3844/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.4942   0.4942/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.4914   0.4914/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.3566   0.3566/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)
   0.125 0.3540   0.3540/(0.5267*0.125 + 0.5239*0.125 + 0.3870*0.125 + 0.3844*0.125 + 0.4942*0.125 + 0.4914*0.125 + 0.3566*0.125 + 0.3540*0.125)

I can do this with a for loop and ifelse statements, but I would like to know whether there is a more elegant way to achieve it. Thanks.

Tags: r

Solution


We can build a grouping variable from the cumulative sum of w and then compute result within each group.
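Written out as a formula, the grouped computation can be sketched as follows. The group label g_i is notation introduced here for illustration, not something from the original question; it assumes the values of w always add up to whole numbers block by block, as they do in the sample data.

```latex
g_i = \left\lceil \sum_{k \le i} w_k \right\rceil,
\qquad
\text{result}_i = \frac{p_i}{\sum_{j \,:\, g_j = g_i} w_j \, p_j}
```

For a row with w = 1 the group contains only that row, so the result is p / (1 * p) = 1, which matches the expected output.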

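library(dplyr)

# df is the data frame from the question, with columns w and p
df %>%
  # rows whose running total of w falls in the same unit interval share a group
  group_by(gr = ceiling(cumsum(w))) %>%
  # divide each p by the weighted sum of p within its group
  mutate(result = p/sum(w * p)) %>%
  ungroup() %>%
  # drop the helper grouping column
  select(-gr)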


# A tibble: 17 x 3
#       w     p result
#   <dbl> <dbl>  <dbl>
# 1 0.5   0.527  1.00 
# 2 0.5   0.524  0.997
# 3 1     0.527  1    
# 4 1     0.527  1    
# 5 1     0.527  1    
# 6 0.5   0.387  1.04 
# 7 0.5   0.357  0.959
# 8 1     0.491  1    
# 9 1     0.491  1    
#10 0.125 0.527  1.20 
#11 0.125 0.524  1.19 
#12 0.125 0.387  0.880
#13 0.125 0.384  0.874
#14 0.125 0.494  1.12 
#15 0.125 0.491  1.12 
#16 0.125 0.357  0.811
#17 0.125 0.354  0.805
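To see why this grouping works, it may help to look at the intermediate key on its own. The following is a minimal base R sketch that rebuilds the sample data from the question and prints the group label for each row. The names `running` and `gr`, and the `round(..., 8)` step, are assumptions added here: rounding is a defensive measure in case the running sum drifts slightly because of floating-point error, which does not happen with 0.5 and 0.125 but can with weights that are not exactly representable in binary.

```r
# Sample data copied from the question
df <- data.frame(
  w = c(0.5, 0.5, 1, 1, 1, 0.5, 0.5, 1, 1, rep(0.125, 8)),
  p = c(0.5267, 0.5239, 0.5267, 0.5267, 0.5267, 0.3870, 0.3566,
        0.4914, 0.4914, 0.5267, 0.5239, 0.3870, 0.3844, 0.4942,
        0.4914, 0.3566, 0.3540)
)

# Running total of w; a block is complete each time it reaches a whole number
running <- cumsum(df$w)

# Round before taking the ceiling (assumption: 8 decimals is enough precision)
gr <- ceiling(round(running, 8))

print(gr)
```

For this data the labels are 1, 1, 2, 3, 4, 5, 5, 6, 7 followed by eight 8s: the two 0.5 rows share a group, each row with w = 1 forms its own group, and the eight 0.125 rows form the last group.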

The same can be done with data.table:

library(data.table)
# convert df to a data.table by reference and add result within each group
setDT(df)[, result := p/sum(w * p), by = ceiling(cumsum(w))]
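If neither package is available, the same idea can be sketched in base R with `ave()`, which applies a function within groups and returns a vector aligned to the original rows. This is an alternative not given in the original answer; the variable names `gr` and `denom` are chosen here for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  w = c(0.5, 0.5, 1, 1, 1, 0.5, 0.5, 1, 1, rep(0.125, 8)),
  p = c(0.5267, 0.5239, 0.5267, 0.5267, 0.5267, 0.3870, 0.3566,
        0.4914, 0.4914, 0.5267, 0.5239, 0.3870, 0.3844, 0.4942,
        0.4914, 0.3566, 0.3540)
)

# Group key: rows whose running total of w lies in the same unit interval
gr <- ceiling(cumsum(df$w))

# Weighted sum of p within each group, repeated for every row of that group
denom <- ave(df$w * df$p, gr, FUN = sum)

# Each p divided by the weighted sum of its group
df$result <- df$p / denom

print(df)
```

A quick sanity check on the output is that `w * result` sums to 1 within every group, since the denominator is exactly the group's weighted sum of p.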
