Is there a simpler way to calculate the mean while excluding negative values?

Problem description

What I want to do is calculate the mean of three values, excluding any negative values. Is there a simpler way to do this?

# Reproducible example
library(dplyr)

df1 <- structure(list(
  concentration = c(0, 0.0867, 0.13, 0.195, 0.293, 0.439, 0.658, 0.988,
                    1.481, 2.222, 3.333, 5),
  Replicate = c(1.44558642857143, 1.15371058441558, 1.02689350649351,
                0.868325194805193, 0.677496493506493, 0.526922597402598,
                0.371443376623376, 0.252155129870129, 0.183662272727273,
                0.122282922077922, 0.0892741558441554, 0.0637236363636363),
  Replicate.1 = c(1.41649441558442, 1.11617954545455, 1.00826512987013,
                  0.851684350649351, 0.677447077922078, 0.523192987012987,
                  0.368280584415585, 0.262413311688312, 0.175215584415585,
                  0.129054415584416, 0.092797987012987, 0.0627326623376624),
  Replicate.2 = c(1.35938512987013, 1.21117383116883, 1.01522181818182,
                  0.891895324675324, 0.695687207792208, 0.518078831168831,
                  0.361077272727272, 0.25113487012987, 0.167685064935065,
                  0.121838701298701, 0.0813138961038961, 0.0731186363636365)),
  class = c("rowwise_df", "tbl_df", "tbl", "data.frame"),
  .Names = c("concentration", "Replicate", "Replicate.1", "Replicate.2"),
  row.names = c(NA, 12L))
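docv <- function(df1){
  df1 %>%
    # shift the names so the three replicates are Replicate.1 to Replicate.3
    rename(Replicate.1 = Replicate, Replicate.2 = Replicate.1, Replicate.3 = Replicate.2) %>%
    # count how many of the three replicates are negative in each row
    mutate(tnegcount = sum(c(Replicate.1 < 0, Replicate.2 < 0, Replicate.3 < 0))) %>%
    # no negatives: average all three; otherwise average only the positive values
    mutate(averagev = case_when(
      tnegcount == 0 ~ mean(c(Replicate.1, Replicate.2, Replicate.3)),
      tnegcount > 0  ~ c(Replicate.1, Replicate.2, Replicate.3)[c(Replicate.1, Replicate.2, Replicate.3) > 0] %>% mean()
    )) %>%
    return()
}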

docv(df1)
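Since the question is tagged dplyr, the same row-wise idea can be written more compactly with `c_across()`, which collects the current row's values into one vector so they can be filtered before averaging. This is a sketch assuming dplyr 1.0.0 or later (where `c_across()` was added); the tibble `toy` and the helper `pos_mean` are hypothetical, chosen only so that some replicates are negative.

```r
library(dplyr)

# Hypothetical helper: mean of the strictly positive values (NaN if none)
pos_mean <- function(v) mean(v[v > 0])

# Hypothetical toy data with some negative replicate values
toy <- tibble(
  concentration = c(0, 0.5, 1),
  Replicate     = c(1, -1, -2),
  Replicate.1   = c(3, 2, -1),
  Replicate.2   = c(-5, 4, -3)
)

res <- toy %>%
  rowwise() %>%
  # c_across() gathers this row's Replicate* columns into a single vector
  mutate(averagev = pos_mean(c_across(starts_with("Replicate")))) %>%
  ungroup()
```

No negative count or `case_when()` is needed: filtering with `v > 0` already covers both branches of `docv()`.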

Tags: r, dplyr

Solution


Using base R, you can do the following:

df1 = structure(list(concentration = c(0, 0.0867, 0.13, 0.195, 0.293, 0.439, 0.658, 0.988, 1.481, 2.222, 3.333, 5), 
                 Replicate = c(-0.4689826737158, -0.25575220072642, 0.145706726703793, 0.816415579989552, -0.596636137925088, 0.796779369935393, 0.889350537210703, 0.321595584973693, 0.258228087797761, -0.876427459064871, -0.588050850201398, -0.646886494942009), 
                 Replicate.1 = c(0.374045693315566, -0.231792563572526, 0.539682839997113, -0.00460151582956314, 0.435237016528845, 0.983812189660966, -0.239929641131312, 0.554890442639589, 0.869410462211818, -0.575714957434684, 0.303347532171756, -0.748889808077365), 
                 Replicate.2 = c(-0.465558662544936, -0.227771814912558, -0.973219333682209, -0.235224085859954, 0.73938169144094, -0.319302006624639, -0.0358397690579295, 0.199131650850177, -0.0129173859022558, -0.627564797177911, 0.654746637213975, 0.336933476384729)),
            .Names = c("concentration", "Replicate", "Replicate.1", "Replicate.2"), 
            row.names = c(NA, 12L), 
            class = c("rowwise_df", "tbl_df", "tbl", "data.frame"))


df1$averageV = apply(df1[,2:4], 1, function(x){mean(x[x>0])})
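As a vectorized alternative to the `apply()` call, one can mask the non-positive values as `NA` and let `rowMeans(..., na.rm = TRUE)` skip them. This is a sketch; the matrix `m` is hypothetical data. A row with no positive values gives `NaN`, just like the `apply()` version.

```r
# Hypothetical replicate matrix: one row per concentration, one column per replicate
m <- rbind(c(-0.47,  0.37, -0.47),
           c(-0.26, -0.23, -0.23),
           c( 0.15,  0.54, -0.97))

# Replace values <= 0 with NA (the same x > 0 rule as the apply() version),
# then average whatever remains in each row
m_pos <- replace(m, m <= 0, NA)
averageV <- rowMeans(m_pos, na.rm = TRUE)

# The apply() approach, for comparison
averageV_apply <- apply(m, 1, function(x) mean(x[x > 0]))
```

For the answer's data, `m` would be `as.matrix(df1[, 2:4])`.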

This gives the following result:

   concentration  Replicate  Replicate.1 Replicate.2  averageV
1         0.0000 -0.4689827  0.374045693 -0.46555866 0.3740457
2         0.0867 -0.2557522 -0.231792564 -0.22777181       NaN
3         0.1300  0.1457067  0.539682840 -0.97321933 0.3426948
4         0.1950  0.8164156 -0.004601516 -0.23522409 0.8164156
5         0.2930 -0.5966361  0.435237017  0.73938169 0.5873094
6         0.4390  0.7967794  0.983812190 -0.31930201 0.8902958
7         0.6580  0.8893505 -0.239929641 -0.03583977 0.8893505
8         0.9880  0.3215956  0.554890443  0.19913165 0.3585392
9         1.4810  0.2582281  0.869410462 -0.01291739 0.5638193
10        2.2220 -0.8764275 -0.575714957 -0.62756480       NaN
11        3.3330 -0.5880509  0.303347532  0.65474664 0.4790471
12        5.0000 -0.6468865 -0.748889808  0.33693348 0.3369335
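Rows 2 and 10 are `NaN` because all three replicates are negative. In that case `x[x > 0]` is empty, and `mean()` of an empty numeric vector is `NaN`. If `NA` is preferred for such rows, a small wrapper can return `NA_real_` explicitly. The helper name `mean_positive` is hypothetical.

```r
# Hypothetical helper: mean of the strictly positive values, NA if there are none
mean_positive <- function(x) {
  pos <- x[x > 0]
  if (length(pos) == 0) NA_real_ else mean(pos)
}

mean(numeric(0))  # NaN, which is why the all-negative rows show NaN
```

It can then be used in place of the anonymous function: `apply(df1[, 2:4], 1, mean_positive)`.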
