Problem with tapply

Question

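I am using tapply to combine a table by sample ID (SID). The first sample in the list has 3 measurements, but it shows up as only one.

I need to carry 4 things over to the new table. First is the SID. Second is the average area across all measurements with that SID. Third is the average of all the distances. Last is the number of measurements.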

cases_iTLS <- data.frame(unique(iTLS$SID))
colnames(cases_iTLS)[colnames(cases_iTLS)=="unique.iTLS.SID."] <- "SID"
cases_iTLS$SID <- factor(cases_iTLS$SID)

# Average of TLS on one slide for area
cases_iTLS$Area_iTLS <- tapply(iTLS$Area, iTLS$SID,FUN=mean) 

# Average of TLS on one slide for distance
cases_iTLS$Distance_iTLS <- tapply(iTLS$Distance, iTLS$SID,FUN=mean) 

# Number of measurements per SID
cases_iTLS$Count_iTLS <- tapply(iTLS$Region_Index, iTLS$SID,FUN=length) 


SID       Region_index   Area         Distance    Type    Location
112906    1              53531.53     71.982      iTLS    intratumoral
112906    3              76809.61     97.384      iTLS    intratumoral
112906    5              40937.30     9.643       iTLS    intratumoral
112947    1              35071.66     2.067       iTLS    intratumoral
112947    3              17979.88     36.319      iTLS

Tags: r

Answer


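Because you need to run separate aggregation functions (mean and length) across multiple columns (Area, Distance, Region_Index) grouped by SID, consider using aggregate, which performs grouped aggregation and returns a data frame.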

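Typically, tapply runs on a single numeric metric and returns a single named atomic vector; it does not work across several columns or functions at once. The code below calls do.call + data.frame to bind the nested results of the multiple aggregates.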

aggregate

# AGGREGATE ACROSS COLS AND FUNCS
cases_iTLS <- aggregate(cbind(Area, Distance, Region_Index) ~ SID, iTLS, 
                        function(x) c(mean = mean(x), count = length(x)))

# BIND NESTED, UNDERLYING RESULTS
cases_iTLS <- do.call(data.frame, cases_iTLS)

# KEEP NEEDED COLUMNS
cases_iTLS <- cases_iTLS[c("SID", "Area.mean", "Distance.mean", "Region_Index.count")]
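
To see why the do.call step is needed, here is a minimal sketch using the sample rows from the question. It assumes the column is named Region_Index (as in the question's code, not the Region_index shown in the printed table) and keeps only the columns used in the aggregation; the names area_mean, agg, and flat are made up for illustration.

```r
# Sample rows copied from the question (only the columns used here).
# Assumption: the column is spelled Region_Index, as in the question's code.
iTLS <- data.frame(
  SID          = c(112906, 112906, 112906, 112947, 112947),
  Region_Index = c(1, 3, 5, 1, 3),
  Area         = c(53531.53, 76809.61, 40937.30, 35071.66, 17979.88),
  Distance     = c(71.982, 97.384, 9.643, 2.067, 36.319)
)

# tapply: one metric, one function, result keyed by SID
area_mean <- tapply(iTLS$Area, iTLS$SID, FUN = mean)
print(area_mean)

# aggregate with a function that returns two values per group:
# each aggregated column comes back as a matrix nested in the data frame
agg <- aggregate(cbind(Area, Distance, Region_Index) ~ SID, iTLS,
                 function(x) c(mean = mean(x), count = length(x)))
print(is.matrix(agg$Area))

# do.call(data.frame, ...) flattens the nested matrices into plain columns
flat <- do.call(data.frame, agg)
print(names(flat))
```

After flattening, each statistic is an ordinary column (Area.mean, Area.count, and so on), which is what makes the final column selection by name possible.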

tapply

If you want to go that route, consider building a matrix from the separate tapply aggregations with rbind and the transpose function t:

cases_iTL_mat <- with(iTLS, 
                         t(rbind(Area_mean = tapply(Area, SID, FUN=mean) ,
                                 Distance_mean = tapply(Distance, SID, FUN=mean),
                                 Region_count = tapply(Region_Index, SID, FUN=length)
                          ))
                 )

by

And I would be remiss not to point out by (the object-oriented wrapper to tapply):

cases_iTL_mat <- do.call(rbind, 
        by(iTLS, iTLS$SID, function(sub) {
               c(Area_mean = mean(sub$Area),
                 Distance_mean = mean(sub$Distance),
                 Region_count = length(sub$Region_Index))
          })
)
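
Both the tapply and by variants return a matrix whose row names are the SIDs, whereas the question asks for a table with SID as its own column. One possible way to get there, sketched on the sample rows from the question (the column spelling Region_Index and the names mat and cases_df are assumptions for illustration):

```r
# Sample rows copied from the question (only the columns used here).
iTLS <- data.frame(
  SID          = c(112906, 112906, 112906, 112947, 112947),
  Region_Index = c(1, 3, 5, 1, 3),
  Area         = c(53531.53, 76809.61, 40937.30, 35071.66, 17979.88),
  Distance     = c(71.982, 97.384, 9.643, 2.067, 36.319)
)

# One row of summary statistics per SID, stacked into a matrix
mat <- do.call(rbind,
        by(iTLS, iTLS$SID, function(sub) {
               c(Area_mean = mean(sub$Area),
                 Distance_mean = mean(sub$Distance),
                 Region_count = length(sub$Region_Index))
          })
)

# Move the row names (the SIDs) into a regular column
cases_df <- data.frame(SID = rownames(mat), mat, row.names = NULL)
print(cases_df)
```

Note that SID comes back as text here because it is taken from the row names; convert it with as.numeric or factor if the rest of the analysis expects another type.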
