Averaging rows with changing number of columns

Question

My data set has 60 columns and several hundred observations. Each observation has a certain length (depending on the length of the video that was analyzed), so the last few columns may be just NA. I want to average a portion of the values in each row. For example, if the video length is 15 seconds, I need to average the first 3 seconds (a fifth of the row), and if it is 60 seconds, I need the average of the first 12 seconds.

obs videolength sec1 sec2 sec3 sec4 sec5 sec6 sec7 ... sec60
obs1 10 15 251 281 249 294 278 249 ... na
obs2 5 205 182 164 178 252 na na ... na
obs3 55 157 270 277 258 233 242 181 ... na
obs4 60 169 194 154 173 237 214 257 ... 187
obs5 30 187 159 222 235 275 196 169 ... na
obs6 20 198 254 227 247 210 193 289 ... na
obs7 60 198 271 225 157 205 192 170 ... 223
obs8 25 261 240 263 230 153 267 249 ... na
…

I have tried rowMeans, but the problem is that it does not accept a variable inside its arguments.

df$average1 <-rowMeans(df[,3:(3+floor(df$videolength/5))])

I have also tried a for loop, but the variable j does not update and keeps the first value that was assigned to it.

for(i in 1:nrow(df)){
  j = (floor(df$VideoLength[i]/5)-1)
  frames1$average1 <-rowMeans(df[,3:(3+j)])
}

Tags: r, mean

Answer


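Since you want to take the mean of a different number of columns for each row, you cannot use rowMeans directly here. Here is one way to do it, using apply row-wise.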

df$average <- apply(df[-1], 1, function(x) {
  # x[1] is the video length; x[-1] holds the sec columns of this row
  mean(x[-1][seq_len(ceiling(x[1]/5))])
})

This assumes that your second column is the video length and that everything from the third column onward is a sec column.
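
To make this easy to check, here is a minimal sketch on a small made-up data frame. The toy values, the 10 sec columns, and the names n_first, sec_cols and avg_apply are assumptions for illustration only and are not from the question. It runs the apply approach and then an equivalent vapply version that picks the sec columns by name, so it does not depend on column positions. As a side note, the rowMeans attempt in the question most likely failed because the `:` operator only uses the first element when it is given a vector.

```r
# Hypothetical toy data: 3 observations, 10 "sec" columns (values are made up).
sec <- matrix(as.numeric(1:30), nrow = 3, byrow = TRUE)
sec[2, 6:10] <- NA   # obs2 is only 5 seconds long
sec[3, 9:10] <- NA   # obs3 is only 8 seconds long
colnames(sec) <- paste0("sec", 1:10)

df <- data.frame(obs = paste0("obs", 1:3),
                 videolength = c(10, 5, 8),
                 sec)

# Row-wise apply: drop the obs column so every remaining column is numeric.
avg_apply <- apply(df[-1], 1, function(x) {
  mean(x[-1][seq_len(ceiling(x[1] / 5))])
})

# Alternative sketch: compute how many columns to average per row,
# then index the sec columns by name instead of by position.
n_first  <- ceiling(df$videolength / 5)
sec_cols <- grep("^sec", names(df))

df$average <- vapply(seq_len(nrow(df)), function(i) {
  mean(unlist(df[i, sec_cols[seq_len(n_first[i])]]))
}, numeric(1))

print(df[c("obs", "videolength", "average")])
```

Both versions use ceiling, so a length that is not a multiple of 5 rounds up to one more column. Switch to floor if you would rather round down, but note that floor gives zero columns (and a mean of NaN) for videos shorter than 5 seconds.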

