r - Averaging rows with changing number of columns
Problem description
My data set has 60 columns and several hundred observations. Each observation has a certain length (depending on the length of the video that was analyzed), so the last few columns may be just NA. I want to average a portion of the values in each row. For example, if the video length is 15 seconds, I need to average the first 3 seconds (a fifth of the row), and if it is 60 seconds, I need the average of the first 12 seconds.
obs | videolength | sec1 | sec2 | sec3 | sec4 | sec5 | sec6 | sec7 | ... | sec60 |
---|---|---|---|---|---|---|---|---|---|---|
obs1 | 10 | 15 | 251 | 281 | 249 | 294 | 278 | 249 | ... | na |
obs2 | 5 | 205 | 182 | 164 | 178 | 252 | na | na | ... | na |
obs3 | 55 | 157 | 270 | 277 | 258 | 233 | 242 | 181 | ... | na |
obs4 | 60 | 169 | 194 | 154 | 173 | 237 | 214 | 257 | ... | 187 |
obs5 | 30 | 187 | 159 | 222 | 235 | 275 | 196 | 169 | ... | na |
obs6 | 20 | 198 | 254 | 227 | 247 | 210 | 193 | 289 | ... | na |
obs7 | 60 | 198 | 271 | 225 | 157 | 205 | 192 | 170 | ... | 223 |
obs8 | 25 | 261 | 240 | 263 | 230 | 153 | 267 | 249 | ... | na |
…
I have tried `rowMeans`, but the problem is that it does not accept a per-row variable inside its arguments.
df$average1 <-rowMeans(df[,3:(3+floor(df$videolength/5))])
I have also tried a for loop, but the variable `j` does not update and keeps the first value it was assigned.
for (i in 1:nrow(df)) {
  j = (floor(df$VideoLength[i] / 5) - 1)
  frames1$average1 <- rowMeans(df[, 3:(3 + j)])
}
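Solution
Since you want to average a different number of columns for each row, you cannot use `rowMeans` directly here. Here is one way to do it with `apply`, working row by row.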
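# Drop the obs column so that x[1] is the video length and x[-1] holds the sec values
df$average <- apply(df[-1], 1, function(x) {
  # Average the first fifth of the video: the first ceiling(length / 5) values
  mean(x[-1][seq_len(ceiling(x[1] / 5))])
})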
This assumes that your second column holds the video length and that everything from the third column onward is a sec column.
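To make the behavior easier to check, here is a minimal sketch on made-up data. The toy data frame, its values, and the names `n_keep` and `average_loop` are assumptions for illustration and are not from the question. It runs the `apply` approach and, for comparison, a loop that writes to one element per iteration instead of overwriting the whole column.

```r
# Hypothetical toy data: 3 observations with up to 10 one-second columns.
# Trailing NA values stand in for seconds past the end of the video.
df <- data.frame(
  obs         = c("obs1", "obs2", "obs3"),
  videolength = c(10, 5, 8),
  sec1  = c(100, 200, 300),
  sec2  = c(120, 220, 320),
  sec3  = c(140, 240, 340),
  sec4  = c(160, 260, 360),
  sec5  = c(180, 280, 380),
  sec6  = c(190, NA, 390),
  sec7  = c(170, NA, 370),
  sec8  = c(150, NA, 350),
  sec9  = c(130, NA, NA),
  sec10 = c(110, NA, NA)
)

# Row-wise apply: x[1] is videolength, x[-1] are the sec values.
df$average <- apply(df[-1], 1, function(x) {
  n_keep <- ceiling(x[1] / 5)      # number of leading seconds to average
  mean(x[-1][seq_len(n_keep)])
})

# Loop variant: assign to element i only, so earlier results are kept.
df$average_loop <- NA_real_
for (i in seq_len(nrow(df))) {
  n_keep <- ceiling(df$videolength[i] / 5)
  # The sec columns start at column 3, hence the offset of 2.
  df$average_loop[i] <- mean(unlist(df[i, 2 + seq_len(n_keep)]))
}

print(df[, c("obs", "videolength", "average", "average_loop")])
```

Both columns should agree. If you would rather round down, as in the question's attempts, swap `ceiling` for `floor`, but note that a video shorter than 5 seconds would then select zero columns and give `NaN`.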