Applying a function to a data frame using column means (~Pareto scaling)

Problem description

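In a data frame, I want to divide each value by the square root of its column's standard deviation (~ Pareto scaling). I took the code from an existing package (https://github.com/cran/RFmarkerDetector/blob/master/R/scaling.R):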

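paretoscale <- function(data) {
    # Assumption for this excerpt: every column of data is numeric and there
    # are no class columns to re-attach (x and sample_classes are not defined
    # in the snippet as quoted)
    x <- data
    sample_classes <- NULL
    # Here we perform centering
    x.centered <- apply(x, 2, function(x) x - mean(x))
    # Then we perform scaling on the mean-centered matrix
    x.sc <- apply(x.centered, 2, function(x) x/sqrt(sd(x)))
    x.sc <- cbind(sample_classes, x.sc)
    x.sc
}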
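For reference, the two apply() calls above compute the following for the element in row i and column j of x (n rows), where R's sd() is the sample standard deviation with denominator n - 1. Note that the code mean-centers each column before dividing, so it does a little more than "divide each value"; this centered form is the usual definition of Pareto scaling. Centering does not change a column's standard deviation, so s_j is the same whether it is computed on the raw or on the centered column.

```latex
\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_j}}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
```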

In x.centered <- apply(x, 2, function(x) x - mean(x)), does x - mean(x) do what it should, i.e. x - mean(column where x is)? Can you explain how it works?

Tags: r, dataframe, apply

Solution


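The variable names can be a bit confusing, especially for newcomers: inside function(x), the name x refers to the argument, i.e. the column currently being processed, not to the outer object x. So yes, x - mean(x) subtracts the mean of that column. Let's rewrite the apply part so it doesn't confuse the reader.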

paretoscale <- function(data) {
    # Assumption: every column of data is numeric and there are no
    # class columns to re-attach
    x <- data
    sample_classes <- NULL
    # Here we perform centering
    x.centered <- apply(x, 2, function(col) col - mean(col))
    # Then we perform scaling on the mean-centered matrix
    x.sc <- apply(x.centered, 2, function(col) col/sqrt(sd(col)))
    x.sc <- cbind(sample_classes, x.sc)
    x.sc
}
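A quick check that the renaming changes nothing: because the argument x shadows the outer data frame x, both versions operate on one column at a time and give identical results. This is a minimal sketch on a small made-up data frame; the data and the names res_shadowed and res_renamed are illustrative and not part of the original code.

```r
# Hypothetical all-numeric example data
x <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))

# Original naming: the argument `x` shadows the outer data frame `x`,
# so inside the function `x` is a single column (a numeric vector)
res_shadowed <- apply(x, 2, function(x) x - mean(x))

# Renamed argument: the same computation, with clearer intent
res_renamed <- apply(x, 2, function(col) col - mean(col))

# The function sees one column at a time: each call gets nrow(x) values
apply(x, 2, function(col) length(col))

identical(res_shadowed, res_renamed)  # TRUE
res_renamed
```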

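apply(x, 2, function(col) col - mean(col)) works column-wise (MARGIN = 2) on the object x, which can be a data.frame or a matrix. A data.frame is first converted to a matrix, so the result is always a matrix. For each column, apply computes that column's mean and subtracts it from every element of the column.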
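The same column-wise centering can also be written without an anonymous function, using base R's colMeans() and sweep(), which combines each column with one summary value per column. The sketch below, on a made-up data frame df, checks that both approaches agree and then completes the Pareto step by dividing each centered column by the square root of its sd. A useful sanity check: after Pareto scaling, each column's standard deviation equals the square root of the original one.

```r
# Hypothetical all-numeric example data
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))

# Centering with apply(): one anonymous-function call per column
centered_apply <- apply(df, 2, function(col) col - mean(col))

# Same centering, vectorized: sweep() subtracts colMeans(df)[j]
# from every element of column j (its default FUN is "-")
centered_sweep <- sweep(as.matrix(df), 2, colMeans(df))

# Pareto scaling: divide each centered column by the square root of its sd
pareto <- sweep(centered_sweep, 2, sqrt(apply(centered_sweep, 2, sd)), "/")

all.equal(centered_apply, centered_sweep)  # TRUE
# Column sds after scaling equal sqrt() of the original column sds
apply(pareto, 2, sd)
```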

Here is how apply works, compared with an equivalent for loop.

xy <- data.frame(matrix(1:9, ncol = 3))

apply(X = xy, MARGIN = 2, FUN = function(col) col - mean(col))

     X1 X2 X3
[1,] -1 -1 -1
[2,]  0  0  0
[3,]  1  1  1

# Create an object of the same shape, filled with NA
newxy <- xy
newxy[] <- NA

# Work column-wise
for (i in seq_len(ncol(xy))) {
  col <- xy[, i]
  # Calculate the mean and subtract it from all elements of the column
  newxy[, i] <- col - mean(col)
}
newxy

  X1 X2 X3
1 -1 -1 -1
2  0  0  0
3  1  1  1
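One visible difference between the two outputs: apply() converts the data.frame to a matrix, which is why its result has [1,] row labels, while newxy is still a data.frame. (The conversion also means that a data.frame with any non-numeric column becomes a character matrix, on which mean() does not work.) If you want to keep the data.frame, a common base R idiom is to assign the result of lapply(), which iterates over the columns, into xy2[]. A minimal sketch; xy2 and via_apply are illustrative names:

```r
xy <- data.frame(matrix(1:9, ncol = 3))

# apply() coerces the data.frame to a matrix, so the result is a matrix
via_apply <- apply(xy, 2, function(col) col - mean(col))
class(via_apply)

# lapply() walks over the columns of the data.frame (a list of columns);
# assigning into xy2[] keeps the data.frame class and the column names
xy2 <- xy
xy2[] <- lapply(xy, function(col) col - mean(col))
class(xy2)
xy2
```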
