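r - Apply a function to a data frame that uses column means (~Pareto scaling)
Problem description
In a data frame, I want to divide each value by the square root of the standard deviation of its column (~ Pareto scaling). I took the code from an existing package (https://github.com/cran/RFmarkerDetector/blob/master/R/scaling.R):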
paretoscale <- function(data) {
    # Note: x and sample_classes are not defined in the lines shown here
    # Here we perform centering
    x.centered <- apply(x, 2, function(x) x - mean(x))
    # Then we perform scaling on the mean-centered matrix
    x.sc <- apply(x.centered, 2, function(x) x/sqrt(sd(x)))
    x.sc <- cbind(sample_classes, x.sc)
}
Does x.centered <- apply(x, 2, function(x) x - mean(x)) compute x - mean(column where x is), as it should? Can you explain how it works?
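Solution
The variable names may be a bit confusing, especially for a newcomer: x is both the whole object and the argument of the anonymous function. Let's rewrite the apply part so that it doesn't confuse the reader.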
paretoscale <- function(data) {
    # Note: x and sample_classes are not defined in the lines shown here
    # Here we perform centering
    x.centered <- apply(x, 2, function(col) col - mean(col))
    # Then we perform scaling on the mean-centered matrix
    x.sc <- apply(x.centered, 2, function(col) col/sqrt(sd(col)))
    x.sc <- cbind(sample_classes, x.sc)
}
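apply(x, 2, function(col) col - mean(col))
This call works column-wise (MARGIN = 2) on the object x, which can be a data.frame or a matrix. For each column, it computes that column's mean and subtracts it from every element of the column.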
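One way to convince yourself that the anonymous function really sees one whole column at a time is to compare the result with an explicit column-mean subtraction. The following is a minimal sketch; the data frame and the names centered_apply and centered_sweep are made up for illustration, and sweep() with colMeans() is used only as an independent cross-check, not as the package's method.

```r
# Hypothetical example data: three numeric columns
x <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60), c = c(5, 7, 12))

# Column-wise centering with apply(), as in the function above
centered_apply <- apply(x, 2, function(col) col - mean(col))

# The same operation written with sweep(): subtract colMeans(x) from each column
centered_sweep <- sweep(as.matrix(x), 2, colMeans(x))

# Both approaches should agree, and every centered column should have mean ~0
all.equal(centered_apply, centered_sweep)
colMeans(centered_apply)
```

If the function were instead applied to single values, col - mean(col) would always be zero, which is not what happens here.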
Here is how apply works compared to a for loop.
xy <- data.frame(matrix(1:9, ncol = 3))
apply(X = xy, MARGIN = 2, FUN = function(col) col - mean(col))
     X1 X2 X3
[1,] -1 -1 -1
[2,]  0  0  0
[3,]  1  1  1
# Create an object of the same shape, filled with NA
newxy <- xy
newxy[] <- NA
# Work column-wise
for (i in seq_len(ncol(xy))) {
    col <- xy[, i]
    # Calculate the mean and subtract it from all elements of the column
    newxy[, i] <- col - mean(col)
}
newxy
  X1 X2 X3
1 -1 -1 -1
2  0  0  0
3  1  1  1
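Putting the two steps together, Pareto scaling can be sketched as a small self-contained function. This is only an illustration under stated assumptions: pareto_scale is a hypothetical name (not the package's function), the input is assumed to contain only numeric columns, and no column is constant (otherwise its standard deviation is 0 and the division breaks down). The example data are made up.

```r
# Sketch of Pareto scaling for a purely numeric data.frame or matrix.
# pareto_scale is a hypothetical helper, not the RFmarkerDetector function.
pareto_scale <- function(data) {
  x <- as.matrix(data)
  # Step 1: center each column on its mean
  x_centered <- sweep(x, 2, colMeans(x))
  # Step 2: divide each column by the square root of its standard deviation
  sweep(x_centered, 2, sqrt(apply(x, 2, sd)), FUN = "/")
}

# Hypothetical example data with different spreads per column
dat <- data.frame(a = c(1, 2, 3), b = c(2, 4, 6), c = c(10, 20, 30))
scaled <- pareto_scale(dat)
scaled

# Base R's scale() accepts per-column scale values, so it should give the same numbers
all.equal(scaled,
          scale(dat, center = TRUE, scale = sqrt(apply(dat, 2, sd))),
          check.attributes = FALSE)
```

Centering does not change a column's standard deviation, so computing sd on the original or on the centered columns gives the same scaling factors.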