Normalizing a vector by expanding/compressing it to a given length

Question

I have a vector of 122 values:

vec1 = c(0,0,0,0,0,0,0,0,-0.0029,-0.0029,-0.0029,-0.0029,-0.0029,-0.0029,-0.0044,-0.0044,-0.0059,-0.0073,-0.0073,-0.0088,-0.0088,-0.0102,-0.0132,-0.0176,-0.0249,-0.0293,-0.0322,-0.0337,-0.0337,-0.0337,-0.0337,-0.0337,-0.0337,-0.0351,-0.0425,-0.0512,-0.0586,-0.0659,-0.0703,-0.0805,-0.0937,-0.1127,-0.1347,-0.1508,-0.1581,-0.1611,-0.1669,-0.1684,-0.1698,-0.1698,-0.1698,-0.1698,-0.1552,-0.1362,-0.104,-0.0439,0.0747,0.2035,0.3353,0.4583,0.5695,0.6501,0.7277,0.7687,0.7892,0.8038,0.8097,0.8141,0.8184,0.8214,0.8243,0.8243,0.8053,0.7804,0.6603,0.5066,0.3338,0.1435,-0.1127,-0.41,-0.6442,-0.8097,-0.8858,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9092,-0.9034,-0.8946,-0.8741,-0.8433,-0.8228,-0.8126,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082,-0.8082)

I now want to normalize it by compressing it to 100 values. In this case, every 1.22 values of vec1 should be represented by one value of norm_vec1, like this:

norm_vec1 [1] = mean (vec1 [1]) ## (because round(1.22) = 1)
norm_vec1 [2] = mean (vec1 [2]) ## (because round(1.22*2) = 2)
norm_vec1 [3] = mean (vec1 [3:4]) ## (because round(1.22*3) = 4)
norm_vec1 [4] = mean (vec1 [5]) ## (because round(1.22*4) = 5)

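and so on.

As a result, norm_vec1 should contain 100 values, each either taken directly from vec1 or computed as an average, depending on its position. No value of vec1 should be left out. Importantly, this should also work for vectors shorter than 100 (for example, 63 elements):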

norm_short_vec1 [1] = mean (short_vec1 [1]) ## (because round(0.63*1)=1)
norm_short_vec1 [2] = mean (short_vec1 [1]) ## (because round(0.63*2)=1)
norm_short_vec1 [3] = mean (short_vec1 [2]) ## (because round(0.63*3)=2)

and so on.
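
The rounding rule described above can be sketched as a small function. This is only one possible reading of the question: the name resample_by_rounding is hypothetical, block i is assumed to end at round(i * n / length.out), and when a block would be empty (short input) the value at its end index is reused. Note that R's round() resolves exact .5 ties toward the even digit, so a few block boundaries may differ from a hand calculation.

```r
# Hypothetical helper, not part of the original post.
resample_by_rounding <- function(x, length.out) {
  n <- length(x)
  # Last input index covered by each output element (at least 1).
  ends <- pmax(round(seq_len(length.out) * n / length.out), 1)
  # Each block starts right after the previous one; if that would leave
  # the block empty (short input), fall back to the end index itself.
  starts <- pmin(c(1, head(ends, -1) + 1), ends)
  vapply(
    seq_len(length.out),
    function(i) mean(x[starts[i]:ends[i]]),
    numeric(1)
  )
}

resample_by_rounding(1:122, 100)[1:4]  # 1.0 2.0 3.5 5.0
resample_by_rounding(1:63, 100)[1:3]   # 1 1 2
```

With 1:122 as input, the third output is the mean of elements 3 and 4, matching the example in the question; with 1:63, the first input value is used twice.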

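Alternatively, each element of the vector could be repeated 100 times, and the new values could then be averaged over consecutive chunks of this longer vector, like this (if vec1 has 122 values):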

long_vec1 = rep(vec1, each = 100) ## vec1[1] repeated 100 times, then vec1[2] repeated 100 times, etc.
norm_vec1 [1] = mean (long_vec1 [1:122])
norm_vec1 [2] = mean (long_vec1 [123:244])
etc.

Is there a function for this?

Tags: r, vector, normalization

Answer


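compress <- function(x, length.out) {
  n <- length(x)
  # This version only compresses; it cannot expand a shorter vector.
  if (n < length.out) stop("length.out is too big")
  # Assign each input element to an output position.
  spl <- round((1:n)/n*length.out)
  # Average the input elements that share an output position.
  res <- sapply(split(x, spl), mean)
  names(res) <- NULL
  res
}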

compress(vec1, 100)
#>   [1]  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000 -0.00145
#>   [8] -0.00290 -0.00290 -0.00290 -0.00290 -0.00440 -0.00440 -0.00590
#>  [15] -0.00730 -0.00805 -0.00880 -0.01020 -0.01320 -0.02125 -0.02930
#>  [22] -0.03220 -0.03370 -0.03370 -0.03370 -0.03370 -0.03370 -0.03510
#>  [29] -0.04250 -0.05490 -0.06590 -0.07030 -0.08050 -0.10320 -0.13470
#>  [36] -0.15080 -0.15810 -0.16110 -0.16765 -0.16980 -0.16980 -0.16980
#>  [43] -0.16250 -0.13620 -0.10400 -0.04390  0.07470  0.26940  0.45830
#>  [50]  0.56950  0.65010  0.74820  0.78920  0.80380  0.80970  0.81410
#>  [57]  0.81990  0.82430  0.82430  0.80530  0.72035  0.50660  0.33380
#>  [64]  0.14350 -0.11270 -0.52710 -0.80970 -0.88580 -0.90920 -0.90920
#>  [71] -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.90920 -0.90340
#>  [78] -0.89460 -0.87410 -0.83305 -0.81260 -0.80820 -0.80820 -0.80820
#>  [85] -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820
#>  [92] -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820 -0.80820
#>  [99] -0.80820 -0.80820

Here, spl maps each element of the original vector to an element of the result. In this example it contains 122 values: 1, 2, 2, 3, 4, ..., 99, 100. The first element of vec1 goes directly into the result, the second and third elements are averaged to give element 2 of the result, and so on.
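
To make the grouping concrete, the labels can be inspected on their own. The sketch below also illustrates a caveat that follows from using round(): when the input is more than twice as long as length.out, the first label rounds to 0 and the result gains an extra element. The helper group_round and the variant compress_ceiling are hypothetical names introduced here; the ceiling-based grouping is an assumption about one way to avoid the extra group, and it assigns elements to groups slightly differently from the function above.

```r
# Grouping labels as computed inside compress(): one label per input element.
group_round <- function(n, length.out) round(seq_len(n) / n * length.out)

spl <- group_round(122, 100)
head(spl, 5)         # 1 2 2 3 4
length(unique(spl))  # 100

# Caveat: with n > 2 * length.out the first label is 0, giving 101 groups.
range(group_round(300, 100))  # 0 100

# Hypothetical variant: ceiling() keeps all labels within 1..length.out.
compress_ceiling <- function(x, length.out) {
  n <- length(x)
  if (n < length.out) stop("length.out is too big")
  # Multiply before dividing so exact multiples stay exact integers.
  spl <- ceiling(seq_len(n) * length.out / n)
  unname(vapply(split(x, spl), mean, numeric(1)))
}

length(compress_ceiling(1:300, 100))  # 100
```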

Update

Here is a function based on your second algorithm:


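normalize <- function(x, length.out) {
  n <- length(x)
  # Repeat every element length.out times: n * length.out values in total.
  big_vec <- rep(x, each = length.out)
  # Cut the long vector into length.out chunks of n values and average each.
  res <- sapply(split(big_vec, rep(1:length.out, each = n)), mean)
  names(res) <- NULL
  res
}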

It also works in the opposite direction, expanding a short vector:

normalize(1:3, length.out = 5)
#> [1] 1.000000 1.333333 2.000000 2.666667 3.000000
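
As a quick sanity check of this approach: every input value is repeated the same number of times and every chunk has the same size, so the result should have the requested length and keep the overall mean of the input. The function is repeated here only so that the snippet runs on its own, and the test vectors x_long and x_short are made up for illustration.

```r
# Same function as in the answer, repeated so the snippet is self-contained.
normalize <- function(x, length.out) {
  n <- length(x)
  big_vec <- rep(x, each = length.out)
  res <- sapply(split(big_vec, rep(1:length.out, each = n)), mean)
  names(res) <- NULL
  res
}

# Illustrative inputs: one longer and one shorter than the target length.
x_long  <- seq(0, 1, length.out = 122)
x_short <- seq(0, 1, length.out = 63)

length(normalize(x_long, 100))   # 100
length(normalize(x_short, 100))  # 100

# The overall mean is preserved in both directions.
all.equal(mean(normalize(x_long, 100)), mean(x_long))    # TRUE
all.equal(mean(normalize(x_short, 100)), mean(x_short))  # TRUE
```

Note that big_vec holds length(x) * length.out values, so very long inputs may call for a different approach.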
