Mahalanobis distance with multiple observations, variables and groups

Question

For the iris dataset, I am trying to find the Mahalanobis distance between each pair of species. I tried the following, without luck:

group <- matrix(iris$Species) 
group <- t(group[,-5])

variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
varibles <- as.matrix(iris[,variables])

mahala_sq <- pairwise.mahalanobis(x=variables, grouping=group)

but I get the error message

Error in pairwise.mahalanobis(x = variables, grouping = group) : nrow(x) and length(grouping) are different

Tags: r, cluster-analysis, mahalanobis, iris-dataset

Answer


This works:

HDMD::pairwise.mahalanobis(x=iris[,1:4], grouping=iris$Species)
  • x should be a numeric matrix of observations (columns=variables, rows=observations)
  • grouping should be a "vector of characters or values designating group classification for observations" with length equal to nrow(x)
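To make the two arguments concrete, here is a minimal base-R sketch of squared Mahalanobis distances between the species centroids. It is an illustration only: it assumes a single pooled within-group covariance matrix for all pairs, which may not be how HDMD::pairwise.mahalanobis estimates the covariance, so the numbers need not match that function's output. The names x, grp, centers, centered, pooled_cov and d2 are made up for this example.

```r
# Sketch under a stated assumption: one pooled within-group covariance for all pairs.
# HDMD::pairwise.mahalanobis may estimate the covariance differently.
x <- as.matrix(iris[, colnames(iris) != "Species"])  # 150 x 4 numeric matrix
grp <- iris$Species                                  # factor of length nrow(x)

# Group centroids: one row per species, one column per variable
centers <- rowsum(x, grp) / as.vector(table(grp))

# Centre each observation on its own group mean, then pool the covariance
centered <- x - centers[as.character(grp), ]
pooled_cov <- crossprod(centered) / (nrow(x) - nlevels(grp))

# Squared distance from every centroid to each centroid in turn
d2 <- sapply(rownames(centers), function(g) {
  mahalanobis(centers, center = centers[g, ], cov = pooled_cov)
})
d2
```

The result is a symmetric species-by-species matrix with zeros on the diagonal; take sqrt(d2) if you want distances rather than squared distances.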

I realized in editing your question that the problem stems from a typo: you assigned the matrix to varibles instead of variables, so variables was still the character vector of four column names when you passed it as x. If you fix that typo, your code seems to work (at least it doesn't throw an error). (I still claim that my solution is simpler ...)
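A quick way to see the effect of the typo without calling the package at all is to inspect the two objects. This is only a diagnostic sketch; x_mat is a name introduced here for the corrected matrix.

```r
variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
varibles <- as.matrix(iris[, variables])  # typo: the matrix lands in a different object

is.character(variables)  # TRUE: still just the four column names
length(variables)        # 4
dim(varibles)            # 150 4

# Corrected: keep the names and the data in clearly separate objects
x_mat <- as.matrix(iris[, variables])
nrow(x_mat) == length(iris$Species)  # TRUE: the row count now matches the grouping length
```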

If you wanted to be a little more careful you could use x <- iris[colnames(iris) != "Species"] (or a subset(select=) or dplyr::select() analog) to refer to the omitted column by name rather than position.

If you want (for some reason) to run this analysis with a single response variable, you need to use drop=FALSE to prevent a one-column matrix from being collapsed to a vector, i.e. use x=iris[,1,drop=FALSE]
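The difference that drop=FALSE makes can be checked directly in base R. The object names below are illustrative only.

```r
one_col_dropped <- iris[, 1]             # collapses to a plain numeric vector
one_col_kept <- iris[, 1, drop = FALSE]  # stays a one-column data frame

is.null(dim(one_col_dropped))  # TRUE: no row/column structure left
dim(one_col_kept)              # 150 1
```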

