Negative length vectors are not allowed in distance function

Problem description

I have a large data frame (375,000 rows and 5 columns); all variables are numeric. I would like to perform spatio-temporal clustering on this data frame using hierarchical clustering in R. However, when I try to calculate the distance matrix, I get the following error: "Negative length vectors are not allowed in distance function". Is it because I am exceeding the memory my computer has (16 GB RAM)? Or is it because I am exceeding the maximum length of a vector in R, which is 2^31 - 1 (around 2 billion) elements? By the way, how do I calculate the length of the distance matrix that I am trying to compute? Is it 375,000^2, which equals nearly 100 billion? In any case, what can I do about this problem? Can I somehow still use hierarchical clustering in this case?

Clustering using kmeans works perfectly but my supervisor prefers hierarchical clustering.

Any hints/suggestions will be greatly appreciated.

P.S. Rows represent vehicle trip IDs, and columns represent: longitude of starting point, latitude of starting point, longitude of end point, latitude of end point, and time of trip on a specific day (all values are scaled for all variables).

Tags: r, cluster-analysis, hierarchical-clustering, distance-matrix

Solution


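Yes, 375,000^2 (about 1.4 × 10^11) exceeds the maximum vector length of 2^31 - 1.

The size of a matrix is roughly rows * columns * size of the data type.

Work out how much memory you would need, then take that number back to your supervisor.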
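As a rough illustration of that calculation, here is a minimal sketch in base R. It assumes the distances are stored as 8-byte doubles and that a "dist" object keeps only the lower triangle, i.e. n * (n - 1) / 2 values rather than the full n^2; the variable names are made up for this example.

```r
# Back-of-the-envelope estimate of what dist() would have to store.
n <- 375000                          # number of rows (trips) from the question

full_entries <- n^2                  # entries in a full n x n matrix
dist_entries <- n * (n - 1) / 2      # lower triangle only, as kept in a "dist" object

bytes_per_double <- 8                # assumption: values stored as 8-byte doubles
dist_gb <- dist_entries * bytes_per_double / 1e9

# Compare against the classic vector length limit, 2^31 - 1
exceeds_int_limit <- dist_entries > .Machine$integer.max

cat(sprintf("full matrix entries : %.0f\n", full_entries))
cat(sprintf("dist object entries : %.0f\n", dist_entries))
cat(sprintf("approx. memory (GB) : %.1f\n", dist_gb))
cat(sprintf("exceeds 2^31 - 1    : %s\n", exceeds_int_limit))
```

Under these assumptions the lower triangle alone has about 70.3 billion entries, which comes to roughly 562 GB. That is well beyond both the 2^31 - 1 limit and the 16 GB of RAM mentioned in the question.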

