python - Scipy error - ValueError: row index exceeds matrix dimensions
Problem description
I use the code below to build the training and test matrices that I feed into my NN model.
from scipy.sparse import csr_matrix
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('data.csv', names=['x', 'y', 'z'])
x = df.x.unique().shape[0]
y = df.y.unique().shape[0]
train_data, test_data = train_test_split(df, test_size=0.2)
train_data = pd.DataFrame(train_data)
test_data = pd.DataFrame(test_data)
#Build train matrix
train_x = []
train_y = []
train_z = []
for line in train_data.itertuples():
    u = line[1] - 1
    i = line[2] - 1
    train_x.append(u)
    train_y.append(i)
    train_z.append(line[3])
train_matrix = csr_matrix((train_z, (train_x, train_y)), shape=(x, y))
#Build test matrix
test_x = []
test_y = []
test_z = []
for line in test_data.itertuples():
    test_x.append(line[1] - 1)
    test_y.append(line[2] - 1)
    test_z.append(line[3])
test_matrix = csr_matrix((test_z, (test_x, test_y)), shape=(x, y))
It works perfectly with a small dataset. However, when I run it on a somewhat larger dataset (600 MB), it fails with this error:
  File "C:\Users\Mus\Anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 51, in __init__
    other = self.__class__(coo_matrix(arg1, shape=shape))
  File "C:\Users\Mus\Anaconda3\lib\site-packages\scipy\sparse\coo.py", line 192, in __init__
    self._check()
  File "C:\Users\Mus\Anaconda3\lib\site-packages\scipy\sparse\coo.py", line 272, in _check
    raise ValueError('row index exceeds matrix dimensions')
ValueError: row index exceeds matrix dimensions
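A minimal reproduction of this error, assuming the IDs in column x are 1-based but not contiguous (the sample values are hypothetical). `df.x.unique().shape[0]` counts the distinct IDs, while each row index is `ID - 1`, so any gap in the IDs pushes the largest index past the declared shape. A small file may happen to have contiguous IDs; a larger one is more likely to have gaps.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical data: three distinct x IDs, but the largest ID is 10
df = pd.DataFrame({'x': [1, 2, 10], 'y': [1, 2, 3], 'z': [5.0, 3.0, 4.0]})

n_rows = df.x.unique().shape[0]   # 3 distinct IDs
n_cols = df.y.unique().shape[0]   # 3 distinct IDs
max_row_index = df.x.max() - 1    # 9, which is >= n_rows

try:
    csr_matrix(
        (df.z.to_numpy(), (df.x.to_numpy() - 1, df.y.to_numpy() - 1)),
        shape=(n_rows, n_cols),
    )
    error_message = None
except ValueError as exc:
    # The exact wording of the message depends on the SciPy version
    error_message = str(exc)

print(n_rows, max_row_index, error_message)
```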
When I try the following code instead, I get a different error on the same line:
train_data, test_data = train_test_split(csr_matrix(df[z].values, (df[x].values, df[y].values)), test_size=0.2)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "C:\Users\Mus\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 5
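A sketch of why the second attempt raises `KeyError`, on hypothetical stand-in data. In the original code `x` and `y` hold integers (the number of distinct values), not the column labels, so `df[x]` looks up a column named by that integer. That line also passes `(row, col)` as the second positional argument of `csr_matrix`, which is the `shape` parameter; the data and index arrays need to be wrapped in one tuple.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical stand-in for data.csv (columns named 'x', 'y', 'z')
df = pd.DataFrame({'x': [0, 1, 2], 'y': [2, 0, 1], 'z': [1.0, 2.0, 3.0]})

x = df.x.unique().shape[0]  # the int 3, not the column label 'x'
try:
    df[x]                   # looks up a column named 3, which does not exist
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

# Use the string labels, and pass (data, (row, col)) as a single tuple
# so that nothing is misread as the shape argument
m = csr_matrix((df['z'].to_numpy(), (df['x'].to_numpy(), df['y'].to_numpy())))
print(repr(lookup_error), m.shape)
```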
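Any help would be appreciated.
Solution
The following code, suggested by @CJR, replaces all of the code that builds the training and test matrices: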
train_matrix, test_matrix = train_test_split(csr_matrix((df['z'].values, (df['x'].values, df['y'].values))), test_size=0.2)
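Because no `shape` is passed, `csr_matrix` infers it as (largest row index + 1, largest column index + 1), so no index can exceed it. Two side effects are worth knowing. The IDs are no longer shifted by 1, so with 1-based IDs row and column 0 stay empty. `train_test_split` on a sparse matrix also splits whole rows (every entry for a given x lands in either train or test), unlike the original code, which split individual (x, y, z) records. If the per-record split is what the model needs, here is a sketch that keeps it while sizing both matrices from the largest ID. The helper name `build_matrices` and the sample data are hypothetical.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split


def build_matrices(df, test_size=0.2, random_state=0):
    """Hypothetical helper: split the (x, y, z) records, then build
    train/test CSR matrices that share the same shape.

    Assumes x and y hold non-negative integer IDs.
    """
    # Size from the largest ID, not the count of distinct IDs,
    # so every index fits even when the IDs have gaps
    shape = (int(df['x'].max()) + 1, int(df['y'].max()) + 1)
    train_df, test_df = train_test_split(
        df, test_size=test_size, random_state=random_state
    )

    def to_csr(part):
        # Duplicate (x, y) pairs would be summed by the COO -> CSR conversion
        return csr_matrix(
            (part['z'].to_numpy(), (part['x'].to_numpy(), part['y'].to_numpy())),
            shape=shape,
        )

    return to_csr(train_df), to_csr(test_df)


# Hypothetical data with a gap in the x IDs (no ID 4 to 9)
df = pd.DataFrame({'x': [1, 2, 10, 10, 3],
                   'y': [1, 2, 3, 1, 2],
                   'z': [5.0, 3.0, 4.0, 2.0, 1.0]})
train_matrix, test_matrix = build_matrices(df)
print(train_matrix.shape, test_matrix.shape, train_matrix.nnz, test_matrix.nnz)
```

Computing `shape` from the full DataFrame before splitting keeps the train and test matrices aligned, which a model comparing predictions against held-out entries typically expects.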