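python - How to horizontally stack a csr matrix and a numpy.ndarray?
Problem description
I have a problem where I need to stack a numpy.ndarray (containing string values) and a csr matrix (containing float values).
I tried the following:
1)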
from scipy.sparse import hstack
from scipy import sparse
temp = hstack((image_features,sparse.csr_matrix(feature_names)))
print(temp.shape)
print(type(temp))
This gives me the following error:
TypeError: no supported conversion for types: (dtype('O'),)
2)
from scipy.sparse import hstack
from scipy import sparse
temp = hstack((image_features.astype(object),feature_names))
print(temp.shape)
print(type(temp))
This gives me a memory error because of the size of the two matrices.
print(type(image_features))
--> <class 'scipy.sparse.csr.csr_matrix'>
print(type(feature_names))
--> <class 'numpy.ndarray'>
print(image_features.shape)
--> (140047, 34464)
print(feature_names.shape)
--> (140047, 2)
The first row of each matrix, for reference:
print(image_features[0].toarray())
--> array([[0. , 0. , 0. , ..., 0. , 0.6384238,
0. ]])
print(feature_names[0])
--> array(['00007787805e474ea3f33c722178f550', 'Men'], dtype=object)
Update:
- Converting image_features to an array raises a memory error
- Calling image_features.astype('O') raises a memory error
Output:
- I want the output to be a sparse matrix.
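For context on the memory errors, here is a rough back-of-the-envelope estimate of what a dense copy of image_features would need. It is only a sketch: it uses the shape reported above and assumes 8 bytes per element (float64, or an object pointer on a typical 64-bit build).

```python
import numpy as np

# Shape reported in the question for image_features.
n_rows, n_cols = 140047, 34464

# 8 bytes per float64 element; an object-dtype array stores one pointer per
# element, which is also 8 bytes on a typical 64-bit build (an assumption).
bytes_per_element = np.dtype(np.float64).itemsize

n_elements = n_rows * n_cols
dense_bytes = n_elements * bytes_per_element

print(n_elements)                   # 4826579808
print(round(dense_bytes / 1e9, 1))  # 38.6 (GB)
```

A dense or object-dtype copy therefore needs tens of gigabytes, which is consistent with the memory errors described above.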
Solution
Try one of the following methods. I think the second method fits your case.
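import numpy as np
# Note: np.hstack needs dense inputs, so Methods 1 and 2 assume image_features is a dense ndarray.
# Method-1: feature_names is a 1-D structured array (one (ID, Gender) tuple per row)
np.hstack([image_features.astype('O'), feature_names[:, np.newaxis]])
# Method-2: feature_names is split into 2 separate columns
np.hstack([image_features.astype('O'),
           np.array(feature_names.tolist()).reshape(-1, 2).astype('O')])
# Method-3: using pandas
# (for a csr_matrix, pd.DataFrame.sparse.from_spmatrix(image_features) builds df1 without densifying)
import pandas as pd
df1 = pd.DataFrame(image_features)
df2 = pd.DataFrame(np.array(feature_names.tolist()).reshape(-1, 2).astype('O'), columns=['ID', 'Gender'])
df = pd.merge(df1, df2, left_index=True, right_index=True)  # join row by row on the index
#df.head()
df.to_numpy()  # returns a dense object array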
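The methods above all end in a dense object array. If the result has to stay sparse, as the question asks, one possible alternative is to encode the string columns as numbers and stack with scipy.sparse.hstack. This is only a sketch under assumptions: the dummy data, the helper name encode_columns and the integer-code scheme are illustrative and not part of the original answer.

```python
import numpy as np
from scipy import sparse

# Small stand-ins for the question's data (assumed: sparse floats and an (n, 2) string array).
image_features = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                             [0.2, 0.0, 0.0],
                                             [0.0, 0.0, 0.9]]))
feature_names = np.array([['id_a', 'Men'],
                          ['id_b', 'Women'],
                          ['id_c', 'Men']], dtype=object)

def encode_columns(names):
    """Hypothetical helper: map each string column to integer codes.

    Returns the codes as floats plus one sorted vocabulary per column,
    so that vocabularies[j][code] recovers the original string.
    """
    codes = np.empty(names.shape, dtype=np.float64)
    vocabularies = []
    for j in range(names.shape[1]):
        vocab, inverse = np.unique(names[:, j].astype(str), return_inverse=True)
        codes[:, j] = inverse
        vocabularies.append(vocab)
    return codes, vocabularies

codes, vocabularies = encode_columns(feature_names)

# Both blocks are numeric now, so scipy can stack them without densifying image_features.
stacked = sparse.hstack([image_features, sparse.csr_matrix(codes)], format='csr')
print(stacked.shape)  # (3, 5)

# Decode the gender of row 1 from the last column.
print(vocabularies[1][int(stacked[1, 4])])  # Women
```

The strings themselves live in the vocabularies, and the sparse matrix only holds their codes. Whether integer codes are acceptable depends on what the stacked matrix is used for. A model would usually need one-hot columns instead.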
Example
We will make some dummy data and test the solution given above.
import numpy as np
image_features = np.random.rand(2,10).round(3)
feature_names = np.array([('00007787805e474ea3f33c722178f550', 'Male'),
('00007787805e474ea3f33c722223f550', 'Female')],
dtype=[('ID', 'O'),('Gender', 'O')])
print('Shape BEFORE newaxis addition: {}'.format((image_features.shape,
feature_names.shape)))
feature_names = feature_names[:, np.newaxis]
print('Shape AFTER newaxis addition: {}'.format((image_features.shape,
feature_names.shape)))
stacked = np.hstack([image_features.astype('O'), feature_names])
print(stacked)
Output:
Shape BEFORE newaxis addition: ((2, 10), (2,))
Shape AFTER newaxis addition: ((2, 10), (2, 1))
[[0.335 0.576 0.769 0.442 0.34 0.938 0.745 0.085 0.617 0.643
('00007787805e474ea3f33c722178f550', 'Male')]
[0.689 0.959 0.57 0.122 0.328 0.421 0.176 0.797 0.364 0.495
('00007787805e474ea3f33c722223f550', 'Female')]]
To make this clearer, let's display it with pandas:
import pandas as pd
print(pd.DataFrame(stacked))
0 1 ... 9 10
0 0.335 0.576 ... 0.643 (00007787805e474ea3f33c722178f550, Male)
1 0.689 0.959 ... 0.495 (00007787805e474ea3f33c722223f550, Female)
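The example above uses a 1-D structured array, so each ID/gender pair ends up as a tuple in a single column. In the question, feature_names is reported as a plain object array of shape (n, 2). A minimal sketch for that layout, with made-up float values and a dense image_features assumed:

```python
import numpy as np

# Dummy data; feature_names mirrors the (n, 2) object array described in the question.
image_features = np.array([[0.1, 0.2, 0.3],
                           [0.4, 0.5, 0.6]])
feature_names = np.array([['00007787805e474ea3f33c722178f550', 'Male'],
                          ['00007787805e474ea3f33c722223f550', 'Female']],
                         dtype=object)

# Both inputs are already 2-D, so no newaxis is needed. Casting the floats
# to object lets numbers and strings share one array.
stacked = np.hstack([image_features.astype('O'), feature_names])

print(stacked.shape)   # (2, 5)
print(stacked[0, -1])  # Male
```

Here the ID and the gender land in two separate columns instead of one tuple column.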