Facing IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Problem description

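I have been working on a link prediction problem in which a dataset, given as a numpy array, has to be parsed and stored in another numpy array. I am trying to do that, but at line 9 it throws IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices. I even tried casting the indices with int, but that does not seem to work. What am I missing here?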



    1. train_edges, test_edges, = train_test_split(edgeL,test_size=0.3,random_state=16)
    2. out_dim = int(W_out.shape[1])
    3. in_dim = int(W_in.shape[1])
    4. train_x = np.zeros((len(train_edges), (out_dim + in_dim) * 2))
    5. train_y = np.zeros((len(train_edges), 1))
    6. for i, edge in enumerate(train_edges):
    7.     u = edge[0]
    8.     v = edge[1]
    9.     train_x[int(i), : int(out_dim)] = W_out[u]
    10.    train_x[int(i), int(out_dim): int(out_dim + in_dim)] = W_in[u]
    11.    train_x[i, out_dim + in_dim: out_dim * 2 + in_dim] = W_out[v]
    12.    train_x[i, out_dim * 2 + in_dim:] = W_in[v]
    13.    if edge[2] > 0:
    14.        train_y[i] = 1
    15.    else:
    16.        train_y[i] = -1

Edit:

For reference, W_out is a 64-dimensional tuple that looks like this:

print(W_out[0])
type(W_out.shape[1])

Output:
[[0.10160154 0.         0.70414263 0.6772633  0.07685234 0.75205046
  0.421092   0.1776721  0.8622188  0.15669271 0.         0.40653425
  0.5768579  0.75861764 0.6745151  0.37883565 0.18074909 0.73928916
  0.6289512  0.         0.33160248 0.7441727  0.         0.8810399
  0.1110919  0.53732747 0.         0.33330196 0.36220717 0.298112
  0.10643011 0.8997948  0.53510064 0.6845873  0.03440218 0.23005858
  0.8097505  0.7108275  0.38826624 0.28532124 0.37821335 0.3566149
  0.42527163 0.71940386 0.8075657  0.5775364  0.01444144 0.21734199
  0.47439903 0.21176265 0.32279345 0.00187511 0.43511534 0.4302601
  0.39407462 0.20941389 0.199842   0.8710182  0.2160332  0.30246672
  0.27159846 0.19009161 0.32349357 0.08938174]]
int

And edge is a tuple that comes from the training dataset and holds a source, a destination, and a sign. It looks like this:

train_edges, test_edges, = train_test_split(edgeL,test_size=0.3,random_state=16)

for i, edge in enumerate(train_edges):
  print(edge)
  print(i)
  type(i)
  type(edge)

Output:
    Streaming output truncated to the last 5000 lines.
2936
['16936', '17031', '1']
2937
['15307', '14904', '1']
2938
['22852', '13045', '1']
2939
['14291', '96703', '1']
2940

Any help or suggestions are much appreciated.

Tags: numpy, machine-learning, train-test-split

Solution


As @indigo_4_alpha mentioned, the error is caused by the element edge[0] being a string.

  • Code to inspect train_edges
train_edges, test_edges, = train_test_split(edgeL,test_size=0.3,random_state=16)

for i, edge in enumerate(train_edges):
  print(edge)
  print(i)
  print(edge[0], edge[1],edge[2])
  print(type(edge[0]))

Output

['11635' '22046' '1']
2608
11635 22046 1
<class 'str'>

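After looking at the output, I noticed that edge[0] on its own is a string. I then realized that W_out[u] is invalid when u is itself a string, no matter how the other indices are cast with int.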
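The behaviour can be reproduced in isolation. The sketch below assumes a small made-up array as a stand-in for W_out (the real embeddings are not shown in full in the question) and a made-up node id; it only illustrates that a string index raises the IndexError while the same value cast to int does not.

```python
import numpy as np

# Hypothetical stand-in for W_out: 5 nodes, 4-dimensional embeddings.
W_out = np.arange(20, dtype=float).reshape(5, 4)

u = '3'  # node id as it arrives from the edge list: a string, not an int

try:
    W_out[u]          # string index on a plain ndarray
    raised = False
except IndexError as exc:
    raised = True
    print(exc)        # prints NumPy's IndexError message

row = W_out[int(u)]   # casting the node id itself fixes the lookup
print(row)
```

Casting i or out_dim with int has no effect here because those values were already integers; only u and v were strings.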

So I added a type cast in lines 7 and 8 of the code, changing u = edge[0] to u = int(edge[0]) (and likewise for v), as shown below.

  • Main code for the train/test split
    1. train_edges, test_edges, = train_test_split(edgeL,test_size=0.3,random_state=16)
    2. out_dim = int(W_out.shape[1])
    3. in_dim = int(W_in.shape[1])
    4. train_x = np.zeros((len(train_edges), (out_dim + in_dim) * 2))
    5. train_y = np.zeros((len(train_edges), 1))
    6. for i, edge in enumerate(train_edges):
    7.     u = int(edge[0])
    8.     v = int(edge[1])
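An alternative to casting inside the loop is to convert the whole edge list to integers once. The following is only a sketch under assumptions: W_out, W_in and train_edges are small made-up stand-ins, not the data from the question. It also converts the sign column, since the comparison edge[2] > 0 in the original loop would presumably fail on a string for the same reason.

```python
import numpy as np

# Hypothetical stand-ins: 4 nodes with 3-dimensional out/in embeddings, and
# string edges in (source, destination, sign) form as printed in the question.
rng = np.random.default_rng(0)
W_out = rng.random((4, 3))
W_in = rng.random((4, 3))
train_edges = [['0', '1', '1'], ['2', '3', '-1'], ['1', '3', '1']]

# Convert every field from str to int once, up front.
edges = np.asarray(train_edges).astype(int)

out_dim = W_out.shape[1]
in_dim = W_in.shape[1]
train_x = np.zeros((len(edges), (out_dim + in_dim) * 2))
train_y = np.zeros((len(edges), 1))

for i, (u, v, sign) in enumerate(edges):
    # Feature layout: [W_out[u], W_in[u], W_out[v], W_in[v]]
    train_x[i, :out_dim] = W_out[u]
    train_x[i, out_dim:out_dim + in_dim] = W_in[u]
    train_x[i, out_dim + in_dim:out_dim * 2 + in_dim] = W_out[v]
    train_x[i, out_dim * 2 + in_dim:] = W_in[v]
    train_y[i] = 1 if sign > 0 else -1
```

Converting up front keeps the loop body free of casts and makes a malformed id fail early, at the astype call, rather than in the middle of filling train_x.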

Thanks to everyone for taking the time and for your valuable suggestions.

