Training a model with a while loop

Problem description

I am trying to keep iterating as long as the length of my dataset S_train is <= a given number, in this case 11. This is what I have so far:

S_new = train
T_new = test
mu_new = mu
mu_test_new = mu_test

while len(S_new) <= 11:
  ground_test =  T_new[target].values.tolist()
  acquisition_function = abs(mu_test - ground_test)
  max_item = np.argmax(acquisition_function) #step 3 : value in test set that maximizes the abs difference of the energy
  alpha_al = test.iloc[[max_item]]  #identify the minimum step in test set
  S_new = S_new.append(alpha_al)
  len(S_new)
  T_new = T_new.drop(test.index[max_item])
  len(T_new)

  gpr = GaussianProcessRegressor(
    # kernel is the covariance function of the gaussian process (GP)
    kernel=Normalization( # kernel equals to normalization -> normalizes a kernel using the cosine of angle formula, k_normalized(x,y) = k(x,y)/sqrt(k(x,x)*k(y,y))
        # graphdot.kernel.fix.Normalization(kernel), set kernel as marginalized graph kernel, which is used to calculate the similarity between 2 graphs
        # implement the random walk-based graph similarity kernel as Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. ICML
        Tang2019MolecularKernel()
    ),
    alpha=1e-4, # value added to the diagonal of the kernel matrix during fitting
    optimizer=True, # default optimizer of L-BFGS-B based on scipy.optimize.minimize
    normalize_y=True, # normalize the y values so that the mean and variance are 0 and 1, respectively. Will be reversed when predictions are returned
    regularization='+', # alpha (1e-4 in this case) is added to the diagonal of the kernel matrix
     )
  
  start_time = time.time()
  gpr.fit(S_new.graphs, S_new[target], repeat=1, verbose=True) # Fit the training set graphs (independent variable) with S_new[target] as the dependent variable
  end_time = time.time()
  print("the total time consumption is " + str(end_time - start_time) + ".")
 
  gpr.kernel.hyperparameters
  
  rmse_training = []
  rmse_test = []


  mu_new = gpr.predict(S_new.graphs)

  print('Training set')
  print('MAE:', np.mean(np.abs(S_new[target] - mu_new)))
  print('RMSE:', np.std(S_new[target] - mu_new))
  rmse_training.append(np.std(S_new[target] - mu_new)

  mu_test_new = gpr.predict(T_new.graphs)
  print('Test set')
  print('MAE:', np.mean(np.abs(T_new[target] - mu_test_new)))
  print('RMSE:', np.std(T_new[target] - mu_test_new))
  rmse_test.append(np.std(T_new[target] - mu_test_new)

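Basically, I compute the value in T_new that maximizes the absolute error between the i-th elements of T_new and mu_test, add it to the set S_train, and then remove it from T_new. With the new S_train I train my model again and repeat the steps described above. I have never used a while loop before. I checked the syntax and it looks correct to me, but I get this error message: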

File "<ipython-input-55-d284ca5f9d1f>", line 42
    mu_test_new = gpr.predict(T_new.graphs)
              ^
SyntaxError: invalid syntax

Do you know what is causing this? Any suggestions are much appreciated. Thanks as always for your help.

Tags: python, machine-learning, while-loop, dataset

Solution


The problem is not the while loop. It is just a typo, specifically on this line:

  rmse_training.append(np.std(S_new[target] - mu_new)

It is missing a closing parenthesis.
If you change it to

  rmse_training.append(np.std(S_new[target] - mu_new))

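the error you are seeing will go away. The rmse_test.append(...) line at the end of the loop is missing the same closing parenthesis and needs the same fix.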

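It is well worth noting that an error reported on a particular line is sometimes caused by a syntax error on an earlier line, which is something to keep in mind when debugging. Here the interpreter kept reading past the unclosed parenthesis and only gave up at the next statement, mu_test_new = gpr.predict(T_new.graphs), which is why the traceback points there.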
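
To see where the interpreter reports a missing parenthesis, the behaviour can be reproduced with the built-in compile() function. This is a minimal sketch: the helper name syntax_error_location and the three-line source string are made up for illustration. Which line is reported depends on the Python version. Older interpreters, such as the one that produced the traceback in the question, tend to point at the statement after the unclosed parenthesis, while Python 3.10 and later generally point at the line where the parenthesis was opened.

```python
def syntax_error_location(source):
    """Return (line number, message) of the SyntaxError in source, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.lineno, exc.msg
    return None


# Illustrative source: line 2 is missing its closing parenthesis.
broken = (
    "values = []\n"
    "values.append(max(1, 2)\n"   # line 2: closing parenthesis missing
    "total = len(values)\n"       # line 3: valid on its own
)
fixed = broken.replace("max(1, 2)\n", "max(1, 2))\n")

print(syntax_error_location(broken))  # reported line and message vary by Python version
print(syntax_error_location(fixed))   # None: the fixed source compiles
```

If the reported line looks correct, checking the line just above it for an unbalanced bracket is usually the quickest next step.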


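Separately from the syntax error, the loop the question describes (pick the test item with the largest absolute prediction error, move it to the training set, retrain) can be sketched without graphdot. The following is a minimal pure-Python sketch under stated assumptions, not the asker's model: fit_mean is a hypothetical stand-in that always predicts the mean training target, each item is represented only by its target value, and the numbers are made up. It creates the RMSE history lists once, before the loop, so they are not reset on every iteration, and it selects from the current pool rather than from the original test set.

```python
import math


def fit_mean(targets):
    """Hypothetical stand-in model: always predicts the mean training target."""
    mean = sum(targets) / len(targets)
    return lambda items: [mean for _ in items]


def rmse(truth, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth))


def active_learning_loop(train, pool, max_train_size):
    """Move the worst-predicted pool item into train until train has max_train_size items."""
    train, pool = list(train), list(pool)  # copy so the caller's lists are untouched
    rmse_train, rmse_pool = [], []         # created once, outside the loop
    predict = fit_mean(train)
    while len(train) < max_train_size and pool:
        errors = [abs(p - y) for p, y in zip(predict(pool), pool)]
        worst = max(range(len(pool)), key=errors.__getitem__)  # index into the current pool
        train.append(pool.pop(worst))      # add to train and remove from pool in one step
        predict = fit_mean(train)          # retrain on the enlarged training set
        rmse_train.append(rmse(train, predict(train)))
        if pool:
            rmse_pool.append(rmse(pool, predict(pool)))
    return train, pool, rmse_train, rmse_pool


# Made-up target values, for illustration only.
train, pool, rmse_train, rmse_pool = active_learning_loop(
    [1.0, 2.0], [2.5, 10.0, 1.5, 4.0], max_train_size=5
)
print(len(train), len(pool))
```

Note that the condition len(train) < max_train_size stops as soon as the training set has exactly max_train_size items, whereas the question's len(S_new) <= 11 runs one more iteration and stops at 12.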