Why does the training-set RMSE of my matrix completion program decrease during training while the test-set RMSE goes up?

Problem description

I am writing a simple matrix completion program. Strangely, the RMSE on the training set decreases throughout training, while the RMSE on the test set first spikes and then slowly decreases. I don't know whether this is normal.

Here is the training log:

Iter    0: cost=122720.2005, TRN_rmse=0.0991, TST_rmse=0.1787
Iter    1: cost=52211.5079, TRN_rmse=0.0640, TST_rmse=0.2769
Iter    2: cost=51622.2058, TRN_rmse=0.0636, TST_rmse=0.2906
Iter    3: cost=51548.6679, TRN_rmse=0.0636, TST_rmse=0.2929
Iter    4: cost=51479.9532, TRN_rmse=0.0635, TST_rmse=0.2935
Iter    5: cost=51401.8634, TRN_rmse=0.0635, TST_rmse=0.2936
Iter    6: cost=51308.3281, TRN_rmse=0.0634, TST_rmse=0.2938
Iter    7: cost=51190.9587, TRN_rmse=0.0634, TST_rmse=0.2939
Iter    8: cost=51037.6073, TRN_rmse=0.0633, TST_rmse=0.2940
Iter    9: cost=50831.0238, TRN_rmse=0.0631, TST_rmse=0.2942
Iter   10: cost=50547.6240, TRN_rmse=0.0630, TST_rmse=0.2944
Iter   11: cost=50156.6966, TRN_rmse=0.0627, TST_rmse=0.2947
Iter   12: cost=49620.2475, TRN_rmse=0.0623, TST_rmse=0.2950
Iter   13: cost=48897.6851, TRN_rmse=0.0618, TST_rmse=0.2954
Iter   14: cost=47972.2207, TRN_rmse=0.0612, TST_rmse=0.2957
Iter   15: cost=46877.6057, TRN_rmse=0.0604, TST_rmse=0.2961
Iter   16: cost=45671.0552, TRN_rmse=0.0595, TST_rmse=0.2963
Iter   17: cost=44423.6615, TRN_rmse=0.0585, TST_rmse=0.2964
Iter   18: cost=43188.0642, TRN_rmse=0.0576, TST_rmse=0.2961
Iter   19: cost=41988.5550, TRN_rmse=0.0567, TST_rmse=0.2955
Iter   20: cost=40845.6367, TRN_rmse=0.0558, TST_rmse=0.2946
Iter   30: cost=33144.0653, TRN_rmse=0.0493, TST_rmse=0.2847
Iter   40: cost=29772.7998, TRN_rmse=0.0460, TST_rmse=0.2789
Iter   50: cost=27966.7217, TRN_rmse=0.0442, TST_rmse=0.2754
Iter   60: cost=26816.7560, TRN_rmse=0.0429, TST_rmse=0.2730
Iter   70: cost=26037.9318, TRN_rmse=0.0420, TST_rmse=0.2714
Iter   80: cost=25465.6860, TRN_rmse=0.0413, TST_rmse=0.2703
Iter   90: cost=25026.3467, TRN_rmse=0.0408, TST_rmse=0.2695
Iter  100: cost=24683.8015, TRN_rmse=0.0404, TST_rmse=0.2688

Here is my code:

import numpy as np

def matrix_completion(OMEGA, Y, m, n, rank, lambd, max_iter, eps=1e-8, test_mask=None, test_Y=None):
    """
    Standard matrix completion.
    :param OMEGA: 0-1 indicator matrix to mask the observed training entries
    :param Y: known entries
    :param m: number of rows of Y
    :param n: number of columns of Y
    :param rank: rank of U and V
    :param lambd: regularization parameter
    :param max_iter: max number of iterations
    :param eps: term added to the denominator to improve numerical stability (default: 1e-8)
    :param test_mask: mask of test matrix
    :param test_Y: test matrix
    :return: recovered matrix X
    """
    # initialize U and V
    U = np.random.rand(m, rank)
    V = np.random.rand(n, rank)

    # training cycle
    old_cost = 1e10     # cost of last cycle, initialize to a huge number
    for i in range(max_iter):
        # alternating least square & multiplicative update rule
        U = U * ((lambd * np.matmul(OMEGA * Y, V)) /
                 (lambd * np.matmul(OMEGA * np.matmul(U, V.T), V) + U + eps))
        V = V * ((lambd * np.matmul((OMEGA * Y).T, U)) /
                 (lambd * np.matmul((OMEGA * np.matmul(U, V.T)).T, U) + V + eps))
        # compute the cost
        cost = (lambd / 2) * np.sum(np.square(OMEGA * (np.matmul(U, V.T) - Y))) + \
               0.5 * np.sum(np.square(U)) + 0.5 * np.sum(np.square(V))
        if i % 10 == 0 or i < 20:
            trn_rmse = np.sqrt(np.sum(np.square(OMEGA * (Y - np.matmul(U, V.T)))) / np.sum(OMEGA))
            if test_mask is not None and test_Y is not None:
                tst_rmse = np.sqrt(np.sum(np.square(test_mask * (test_Y - np.matmul(U, V.T)))) / np.sum(test_mask))
            else:
                tst_rmse = 0
            print("Iter %4d: cost=%.4lf, TRN_rmse=%.4lf, TST_rmse=%.4lf" % (i, cost, trn_rmse, tst_rmse))
        # stopping condition: improvement < 0.01%
        if abs(cost - old_cost) / old_cost < 1e-4:
            print("Early stopping.")
            break
        old_cost = cost
    # get recovered matrix X
    X = np.matmul(U, V.transpose())

    return X
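
For context, here is a minimal sketch of how the inputs above can be constructed on synthetic data. All names, sizes, and the seed are illustrative, not taken from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, true_rank = 50, 40, 5

# synthetic nonnegative low-rank matrix standing in for Y
Y = rng.random((m, true_rank)) @ rng.random((true_rank, n))

# disjoint 0-1 masks: roughly 80% of entries observed for training,
# the remaining 20% held out for testing
OMEGA = (rng.random((m, n)) < 0.8).astype(float)
test_mask = 1.0 - OMEGA

def masked_rmse(X, Y, mask):
    # RMSE over the masked entries only, matching the question's formula
    return np.sqrt(np.sum(np.square(mask * (Y - X))) / np.sum(mask))

# a perfect recovery gives zero RMSE on both splits
print(masked_rmse(Y, Y, OMEGA), masked_rmse(Y, Y, test_mask))  # prints: 0.0 0.0
```

With masks built this way, every entry belongs to exactly one of the two splits, so the train and test RMSE are computed over disjoint sets of entries.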

More details:

  1. My objective function is:

$$\min_{U \ge 0,\, V \ge 0} \ \frac{\lambda}{2} \left\Vert \Omega \circ (U V^T - Y) \right\Vert_F^2 + \frac{1}{2} \Vert U \Vert_F^2 + \frac{1}{2} \Vert V \Vert_F^2$$

  2. My update rules are:

$$U \leftarrow U \circ \frac{\lambda\, (\Omega \circ Y)\, V}{\lambda\, (\Omega \circ U V^T)\, V + U}, \qquad V \leftarrow V \circ \frac{\lambda\, (\Omega \circ Y)^T U}{\lambda\, (\Omega \circ U V^T)^T U + V}$$

  3. Hyperparameters: rank = 100, lambd = 1. In fact, the trend is the same for other values of rank and lambd.
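
The objective and updates above can be exercised end to end on a toy problem. The sketch below (sizes, seed, and iteration count are all illustrative) applies the two multiplicative updates to synthetic data and tracks the same cost as the question's code:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, rank, lambd, eps = 30, 20, 5, 1.0, 1e-8

Y = rng.random((m, 5)) @ rng.random((5, n))       # nonnegative low-rank target
OMEGA = (rng.random((m, n)) < 0.8).astype(float)  # observed-entry indicator

U = rng.random((m, rank))
V = rng.random((n, rank))

def cost(U, V):
    # (lambda/2) * ||Omega o (UV^T - Y)||_F^2 + (1/2)||U||_F^2 + (1/2)||V||_F^2
    fit = np.sum(np.square(OMEGA * (U @ V.T - Y)))
    return (lambd / 2) * fit + 0.5 * np.sum(U ** 2) + 0.5 * np.sum(V ** 2)

c0 = cost(U, V)
for _ in range(100):
    # multiplicative updates from the question; eps guards the denominator
    U = U * (lambd * (OMEGA * Y) @ V) / (lambd * (OMEGA * (U @ V.T)) @ V + U + eps)
    V = V * (lambd * (OMEGA * Y).T @ U) / (lambd * (OMEGA * (U @ V.T)).T @ U + V + eps)
c1 = cost(U, V)
print(c0, c1)  # the cost drops substantially, as in the log above
```

Note that because every factor in the updates is nonnegative, U and V stay elementwise nonnegative throughout, as the constraints in the objective require.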

Tags: python-3.x, numpy, matrix, machine-learning, linear-algebra
