Error when generating Universal Sentence Encoder embeddings for dimensionality reduction

Problem description

Below is the code that generates the embeddings and reduces their dimensionality:

import tensorflow_hub as hub

embed_fn = None  # cached encoder; module_url points at the TF Hub module

def generate_embeddings(text):
    global embed_fn  # without this, embed_fn is an unbound local on first call
    if embed_fn is None:
        embed_fn = hub.load(module_url)
    embedding = embed_fn(text).numpy()
    return embedding


from sklearn.decomposition import IncrementalPCA

def pca():
    pca = IncrementalPCA(n_components=64, batch_size=1024)
    # Embed once and reuse; calling generate_embeddings twice doubles the work.
    embeddings = generate_embeddings(df)
    features_train = pca.fit_transform(embeddings)
    return features_train

When I run this on 100,000 records, it raises the following error:

ResourceExhaustedError:  OOM when allocating tensor with shape[64338902,512] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
     [[{{node StatefulPartitionedCall/StatefulPartitionedCall/EncoderDNN/EmbeddingLookup/EmbeddingLookupUnique/GatherV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_restored_function_body_15375]

Function call stack:
restored_function_body

Tags: scikit-learn, tensorflow2.0, pca, embedding, encoder-decoder

Solution


This shows you are hitting a memory limit: the encoder is asked to embed all of the records in a single call, and the resulting tensor is too large to allocate. Either reduce the batch size or the size of the network layers so each allocation fits in memory.
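One way to stay within memory is to stream the records through both the encoder and `IncrementalPCA` in batches, using `partial_fit`, instead of embedding the whole dataframe at once. A minimal sketch of that idea follows; `embed_batch` here is a hypothetical stand-in for the `hub.load(module_url)` encoder (it just returns deterministic 512-d vectors), and in practice you would call the real model on each batch of texts:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def embed_batch(texts):
    """Stand-in for the TF Hub encoder: maps a list of strings to
    512-d float32 vectors. Replace with embed_fn(texts).numpy()."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.standard_normal((len(texts), 512)).astype(np.float32)

def reduce_dimensions(texts, n_components=64, batch_size=1024):
    """Two streaming passes over the data, so the full
    [n_records, 512] embedding matrix is never materialised."""
    pca = IncrementalPCA(n_components=n_components, batch_size=batch_size)
    # Pass 1: incrementally fit PCA on one embedding batch at a time.
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        if len(batch) >= n_components:  # partial_fit needs >= n_components rows
            pca.partial_fit(embed_batch(batch))
    # Pass 2: transform batch by batch and stack the reduced vectors.
    reduced = [pca.transform(embed_batch(texts[start:start + batch_size]))
               for start in range(0, len(texts), batch_size)]
    return np.vstack(reduced)
```

With this pattern the peak memory cost is one `[batch_size, 512]` tensor per step rather than the whole `[n_records, 512]` matrix, which is what the `shape[64338902,512]` allocation in the error corresponds to.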

