Link prediction with HinSAGE/GraphSAGE in StellarGraph returns NaN

Question

I'm trying to run link prediction with HinSAGE from the stellargraph Python package.

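I have a network of people and products, with edges between people (KNOWS) and edges from people to products (BOUGHT). Both people and products have a feature vector attached, but the vectors differ by type: person vectors have length 1024 and product vectors have length 200. I'm trying to build a model that predicts links from people to products using all the information in the network. I chose HinSAGE because it supports inductive learning.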
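For reference, StellarGraph 1.x accepts node data as a dict of DataFrames keyed by node type (index = node ID, one column per feature) and edge data as a dict of DataFrames keyed by edge type (with `source` and `target` columns), which is how two node types with different feature lengths can live in one graph. A minimal sketch of that layout using pandas only, with small random placeholder data; the IDs and sizes are hypothetical, and the two dicts would then be passed as `sg.StellarGraph(nodes, edges)`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def feature_frame(ids, dim):
    """One row per node ID, one float32 column per feature."""
    values = rng.standard_normal((len(ids), dim)).astype("float32")
    return pd.DataFrame(values, index=pd.Index(ids, name="id"))


# Hypothetical node IDs; feature lengths match the question (1024 and 200).
nodes = {
    "person": feature_frame(["p1", "p2", "p3"], 1024),
    "product": feature_frame(["x1", "x2"], 200),
}

# Edge frames keyed by edge type, using StellarGraph's default column names.
edges = {
    "KNOWS": pd.DataFrame({"source": ["p1", "p2"], "target": ["p2", "p3"]}),
    "BOUGHT": pd.DataFrame({"source": ["p1", "p3"], "target": ["x1", "x2"]}),
}

for node_type, frame in nodes.items():
    print(node_type, frame.shape)
```

Because each node type has its own DataFrame, the feature widths never need to be padded to a common length.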

I have the code below, which I believe follows the same approach as these examples:

https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/hinsage-link-prediction.html
https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/graphsage-link-prediction.html

But I keep getting "nan" as my output predictions. Does anyone have suggestions for what I could try?

import networkx as nx
import pandas as pd
import numpy as np
from tensorflow.keras import Model, optimizers, losses, metrics
import stellargraph as sg
from stellargraph.data import EdgeSplitter
from stellargraph.mapper import HinSAGELinkGenerator
from stellargraph.layer import HinSAGE, link_classification, link_regression
from sklearn.model_selection import train_test_split


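# `graph` is the StellarGraph built from the data (construction not shown).
# info() returns a summary string, so print it.
print(graph.info())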
#StellarGraph: Undirected multigraph
# Nodes: 54226, Edges: 259120
#
# Node types:
#  products: [45027]
#    Features: float32 vector, length 200
#    Edge types: products-BOUGHT->person
#  person: [9199]
#    Features: float32 vector, length 1024
#    Edge types: person-KNOWS->person, person-BOUGHT->product
#
# Edge types:
#    person-KNOWS->person: [246131]
#        Weights: all 1 (default)
#        Features: none
#    person-BOUGHT->product: [12989]
#        Weights: all 1 (default)
#        Features: none






edge_splitter_test = EdgeSplitter(graph)
graph_test, edges_test, labels_test = edge_splitter_test.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)
edge_splitter_train = EdgeSplitter(graph_test, graph)

graph_train, edges_train, labels_train = edge_splitter_train.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)


num_samples = [8, 4]

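batch_size = 20
epochs = 20

# Build the training generator on graph_train and the test generator on
# graph_test (as in the GraphSAGE demo), so neighbourhood sampling during
# training cannot see the held-out test edges.
generator = HinSAGELinkGenerator(
    graph_train, batch_size, num_samples, head_node_types=["person", "product"]
)
train_gen = generator.flow(edges_train, labels_train, shuffle=True)

test_generator = HinSAGELinkGenerator(
    graph_test, batch_size, num_samples, head_node_types=["person", "product"]
)
test_gen = test_generator.flow(edges_test, labels_test)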


hinsage_layer_sizes = [32, 32]
assert len(hinsage_layer_sizes) == len(num_samples)

hinsage = HinSAGE(
    layer_sizes=hinsage_layer_sizes, generator=generator, bias=True, dropout=0.0
)


# Expose input and output sockets of hinsage:
x_inp, x_out = hinsage.in_out_tensors()



    
# Final estimator layer
prediction = link_classification(
    output_dim=1, output_act="sigmoid", edge_embedding_method="concat"
)(x_out)

model = Model(inputs=x_inp, outputs=prediction)

model.compile(
    optimizer=optimizers.Adam(),
    loss=losses.binary_crossentropy,
    metrics=["acc"],
)

history = model.fit(train_gen, epochs=epochs, validation_data=test_gen, verbose=2)
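With num_samples = [8, 4], each head node's embedding is computed from the node itself, 8 sampled 1-hop neighbours and 8 × 4 = 32 sampled 2-hop neighbours, and every link example has two head nodes (a person and a product). A small sketch of that count per training batch. It uses the homogeneous GraphSAGE count; assuming HinSAGE draws this many samples for each edge type leaving a node, the real number for a person node (which has both KNOWS and BOUGHT edges) is larger.

```python
def feature_rows_per_head(num_samples):
    """Feature rows that GraphSAGE-style sampling gathers for one head node.

    Hop k contributes num_samples[0] * ... * num_samples[k-1] rows; repeated
    nodes are counted, because the sampled neighbourhoods are fixed-size tensors.
    """
    total, frontier = 1, 1  # start with the head node itself
    for n in num_samples:
        frontier *= n
        total += frontier
    return total


num_samples = [8, 4]
batch_size = 20

per_head = feature_rows_per_head(num_samples)  # 1 + 8 + 8*4
per_link = 2 * per_head                        # two head nodes per link
per_batch = batch_size * per_link
print(per_head, per_link, per_batch)
```

Since every prediction reads features from dozens of sampled nodes, a single node with bad features in a dense KNOWS neighbourhood can end up inside many training examples.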

Tags: python, graph-theory, graph-algorithm, stellargraph

Answer


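So I found the problem, and it may be useful to others: if any node has missing data in its feature vector, the model just produces NaN. A NaN feature flows through HinSAGE's neighbour aggregation into the loss, and once the loss is NaN the weight updates are NaN as well, so from then on every prediction is NaN. This is especially dangerous if you build the graph by joining pandas DataFrames: I had a typo in one of the files I was joining, and that caused the problem.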
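A sketch of catching this before building the graph, assuming the features come from a pandas left join: `merge(..., indicator=True)` marks keys that found no partner in the feature file, and an `np.isfinite` check on the final feature matrix catches NaN or infinite values from any other source. The column names and IDs below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: a node list and a separate feature file joined onto it.
people = pd.DataFrame({"person_id": ["alice", "bob", "carol"]})
features = pd.DataFrame(
    {
        "person_id": ["alice", "bbo", "carol"],  # "bbo" is a typo for "bob"
        "f0": [0.1, 0.2, 0.3],
        "f1": [1.0, 2.0, 3.0],
    }
)
feature_cols = ["f0", "f1"]

# A left join keeps every person; keys with no match get NaN in every feature column.
merged = people.merge(features, on="person_id", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "person_id"].tolist()

# Independent check on the final matrix: any NaN or inf, whatever its origin.
matrix = merged[feature_cols].to_numpy(dtype="float32")
bad_rows = merged.loc[~np.isfinite(matrix).all(axis=1), "person_id"].tolist()

print("unmatched keys:", unmatched)
print("rows with non-finite features:", bad_rows)
```

In a real pipeline, raising an error whenever either list is non-empty, before the DataFrames are handed to StellarGraph, stops the NaNs from ever reaching training.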

