python - Link prediction with HinSAGE/GraphSAGE in StellarGraph returns NaN
Problem description
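I am trying to run link prediction using HinSAGE from the stellargraph Python package.
I have a network consisting of people and products, with edges between people (KNOWS) and edges from people to products (BOUGHT). Both people and products have a feature vector attached, although the vectors differ by type (length 1024 for a person, length 200 for a product). I am trying to build a link prediction algorithm from person to product based on all the information in the network. I am using HinSAGE because it supports inductive learning.
I have the code below, which I believe follows the examples at
https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/hinsage-link-prediction.html https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/graphsage-link-prediction.html
but I keep getting "nan" as my output predictions. Does anyone have a suggestion for what I could try?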
import networkx as nx
import pandas as pd
import numpy as np
from tensorflow.keras import Model, optimizers, losses, metrics
import stellargraph as sg
from stellargraph.data import EdgeSplitter
from stellargraph.mapper import HinSAGELinkGenerator
from stellargraph.layer import HinSAGE, link_classification, link_regression
from sklearn.model_selection import train_test_split
print(graph.info())
# StellarGraph: Undirected multigraph
#  Nodes: 54226, Edges: 259120
#
#  Node types:
#   products: [45027]
#     Features: float32 vector, length 200
#     Edge types: products-BOUGHT->person
#   person: [9199]
#     Features: float32 vector, length 1024
#     Edge types: person-KNOWS->person, person-BOUGHT->product
#
#  Edge types:
#     person-KNOWS->person: [246131]
#         Weights: all 1 (default)
#         Features: none
#     person-BOUGHT->product: [12989]
#         Weights: all 1 (default)
#         Features: none
# Hold out 10% of the BOUGHT edges (plus sampled negative edges) for testing
edge_splitter_test = EdgeSplitter(graph)
graph_test, edges_test, labels_test = edge_splitter_test.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)

# Sample a further 10% of the BOUGHT edges from the reduced graph for training
edge_splitter_train = EdgeSplitter(graph_test, graph)
graph_train, edges_train, labels_train = edge_splitter_train.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)

num_samples = [8, 4]
G = graph
batch_size = 20
epochs = 20

# Link generator for (person, product) pairs; the type names must match the graph's node types
generator = HinSAGELinkGenerator(
    G, batch_size, num_samples, head_node_types=["person", "product"]
)
train_gen = generator.flow(edges_train, labels_train, shuffle=True)
test_gen = generator.flow(edges_test, labels_test)

hinsage_layer_sizes = [32, 32]
assert len(hinsage_layer_sizes) == len(num_samples)

hinsage = HinSAGE(
    layer_sizes=hinsage_layer_sizes, generator=generator, bias=True, dropout=0.0
)

# Expose input and output sockets of hinsage:
x_inp, x_out = hinsage.in_out_tensors()

# Final estimator layer
prediction = link_classification(
    output_dim=1, output_act="sigmoid", edge_embedding_method="concat"
)(x_out)

model = Model(inputs=x_inp, outputs=prediction)
model.compile(
    optimizer=optimizers.Adam(),
    loss=losses.binary_crossentropy,
    metrics=["acc"],
)

history = model.fit(train_gen, epochs=epochs, validation_data=test_gen, verbose=2)
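Solution
So I found the problem, and it may be useful to others. If any node contains missing data, the whole thing only produces NaN. This is especially dangerous if you build the graph by joining pandas DataFrames: I had a typo in one of the files being joined, and that caused the problem.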
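One way to catch this before training is to check the joined feature table for missing values. The following is only a minimal sketch: the tables, the column names (`node_id`, `f0`, `f1`) and the helper `find_bad_feature_rows` are hypothetical and are not part of the original code or of StellarGraph. It uses the `indicator` option of `DataFrame.merge` to list the keys that failed to match, and then flags any feature row containing NaN or infinite values.

```python
import numpy as np
import pandas as pd


def find_bad_feature_rows(features: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of a node-feature table that contain NaN, infinite,
    or non-numeric values (hypothetical helper, not a StellarGraph API)."""
    # Coerce every column to numbers; anything unparseable becomes NaN.
    numeric = features.apply(pd.to_numeric, errors="coerce")
    values = numeric.to_numpy(dtype=float)
    bad_mask = ~np.isfinite(values).all(axis=1)
    return features.loc[bad_mask]


# Hypothetical example data: node IDs and a feature table keyed by ID.
nodes = pd.DataFrame({"node_id": ["p1", "p2", "p3"]})
raw_features = pd.DataFrame(
    {
        "node_id": ["p1", "p2", "P3"],  # "P3" is a typo for "p3"
        "f0": [0.1, 0.2, 0.3],
        "f1": [1.0, 2.0, 3.0],
    }
)

# A left join silently fills the unmatched node's features with NaN.
joined = nodes.merge(raw_features, on="node_id", how="left", indicator=True)
unmatched = joined.loc[joined["_merge"] == "left_only", "node_id"].tolist()

features = joined.drop(columns="_merge").set_index("node_id")
bad_rows = find_bad_feature_rows(features)

print("Unmatched keys:", unmatched)
print("Rows with bad features:", bad_rows.index.tolist())
```

If either list is non-empty, fixing the keys (or dropping or imputing the affected rows) before constructing the graph should avoid passing NaN features to the model.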