java - A WeightedCorePredicate implementation for ELKI - an example
Problem description
I have recently been trying to implement an example of weighted DBSCAN in ELKI by modifying a CorePredicate (using MinPtsCorePredicate as the base to build on), and I was wondering whether anyone could critique whether this is a correct implementation for this use case:
import de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan.EpsilonNeighborPredicate;
import de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan.GeneralizedDBSCAN;
import de.lmu.ifi.dbs.elki.data.Cluster;
import de.lmu.ifi.dbs.elki.data.Clustering;
import de.lmu.ifi.dbs.elki.data.NumberVector;
import de.lmu.ifi.dbs.elki.data.model.Model;
import de.lmu.ifi.dbs.elki.data.type.TypeUtil;
import de.lmu.ifi.dbs.elki.database.Database;
import de.lmu.ifi.dbs.elki.database.StaticArrayDatabase;
import de.lmu.ifi.dbs.elki.database.ids.DBIDIter;
import de.lmu.ifi.dbs.elki.database.ids.DBIDRange;
import de.lmu.ifi.dbs.elki.database.relation.Relation;
import de.lmu.ifi.dbs.elki.datasource.ArrayAdapterDatabaseConnection;
import de.lmu.ifi.dbs.elki.datasource.DatabaseConnection;
import de.lmu.ifi.dbs.elki.distance.distancefunction.minkowski.SquaredEuclideanDistanceFunction;
public class SampleELKI2 {
  public static void main(String[] args) {
    double[][] data = new double[1000][3]; // The third column holds the weights
    for (int i = 0; i < data.length; i++) {
      for (int j = 0; j < data[i].length; j++) {
        data[i][j] = Math.random();
      }
    }
    // Adapter to load data from an existing array
    DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(data);
    // Create a database (which may contain multiple relations)
    Database db = new StaticArrayDatabase(dbc, null);
    // Load the data into the database (do NOT forget to initialize)
    db.initialize();
    // Relation containing the number vectors
    Relation<NumberVector> rel = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);
    // We know that the ids must be a continuous range
    DBIDRange ids = (DBIDRange) rel.getDBIDs();
    SquaredEuclideanDistanceFunction dist = SquaredEuclideanDistanceFunction.STATIC;
    // Compute the neighborhood and core predicates for generalized DBSCAN
    // --------------------------------------------------------------------- //
    EpsilonNeighborPredicate ENP = new EpsilonNeighborPredicate(0.3, dist); // generic neighbor predicate
    // WeightedCorePredicate with the minimum weight sum (330), the database,
    // and the index of the column holding the weights (2)
    WeightedCorePredicate WCP = new WeightedCorePredicate(330, db, 2);
    // The predicates plug into GeneralizedDBSCAN and can be swapped for other conditions
    Clustering<Model> result = new GeneralizedDBSCAN(ENP, WCP, false).run(db);
    int i = 0;
    for (Cluster<Model> clu : result.getAllClusters()) {
      for (DBIDIter it = clu.getIDs().iter(); it.valid(); it.advance()) {
        NumberVector v = rel.get(it);
        final int offset = ids.getOffset(it);
        // ... process each point of cluster i here ...
      }
      ++i;
    }
  }
}
The new WeightedCorePredicate looks like this; it is a slight modification of the MinPtsCorePredicate class from the ELKI source tree.
public class WeightedCorePredicate implements CorePredicate<DBIDs> {
  /**
   * Class logger.
   */
  public static final Logging LOG = Logging.getLogger(WeightedCorePredicate.class);

  /**
   * The minimum weight sum required to be a core point.
   */
  protected int minpts;

  // Instance fields, not static: static fields would be shared across all
  // predicates and silently overwritten by the last constructor call.
  protected Database db;

  protected int weightColumn;

  /**
   * Constructor.
   *
   * @param minpts Minimum weight sum in the neighborhood to be a core point.
   * @param db Database holding the weights.
   * @param weightColumn Index of the column holding the weights.
   */
  public WeightedCorePredicate(int minpts, Database db, int weightColumn) {
    super();
    this.minpts = minpts;
    this.db = db;
    this.weightColumn = weightColumn;
  }

  @Override
  public Instance instantiate(Database database) {
    // Prefer the database handed in by GeneralizedDBSCAN; fall back to the
    // one given at construction time.
    return new Instance(minpts, database != null ? database : db, weightColumn);
  }

  @Override
  public boolean acceptsType(SimpleTypeInformation<? extends DBIDs> type) {
    return TypeUtil.DBIDS.isAssignableFromType(type) //
        || TypeUtil.NEIGHBORLIST.isAssignableFromType(type);
  }

  /**
   * Instance for a particular data set.
   */
  public static class Instance implements CorePredicate.Instance<DBIDs> {
    /**
     * The minimum weight sum.
     */
    protected int minpts;

    /**
     * Relation to read the weights from, fetched once here instead of on
     * every isCorePoint call. Do NOT call db.initialize() in isCorePoint:
     * the database was already initialized by the caller.
     */
    protected Relation<NumberVector> rel;

    protected int weightColumn;

    /**
     * Constructor for this predicate.
     *
     * @param minpts Minimum weight sum
     * @param db Database
     * @param weightColumn Index of the column holding the weights
     */
    public Instance(int minpts, Database db, int weightColumn) {
      super();
      this.minpts = minpts;
      this.rel = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);
      this.weightColumn = weightColumn;
    }

    @Override
    public boolean isCorePoint(DBIDRef point, DBIDs neighbors) {
      double weightSum = 0; // Local accumulator; a mutable field would not be thread-safe
      // The DBIDs are the indices of the neighbors - use the relation to
      // look up each neighbor and sum up its weight column.
      for (DBIDIter it = neighbors.iter(); it.valid(); it.advance()) {
        weightSum += rel.get(it).doubleValue(weightColumn);
      }
      return weightSum >= minpts;
    }
  }

  /**
   * Parameterization class.
   */
  public static class Parameterizer extends AbstractParameterizer {
    /**
     * Minpts value.
     */
    protected int minpts;

    @Override
    protected void makeOptions(Parameterization config) {
      super.makeOptions(config);
      // Get the minpts parameter
      IntParameter minptsP = new IntParameter(DBSCAN.Parameterizer.MINPTS_ID) //
          .addConstraint(CommonConstraints.GREATER_EQUAL_ONE_INT);
      if(config.grab(minptsP)) {
        minpts = minptsP.intValue();
        if(minpts <= 2) {
          LOG.warning("DBSCAN with minPts <= 2 is equivalent to single-link clustering at a single height. Consider using larger values of minPts.");
        }
      }
    }

    @Override
    protected WeightedCorePredicate makeInstance() {
      // The database and weight column cannot be obtained from command-line
      // parameterization here; this predicate is meant to be constructed
      // programmatically, as in main() above.
      return new WeightedCorePredicate(minpts, null, 0);
    }
  }
}
Essentially, I added inputs to WeightedCorePredicate that reference the database, so that I can use the relation rel to look up each point of the db by its index and read its weight from the column given by weightColumn, which sits next to the X/Y columns. This follows the discussion pointed out here: Elki GDBSCAN Java/Scala - how to modify the CorePredicate in ELKI's DBSCAN implementation, and the sample_weight option.
Any feedback on this would be much appreciated. I do not come from a Java background (mainly Python/Scala), so I fully understand this is not the most elegant Java code.
Thanks
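Independent of ELKI, the core condition being implemented here - sum the weights of every point in the eps-neighborhood and compare the sum against the threshold - can be sanity-checked on plain arrays. The sketch below is a minimal, self-contained illustration of that rule (the class and helper names are made up for the example, not ELKI API); like the code above it uses squared Euclidean distance:

```java
public class WeightedCoreCheck {
  // Returns true if the weight sum over the eps-neighborhood of point q
  // (including q itself, as in DBSCAN) reaches the minpts threshold.
  static boolean isCore(double[][] points, double[] weights, int q,
                        double eps, double minpts) {
    double weightSum = 0;
    for (int i = 0; i < points.length; i++) {
      if (squaredDist(points[q], points[i]) <= eps * eps) {
        weightSum += weights[i]; // sum weights instead of counting neighbors
      }
    }
    return weightSum >= minpts;
  }

  static double squaredDist(double[] a, double[] b) {
    double s = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      s += diff * diff;
    }
    return s;
  }

  public static void main(String[] args) {
    double[][] pts = { {0, 0}, {0.1, 0}, {0.2, 0}, {5, 5} };
    double[] w = { 1.0, 2.0, 1.5, 1.0 };
    // Point 0 has neighbors {0, 1, 2} within eps=0.25: weight sum 4.5 >= 4.0
    System.out.println(isCore(pts, w, 0, 0.25, 4.0)); // true
    // Point 3 only has itself as a neighbor: weight sum 1.0 < 4.0
    System.out.println(isCore(pts, w, 3, 0.25, 4.0)); // false
  }
}
```

With all weights set to 1.0, isCore reduces to the ordinary MinPts count, which is a useful equivalence test for the ELKI predicate as well.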