python - Particle position not being parametrized properly in pyswarms
Problem description
I am having trouble designing a fitness function for pyswarms which will actually iterate through the particles. I am basing my design off of this (working) example code:
# import modules
import numpy as np
from pyswarms.single.global_best import GlobalBestPSO

# create a parameterized version of the classic Rosenbrock unconstrained optimization function
def rosenbrock_with_args(x, a, b, c=0):
    f = (a - x[:, 0]) ** 2 + b * (x[:, 1] - x[:, 0] ** 2) ** 2 + c
    return f

# instantiate the optimizer
x_max = 10 * np.ones(2)
x_min = -1 * x_max
bounds = (x_min, x_max)
options = {'c1': 0.5, 'c2': 0.3, 'w': 0.9}
optimizer = GlobalBestPSO(n_particles=10, dimensions=2, options=options, bounds=bounds)

# now run the optimization; a, b and c are passed as keyword arguments,
# which pyswarms forwards to the objective function
cost, pos = optimizer.optimize(rosenbrock_with_args, 1000, a=1, b=100, c=0)
kwargs={"a": 1.0, "b": 100.0, 'c':0}
It seems that writing x[:, 0] and x[:, 1] somehow parametrizes the particle position matrix for the optimization function. For example, executing x[:, 0] in the debugger returns:
array([ 9.19955426, -5.31471451, -2.28507312, -2.53652044, -6.29916204,
-8.44170591, 7.80464884, -6.42048159, 9.77440842, -9.06991295])
Now, jumping to (a snippet from) my code, I have this:
import pandas as pd
import pyswarms as ps
from sklearn.cluster import DBSCAN

def optimize_eps_and_mp(x):
    clusterer = DBSCAN(eps=x[:, 0], min_samples=x[:, 1], metric="precomputed")
    clusterer.fit(distance_matrix)
    clusters = pd.DataFrame.from_dict({index_to_gid[i[0]]: [i[1]] for i in enumerate(clusterer.labels_)},
                                      orient="index", columns=["cluster"])
    settlements_clustered = settlements.join(clusters)
    cluster_pops = settlements_clustered.loc[settlements_clustered["cluster"] >= 0].groupby(["cluster"]).sum()["pop_sum"].to_list()
    print()
    return 1

options = {'c1': 0.5, 'c2': 0.3, 'w':0.9}
max_bound = [1000, 10]
min_bound = [1, 2]
bounds = (min_bound, max_bound)
n_particles = 10
optimizer = ps.single.GlobalBestPSO(n_particles=n_particles, dimensions=2, options=options, bounds=bounds)
cost, pos = optimizer.optimize(optimize_eps_and_mp, iters=1000)
(The variables distance_matrix and settlements are defined earlier in the code and are not relevant here: the problem comes from the values passed to DBSCAN on the line clusterer = DBSCAN(eps=x[:, 0], min_samples=x[:, 1], metric="precomputed"), and the error is raised when clusterer.fit(distance_matrix) runs. Also, I am aware that the function always returns 1; I am just trying to get it to run without errors before finishing it.)
When I execute x[:, 0] in the debugger, it returns:
array([-4.54925788, 3.94338766, 0.97085618, 9.44128746, -2.1932764 ,
9.24640763, 9.18286758, -8.91052863, 0.637599 , -2.28228841])
So it is structurally identical to the working example. But it fails because the line clusterer = DBSCAN(eps=x[:, 0], min_samples=x[:, 1], metric="precomputed") passes the entire contents of x[:, 0] to DBSCAN, rather than parameterizing it as in the working example.
Is there some difference between these examples that I am just not seeing?
I have also tried pasting the fitness function from the working example (rosenbrock_with_args) into my code and optimizing that instead, to rule out any problem with the way the rest of my implementation is set up. The solution then converges as normal, so I am completely out of ideas as to why it does not work with my function (optimize_eps_and_mp).
The exact stack trace I get refers to an error in the DBSCAN algorithm, which I assume happens because it is somehow being passed the entire set of particle swarm values rather than individual values:
pyswarms.single.global_best: 0%| |0/1000
Traceback (most recent call last):
  File "C:/FILES/boates/work_local/_code/warping-pso-dbscan/optimize_eps_and_mp.py", line 63, in <module>
    cost, pos = optimizer.optimize(optimize_eps_and_mp, iters=1000)
  File "C:\FILES\boates\Anaconda\envs\warping_pso_dbscan\lib\site-packages\pyswarms\single\global_best.py", line 184, in optimize
    self.swarm.current_cost = compute_objective_function(self.swarm, objective_func, pool=pool, **kwargs)
  File "C:\FILES\boates\Anaconda\envs\warping_pso_dbscan\lib\site-packages\pyswarms\backend\operators.py", line 239, in compute_objective_function
    return objective_func(swarm.position, **kwargs)
  File "C:/FILES/boates/work_local/_code/warping-pso-dbscan/optimize_eps_and_mp.py", line 38, in optimize_eps_and_mp
    clusterer.fit(distance_matrix)
  File "C:\FILES\boates\Anaconda\envs\warping_pso_dbscan\lib\site-packages\sklearn\cluster\dbscan_.py", line 351, in fit
    **self.get_params())
  File "C:\FILES\boates\Anaconda\envs\warping_pso_dbscan\lib\site-packages\sklearn\cluster\dbscan_.py", line 139, in dbscan
    if not eps > 0.0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
pyswarms.single.global_best: 0%| |0/1000
Solution
TL;DR
Particle swarm optimization works in batches. Given a batch of particles, the function being optimized must return a batch of costs.
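The batch contract can be sketched as follows. This is a minimal illustration, not part of pyswarms: batched is a hypothetical helper that turns a function scoring one particle into a function scoring a whole swarm of shape (n_particles, dimensions), returning one cost per row.

```python
import numpy as np

def batched(single_cost):
    """Hypothetical helper (not a pyswarms API): wrap a per-particle cost
    function so it accepts a swarm of shape (n_particles, dimensions) and
    returns an array of shape (n_particles,)."""
    def batch_cost(x, **kwargs):
        return np.array([single_cost(particle, **kwargs) for particle in x])
    return batch_cost

@batched
def sphere(particle):
    # Scalar cost for a single particle: sum of its squared coordinates
    return float(np.sum(particle ** 2))

swarm = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]])
print(sphere(swarm))  # costs 0.0, 5.0 and 10.0, one per particle
```

A plain Python loop is the simplest way to honour the contract when the per-particle work (such as fitting a DBSCAN model) cannot be vectorized.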
Explanation of the error message
Here is the interesting part of the error message:
[...]
  File "C:\FILES\boates\Anaconda\envs\warping_pso_dbscan\lib\site-packages\sklearn\cluster\dbscan_.py", line 139, in dbscan
    if not eps > 0.0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
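This is a very common NumPy error message. It appears when you try to use an array as a condition. As the message explains, the truth value of an array such as [True, False] is ambiguous. You have to use a function such as all() or any() to reduce the array to a single boolean.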
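The error can be reproduced with NumPy alone. The sketch below reuses the exact condition from the failing sklearn line; eps_is_invalid is a hypothetical name chosen for the illustration.

```python
import numpy as np

def eps_is_invalid(eps):
    # Same condition as the failing line in sklearn's dbscan: `if not eps > 0.0`
    return not eps > 0.0

print(eps_is_invalid(0.5))  # a scalar works: prints False

batch = np.array([0.5, 2.0, -1.0])  # one eps value per particle
try:
    eps_is_invalid(batch)
except ValueError as err:
    print(err)  # "The truth value of an array with more than one element is ambiguous..."

# Reducing the element-wise comparison to one boolean removes the ambiguity
print((batch > 0.0).all(), (batch > 0.0).any())
```

The comparison batch > 0.0 itself is fine and returns a boolean array; it is the implicit conversion of that array to a single bool (by not or if) that raises.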
So why does this happen? Because eps is not meant to be an array.
According to the documentation of the DBSCAN class, eps is an optional float and min_samples an optional integer. Here, you are passing arrays to them:
clusterer = DBSCAN(eps=x[:, 0], min_samples=x[:, 1], metric="precomputed")
Comparing the examples
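You asked why your code works with the rosenbrock_with_args function. That is because it only performs operations that handle arrays well. You pass it x, a two-dimensional array of shape [10, 2] (a batch of 10 particles of dimension 2), and the scalars a, b, c. From these, it computes a one-dimensional array of shape [10], which holds the cost value of each particle.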
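To see the batch behaviour directly, the sketch below calls rosenbrock_with_args on a random swarm of 10 particles. The random positions are made up for the illustration; pyswarms generates its own.

```python
import numpy as np

def rosenbrock_with_args(x, a, b, c=0):
    # x has shape (n_particles, 2); x[:, 0] and x[:, 1] are whole columns,
    # so every operation below is applied to all particles at once
    return (a - x[:, 0]) ** 2 + b * (x[:, 1] - x[:, 0] ** 2) ** 2 + c

rng = np.random.default_rng(0)
swarm = rng.uniform(-10, 10, size=(10, 2))  # a batch of 10 particles in 2-D
costs = rosenbrock_with_args(swarm, a=1, b=100, c=0)
print(swarm.shape, costs.shape)  # (10, 2) (10,)

# The minimum of the Rosenbrock function is at (a, a**2), i.e. (1, 1) here
print(rosenbrock_with_args(np.array([[1.0, 1.0]]), a=1, b=100))  # prints [0.]
```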
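However, your new optimize_eps_and_mp function performs an operation that does not support arrays. In particular, you pass a one-dimensional slice of the array as the eps parameter of DBSCAN, which expects a scalar.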
To make it work, you have to handle the batch yourself and instantiate one DBSCAN object per particle:
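for row in x:
    # min_samples must be an integer, while particle coordinates are floats
    clusterer = DBSCAN(eps=row[0], min_samples=int(round(row[1])), metric="precomputed")
    [...]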
Distributed execution
You said:
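"The pyswarms library is supposed to run it [the objective function] many times independently (once for each particle in the swarm) and evaluate the results, and it somehow does this by distributing the function to several sets of inputs at once."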
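pyswarms can actually parallelize the evaluation of your swarm through the n_processes argument of the optimize function. In that case, your function is called several times in different processes, but it still receives arrays as input. In your case, with 10 particles, 2 dimensions and n_processes left at None (the default), your x input has shape [10, 2]. If you set n_processes to 2, each call receives an x of shape [5, 2]. Finally, if you set n_processes to 10, each call receives an x of shape [1, 2]. In every case, you have to "unroll" your particle swarm yourself to instantiate DBSCAN.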
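The splitting can be mimicked without pyswarms. This sketch assumes that, when n_processes is set, pyswarms splits the position matrix with np.array_split and joins the per-chunk costs with np.concatenate; here the chunks are evaluated sequentially instead of in a process pool, and objective is a hypothetical batch function.

```python
import numpy as np

def objective(x):
    # Hypothetical batch objective: one cost per row (particle) of x
    return np.sum(x ** 2, axis=1)

swarm = np.random.default_rng(0).uniform(-10, 10, size=(10, 2))  # 10 particles, 2 dimensions

for n_processes in [1, 2, 10]:
    # Assumed to mirror pyswarms' parallel path: split the swarm into
    # n_processes chunks, evaluate each chunk (sequentially here, in separate
    # processes in pyswarms), then join the per-chunk costs back together
    chunks = np.array_split(swarm, n_processes)
    costs = np.concatenate([objective(chunk) for chunk in chunks])
    print(n_processes, [chunk.shape for chunk in chunks], costs.shape)
```

Whatever the chunk size, the joined result has one cost per particle only if each call returns one cost per row it received.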
import pyswarms as ps

def foo(x):
    print(x.shape)
    return x[:,0]

if __name__ == "__main__":
    options = {'c1': 0.5, 'c2': 0.3, 'w': 0.9}
    max_bound = [1000, 10]
    min_bound = [1, 2]
    bounds = (min_bound, max_bound)
    n_particles = 10
    optimizer = ps.single.GlobalBestPSO(n_particles=n_particles, dimensions=2, options=options, bounds=bounds)
    for n_processes in [None, 1, 2, 10]:
        print("\nParallelizing on {} processes.".format(n_processes))
        optimizer.optimize(foo, iters=1, n_processes=n_processes)
Parallelizing on None processes.
(10, 2)
Parallelizing on 1 processes.
(10, 2)
Parallelizing on 2 processes.
(5, 2)
(5, 2)
Parallelizing on 10 processes.
(1, 2)
(1, 2)
(1, 2)
(1, 2)
(1, 2)
(1, 2)
(1, 2)
(1, 2)
(1, 2)
(1, 2)
So here is a complete example of how to use DBSCAN in your case:
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# distance_matrix, settlements and index_to_gid come from your own code
def optimize_eps_and_mp(x):
    num_particles = x.shape[0]
    costs = np.zeros([num_particles])
    print("Particles swarm", x)
    for idx, particle in enumerate(x):
        print("Particle", particle)
        # Use this particle's coordinates, not the whole swarm x;
        # min_samples must be an integer
        clusterer = DBSCAN(eps=particle[0], min_samples=int(round(particle[1])), metric="precomputed")
        clusterer.fit(distance_matrix)
        clusters = pd.DataFrame.from_dict({index_to_gid[i[0]]: [i[1]] for i in enumerate(clusterer.labels_)},
                                          orient="index", columns=["cluster"])
        settlements_clustered = settlements.join(clusters)
        cluster_pops = settlements_clustered.loc[settlements_clustered["cluster"] >= 0].groupby(["cluster"]).sum()["pop_sum"].to_list()
        cost = 1  # Update this to compute the cost value of the current particle
        costs[idx] = cost
    return costs
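For a version that runs without your data, here is a self-contained sketch (requires scikit-learn). The points, the distance matrix built with sklearn.metrics.pairwise_distances, and the cost (fraction of points labelled as noise) are all hypothetical stand-ins for your settlements data and your real cost.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Hypothetical stand-in data: 1-D points in two tight groups plus one outlier
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [20.0]])
distance_matrix = pairwise_distances(points)  # square matrix for metric="precomputed"

def optimize_eps_and_mp(x):
    """Return one cost per particle; x has shape (n_particles, 2)."""
    costs = np.zeros(x.shape[0])
    for idx, particle in enumerate(x):
        eps = float(particle[0])
        min_samples = max(1, int(round(particle[1])))  # DBSCAN needs an int >= 1
        labels = DBSCAN(eps=eps, min_samples=min_samples,
                        metric="precomputed").fit(distance_matrix).labels_
        # Hypothetical cost: fraction of points labelled as noise (-1)
        costs[idx] = np.mean(labels == -1)
    return costs

swarm = np.array([[0.5, 2.0], [0.05, 2.0]])
print(optimize_eps_and_mp(swarm))
```

With eps=0.5 only the outlier at 20.0 is noise (cost 1/7); with eps=0.05 no two points are close enough, so every point is noise (cost 1.0). The returned array has shape (n_particles,), which is what GlobalBestPSO.optimize expects.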