python - How to run Ray tasks that depend on project code
Problem description
I have a large Python project with many folders:
-model
-utils
-compute
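My Ray remote code consists of a few functions in the compute folder, and the remote tasks need to run code from model and utils.
Currently I get a "no such module" error for the other project folders.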
import ray

from utils.osops import run_command
from model.model_desc import ModelInsance
from compute.ray_remote import

@ray.remote
def run_eval_remote(cmd_data, model_json):
    # Rebuild the model description from its JSON form on the worker
    model_ins = ModelInsance.read_from_json(model_json)
    run_command(model_ins.bash_cmd)
    # do some more stuff
    return some_value
What is the correct way to do this?
Here is the stack trace:
  File "/Users/me/proj/compute/evaluator_ray.py", line 178, in <listcomp>
    ray_res = [self.eval_instance(instance, eval_metric) for instance in mutations_for_search]
  File "/Users/me/proj/compute/evaluator_ray.py", line 175, in eval_instance
    return run_eval_remote.remote(cmd_data, instance_json)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/remote_function.py", line 114, in _remote_proxy
    return self._remote(args=args, kwargs=kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 292, in _invocation_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/remote_function.py", line 202, in _remote
    return client_mode_convert_function(
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 133, in client_mode_convert_function
    return client_func._remote(in_args, in_kwargs, **kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 98, in _remote
    return self.options(**option_args).remote(*args, **kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 296, in remote
    return return_refs(ray.call_remote(self, *args, **kwargs))
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/api.py", line 103, in call_remote
    return self.worker.call_remote(instance, *args, **kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/worker.py", line 322, in call_remote
    task = instance._prepare_client_task()
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 302, in _prepare_client_task
    task = self.remote_stub._prepare_client_task()
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 119, in _prepare_client_task
    self._ensure_ref()
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/common.py", line 115, in _ensure_ref
    self._ref = ray.put(
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/api.py", line 52, in put
    return self.worker.put(*args, **kwargs)
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/worker.py", line 260, in put
    out = [self._put(x, client_ref_id=client_ref_id) for x in to_put]
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/worker.py", line 260, in <listcomp>
    out = [self._put(x, client_ref_id=client_ref_id) for x in to_put]
  File "/Users/me/miniconda3/envs/proj/lib/python3.8/site-packages/ray/util/client/worker.py", line 280, in _put
    raise cloudpickle.loads(resp.error)
ModuleNotFoundError: No module named 'compute'
Solution
I ran into a similar problem and solved it as follows:
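- Distribute the code to every node (I just ran git clone on each node).
- Make sure the version/branch/etc. of the code is the same on every node.
- On each node, set up a virtual environment and install ray (and the other project dependencies) in it.
- Start ray from the virtualenv and join the cluster.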
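The steps above can be sketched as a small helper that only builds the commands for one node, so they can be reviewed before anything is run. This is a minimal sketch, not what I literally ran: the repository URL, branch, project path, the .venv location and the head address are all hypothetical placeholders, and installing the other project dependencies is left out because it depends on how your project declares them.

```python
import shlex
from pathlib import PurePosixPath


def node_setup_commands(repo_url, branch, project_dir, head_address=None):
    """Return the shell commands for preparing one node, in order.

    All arguments are hypothetical placeholders; nothing is executed here.
    """
    project = PurePosixPath(project_dir)
    venv_bin = project / ".venv" / "bin"  # assumed virtualenv location
    commands = [
        # 1. Put the same branch of the project code on the node.
        ["git", "clone", "--branch", branch, repo_url, str(project)],
        # 2. Create a virtual environment and install ray into it.
        ["python3", "-m", "venv", str(project / ".venv")],
        [str(venv_bin / "pip"), "install", "ray"],
    ]
    # 3. Start ray from the virtualenv: the head node starts the cluster,
    #    every other node joins it through the head's address.
    if head_address is None:
        commands.append([str(venv_bin / "ray"), "start", "--head"])
    else:
        commands.append(
            [str(venv_bin / "ray"), "start", f"--address={head_address}"]
        )
    return commands


# Print the commands for a worker node joining a head at a made-up address.
for cmd in node_setup_commands(
    "https://example.com/me/proj.git", "main", "/home/me/proj", "10.0.0.1:6379"
):
    print(shlex.join(cmd))
```

Calling the helper without head_address gives the commands for the head node instead. The worker processes also have to be able to import the project packages, so you will probably need the project root on their import path (for example through PYTHONPATH when starting ray, or by installing the project into the virtualenv); that part is my assumption and depends on your layout.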
Now, when you launch a job from the head node (or from outside the cluster), the dependencies are present and the job runs fine.
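If you want to confirm that the dependencies really are present before launching the job, something like the following could help. It is only a sketch under assumptions: missing_modules and check_cluster are names I made up, the script is assumed to run as a standalone file on a machine that is already part of the cluster, and pinning a task to a node through its "node:<ip>" resource is a pattern I believe Ray supports but which may behave differently across versions.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised e.g. when the parent package of a dotted name is missing.
            found = False
        if not found:
            missing.append(name)
    return missing


def check_cluster(module_names):
    """Run missing_modules once on every live node; return {node_ip: missing}."""
    import ray  # imported lazily so missing_modules works without ray

    ray.init(address="auto")  # assumes a running cluster on this machine
    remote_check = ray.remote(missing_modules)
    results = {}
    for node in ray.nodes():
        if not node["Alive"]:
            continue
        ip = node["NodeManagerAddress"]
        # Requesting a sliver of the per-node "node:<ip>" resource pins the
        # task to that node (assumed behaviour, check your Ray version).
        ref = remote_check.options(resources={f"node:{ip}": 0.01}).remote(
            module_names
        )
        results[ip] = ray.get(ref)
    return results
```

For the project in the question, check_cluster(["utils", "model", "compute"]) should return an empty list for every node once the code has been distributed correctly; any node that still reports 'compute' would raise the same ModuleNotFoundError as in the stack trace.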
Of course, a cleaner way to distribute the code is with containers, but for my purposes this approach worked well.