HTCondor: change NUM_CPUS based on idle?

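Question

I want to change the CPU count depending on whether someone is working on the machine. I don't want to preempt jobs the way the manual describes. I just want to do something like the following: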

// condor_config file
if (KeyboardIdle < 10)
    NUM_CPUS = 2
else
    NUM_CPUS = 8
endif

The configuration above fails with: (KeyboardIdle < 10) is not a valid if condition because complex conditionals are not supported

Is there any way I can achieve this, or is NUM_CPUS a fixed variable?


Following Greg's answer, the very bottom of my condor_config now looks like this:

NUM_CPUS = 16
START = (SlotID < 8) || (KeyboardIdle > 10)

In theory this should allow only 8 jobs to start, but when I run condor_status myMachine I get:

C:\>condor_status myMachine
Name                       OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      1.210 8186  0+00:00:02
slot2@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      0.500 8186  0+00:00:03
slot3@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      2.220 8186  0+00:00:01
slot4@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      1.500 8186  0+00:00:02
slot5@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      0.600 8186  0+00:00:02
slot6@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      0.380 8186  0+00:00:02
slot7@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      1.940 8186  0+00:00:03
slot8@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      0.880 8186  0+00:00:02
slot9@myMachine.cluster  WINDOWS    X86_64 Claimed   Busy      1.560 8186  0+00:00:02
slot10@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      0.310 8186  0+00:00:02
slot11@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      2.180 8186  0+00:00:02
slot12@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      1.580 8186  0+00:00:02
slot13@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      0.950 8186  0+00:00:02
slot14@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      1.890 8186  0+00:00:02
slot15@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      0.490 8186  0+00:00:02
slot16@myMachine.cluster WINDOWS    X86_64 Claimed   Busy      1.600 8186  0+00:00:01

               Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain

X86_64/WINDOWS    16     0      16         0       0          0        0      0

         Total    16     0      16         0       0          0        0      0

Any ideas?

Tags: condor

Answer


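NUM_CPUS is fixed in HTCondor. This kind of policy is usually implemented through the START expression instead: the number of slots stays the same, but for some of them START evaluates to false, so they cannot start jobs.

Assuming the machine has static slots (the default), a START expression could look like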

START = (SlotID < 3) || (KeyboardIdle > 10)

That is, START is always true for slots 1 and 2, and true for the remaining slots only when the keyboard is idle.
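As a rough illustration, the Python sketch below mimics how that START expression would evaluate on each static slot. It is only a model, not HTCondor code: the names `start_allowed` and `startable_slots` and their parameters are made up for this example, and KeyboardIdle is treated as a plain number of seconds.

```python
def start_allowed(slot_id: int, keyboard_idle: int,
                  first_gated_slot: int = 3, idle_threshold: int = 10) -> bool:
    """Model of START = (SlotID < 3) || (KeyboardIdle > 10)."""
    # Slots below the cutoff may always start jobs; the rest need an idle keyboard.
    return slot_id < first_gated_slot or keyboard_idle > idle_threshold


def startable_slots(num_slots: int, keyboard_idle: int,
                    first_gated_slot: int = 3) -> list[int]:
    """Return the slot IDs whose modelled START expression is true."""
    return [s for s in range(1, num_slots + 1)
            if start_allowed(s, keyboard_idle, first_gated_slot)]


print(startable_slots(8, keyboard_idle=5))    # keyboard in use -> [1, 2]
print(startable_slots(8, keyboard_idle=600))  # keyboard idle   -> [1, 2, 3, 4, 5, 6, 7, 8]
```

In the same model, a cutoff of SlotID < 8 leaves seven always-on slots (1 through 7), since the comparison is strict.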

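To be annoyingly pedantic, this only controls whether jobs can start on the machine, based on keyboard usage. With just the configuration above, a completely idle machine will let itself fill up with jobs, and those jobs will keep running indefinitely once the keyboard user returns. If you want to preempt those jobs, you can also add a PREEMPT expression, such as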

PREEMPT = (SlotID > 2) && (KeyboardIdle < 10)
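To make the difference concrete, here is a small Python sketch of the scenario described above: the machine fills up while idle, then the user comes back. It is a toy model under stated assumptions, not HTCondor behavior: all function names are hypothetical, slots 1 and 2 are assumed to be the always-on slots, every other slot is assumed to be preemptible, and the KeyboardIdle readings are invented values in seconds.

```python
def start_allowed(slot_id: int, keyboard_idle: int) -> bool:
    # Model of START = (SlotID < 3) || (KeyboardIdle > 10)
    return slot_id < 3 or keyboard_idle > 10


def should_preempt(slot_id: int, keyboard_idle: int,
                   first_preemptible_slot: int = 3) -> bool:
    # Model of a PREEMPT expression covering every keyboard-gated slot (assumption).
    return slot_id >= first_preemptible_slot and keyboard_idle < 10


def simulate(num_slots: int, use_preempt: bool) -> list[int]:
    """Fill an idle machine, let the user return, and report slots still running."""
    idle, active = 600, 0  # hypothetical KeyboardIdle readings, in seconds
    # Phase 1: nobody at the keyboard, so every slot whose START is true takes a job.
    running = [s for s in range(1, num_slots + 1) if start_allowed(s, idle)]
    # Phase 2: the user returns; only a PREEMPT policy evicts jobs already running.
    if use_preempt:
        running = [s for s in running if not should_preempt(s, active)]
    return running


print(simulate(8, use_preempt=False))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(simulate(8, use_preempt=True))   # [1, 2]
```

Without a PREEMPT rule, all eight modelled jobs keep running after the user returns; with it, only the two always-on slots stay busy.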
