The MGHPCC Holyoke data center will be performing power work on May 21st-23rd. This work will take out one half (or one 'side') of the power capacity for certain rows/racks, including our compute rows. Because of our power draw, one side alone cannot supply enough power to keep a full rack running.
As a result, we will be adding a reservation to idle half of the nodes in each of the partitions listed below. The reservation causes those nodes to drain: running jobs are allowed to finish, and the scheduler will not start new jobs on those nodes unless they can complete before the outage begins. This will allow us to idle and power down those nodes prior to the work and avoid a potential blackout/brownout on those racks.
These partitions will remain up and available, but half of the nodes in each (assuming an even number of nodes) will be down.
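The draining rule described above comes down to one comparison: a job can still start on a reserved node only if its requested time limit ends before the reservation begins. The sketch below illustrates that check, assuming a Slurm-style reservation where the scheduler uses each job's time limit (not its actual runtime) to decide. The function names are hypothetical, and the dates are placeholders, not the actual reservation window.

```python
from datetime import datetime, timedelta


def fits_before_reservation(now: datetime, time_limit: timedelta,
                            reservation_start: datetime) -> bool:
    """Hypothetical helper: True if a job started at `now` with `time_limit`
    would end no later than `reservation_start`, i.e. it could still be
    scheduled on a node that is draining for the reservation."""
    return now + time_limit <= reservation_start


def longest_fitting_time_limit(now: datetime,
                               reservation_start: datetime) -> timedelta:
    """Hypothetical helper: the largest time limit that still ends before the
    reservation starts (zero once the reservation window has been reached)."""
    return max(reservation_start - now, timedelta(0))


# Placeholder times for illustration only; NOT the actual reservation window.
reservation_start = datetime(2024, 5, 21, 8, 0)
now = datetime(2024, 5, 20, 20, 0)

print(fits_before_reservation(now, timedelta(hours=6), reservation_start))  # True
print(fits_before_reservation(now, timedelta(days=1), reservation_start))   # False
print(longest_fitting_time_limit(now, reservation_start))                   # 12:00:00
```

In practice this means a job that requests a shorter time limit (for example via Slurm's `--time` option) may still run on the draining nodes, while a longer request will wait for the unreserved half or until the work is complete.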
This work is part of an ongoing power capacity upgrade at MGHPCC. We expect this will be the last power work needed, after which the facility will provide enough additional power for future expansion as well as added headroom for the current load.
The affected partitions are:
arguelles_delgado
bigmem_intermediate
blackhole_gpu
eddy
gershman
hejazi
hernquist
hoekstra
huce_ice
iaifi_gpu
iaifi_gpu_requeue
iaifi_priority
jshapiro
jshapiro_priority
kempner
kempner_requeue
kempner_h100
kempner_h100_priority
kempner_h100_priority2
kovac
kozinsky
kozinsky_gpu
kozinsky_requeue
ortegahernandez_ice
rivas
seas_compute
seas_gpu
siag_combo
siag_gpu
sur
zhuang