WHAT: Some nodes in row 8A will be idled at MGHPCC
WHEN: May 13th 8am-4pm
To avoid a future over-capacity situation, MGHPCC will be performing power supply work on May 13th from 8am-4pm. This includes sections of row 8A, where some of our Cannon compute is located. We will be idling half the nodes in pod 8A to allow the necessary power work.
Unfortunately this work cannot be done during our upcoming outage. Its timing is dictated by the availability of electricians and other resources outside the facility's control; otherwise it would have been included in the May 21-24 downtime.
IMPACT
Half the nodes in racks 8a22, 8a28, 8a30, and 8a32 will be down (~114 nodes, around 7% of total capacity). The work will also enable us to add capacity for future purchases, so it is important that we allow this interruption.
Impacted partitions are outlined at the bottom of this notice. Impact includes but is not limited to gpu, intermediate, sapphire, and hsph. A reservation to idle the nodes is already in place.
Pending jobs in those partitions will take longer to start due to fewer available nodes during this time. Where possible, please use or include other partitions in your job scripts and plan accordingly for any new or long-running jobs during that period: https://docs.rc.fas.harvard.edu/kb/running-jobs/#Slurm_partitions
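As one way to follow the advice above, Slurm's `--partition` option accepts a comma-separated list, letting a job start on whichever listed partition has free nodes first. A minimal sketch follows; the partition names, time limit, and resource counts are placeholders only — substitute partitions your lab actually has access to.

```shell
#!/bin/bash
# Sketch of a batch script that lists a fallback partition for the
# maintenance window. Slurm will schedule the job on whichever of the
# listed partitions can start it first.
#SBATCH --partition=sapphire,intermediate  # comma-separated fallback list (example names)
#SBATCH --ntasks=1                         # placeholder resource request
#SBATCH --time=02:00:00                    # placeholder time limit

# your commands here
```

Jobs already pending in a single impacted partition can also be retargeted without resubmission, e.g. `scontrol update JobId=<jobid> Partition=sapphire,intermediate`, subject to your access.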
Thanks for your understanding.
FAS Research Computing
https://www.rc.fas.harvard.edu
https://status.rc.fas.harvard.edu
Partitions with nodes in 8A whose capacity will be reduced during this maintenance:
arguelles_delgado_gpu
bigmem_intermediate
bigmem
blackhole_gpu
eddy
gershman
gpu
hejazi
hernquist_ice
hoekstra
hsph
huce_ice
iaifi_gpu
iaifi_gpu_priority
intermediate
itc_gpu
joonholee
jshapiro
jshapiro_priority
jshapiro_sapphire
kempner
kempner_dev
kempner_h100
kovac
kozinsky_gpu
kozinsky
kozinsky_priority
murphy_ice
ortegahernandez_ice
sapphire
rivas
seas_compute
seas_gpu
siag_gpu
siag_combo
siag
sur
test
yao
yao_priority
zhuang