Critical power supply work at MGHPCC May 13th - A subset of Cannon nodes will be idled

Completed
Scheduled for May 13, 2024 at 12:00 PM – 7:15 PM

Affects

Cannon Cluster
Cannon Compute Cluster (Holyoke)
GPU nodes (Holyoke)
FASSE Cluster
FASSE Compute Cluster (Holyoke)
Updates
  • Completed
    May 13, 2024 at 7:15 PM

    This work has been completed and all 8A nodes are back in service.

  • In progress
    May 13, 2024 at 12:00 PM

    Maintenance is now in progress.

  • Planned
    May 13, 2024 at 12:00 PM

    WHAT: Some nodes in row 8A will be idled at MGHPCC

    WHEN: May 13th 8am-4pm

    To avoid a future over-capacity situation, MGHPCC will be performing power supply work on May 13th from 8am to 4pm. This includes sections of row 8A, where some of our Cannon compute nodes are located. We will be idling half the nodes in pod 8A to allow the necessary power work.

    Unfortunately, this cannot be done during our upcoming outage. The work is dictated by the availability of electricians and other resources outside the facility's control; otherwise it would have been included in the May 21-24 downtime.

    IMPACT

    Half the nodes in racks 8a22, 8a28, 8a30, and 8a32 will be down (only ~114 nodes, around 7% of total capacity). This work will also enable us to add more capacity for future purchases, so it is important that we allow this interruption.

    Impacted partitions are listed at the bottom of this notice; they include, but are not limited to, gpu, intermediate, sapphire, and hsph. A reservation to idle the affected nodes is already in place.
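
    If you want to see how this affects a partition you use, standard Slurm queries will show the reservation and the node states. A minimal sketch, using gpu as an example partition from the list below (reserved or maintenance nodes typically report a state of resv or maint):

        # List active and upcoming reservations on the cluster
        scontrol show reservation

        # Summarize node counts and states for an impacted partition
        sinfo -p gpu

        # Show only the nodes currently held back by a reservation
        sinfo -p gpu --states=reserved,maint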

    Pending jobs in those partitions will take longer to start due to fewer available nodes during this time. Where possible, please use or include other partitions in your job scripts and plan accordingly for any new or long-running jobs during that period: https://docs.rc.fas.harvard.edu/kb/running-jobs/#Slurm_partitions
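
    One way to include other partitions is to give Slurm a comma-separated list: sbatch will start the job in whichever listed partition can run it first. A minimal sketch, with placeholder resource requests and a placeholder second partition (substitute partitions your account can actually submit to):

        #!/bin/bash
        #SBATCH --job-name=myjob                        # placeholder job name
        #SBATCH --partition=sapphire,other_partition    # alternatives; "other_partition" is a placeholder
        #SBATCH --ntasks=1
        #SBATCH --cpus-per-task=4                       # placeholder resource requests
        #SBATCH --mem=8G
        #SBATCH --time=02:00:00

        # your commands here

    Listing more than one partition lets the scheduler pick whichever has free nodes first, which is especially helpful while part of a partition is idled by a reservation.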

    Thanks for your understanding.

    FAS Research Computing

    https://www.rc.fas.harvard.edu

    https://status.rc.fas.harvard.edu

    Partitions with nodes in 8A whose capacity will be reduced during this maintenance:

    arguelles_delgado_gpu
    bigmem_intermediate
    bigmem
    blackhole_gpu
    eddy
    gershman
    gpu
    hejazi
    hernquist_ice
    hoekstra
    hsph
    huce_ice
    iaifi_gpu
    iaifi_gpu_priority
    intermediate
    itc_gpu
    joonholee
    jshapiro
    jshapiro_priority
    jshapiro_sapphire
    kempner
    kempner_dev
    kempner_h100
    kovac
    kozinsky_gpu
    kozinsky
    kozinsky_priority
    murphy_ice
    ortegahernandez_ice
    sapphire
    rivas
    seas_compute
    seas_gpu
    siag_gpu
    siag_combo
    siag
    sur
    test
    yao
    yao_priority
    zhuang