The 2024 MGHPCC data center annual power downtime will take place May 21-24, 2024.
We will begin our shutdown on Tuesday, May 21st, and expect a return to service by 5 PM on Friday, May 24th.
- Jobs: Please plan ahead. Any jobs still running on the morning of May 21st will be stopped and canceled, and will need to be resubmitted after the downtime. Pending jobs will remain in the queue until the cluster returns to regular service on May 24th.
- Access: The cluster, scheduler, login, and OoD nodes will be unavailable for the duration of the downtime. Please hold new lab and account requests until after the downtime.
- Storage: All Holyoke storage will be powered down and unavailable for the duration of the downtime. Boston storage will remain online, but access to it may be affected, and network changes may briefly interrupt its availability.
Further details, an explanation of this year's change in scheduling, a visual timeline, and an overview of maintenance tasks can be found at:
https://www.rc.fas.harvard.edu/blog/2024-mghpcc-power-downtime/
Progress updates will be posted here on our status page during the downtime. To receive updates as they happen, subscribe by clicking Get Updates in the upper right.
MAJOR TASK OVERVIEW
OS upgrade to Rocky 8.9 (point upgrade; no code rebuilds will be required)
Switch from system OFED to Mellanox OFED on nodes for improved performance
InfiniBand (network) upgrades
BIOS updates (various)
Storage firmware updates
Network maintenance
Decommissioning of old nodes (affected owners have been contacted)
Additional minor one-off updates and maintenance (cable swaps, reboots, etc.)
Thanks,
FAS Research Computing
https://www.rc.fas.harvard.edu/
https://docs.rc.fas.harvard.edu/
https://status.rc.fas.harvard.edu/