The annual MGHPCC data center power shutdown and maintenance will occur August 9th through August 12th.
For the most up-to-date task list, see: https://www.rc.fas.harvard.edu/events/mghpcc-power-shutdown-2021/
- Power-down will begin at 6pm on August 9th. (NOTE: some jobs will be terminated by 9am that morning due to rack shutdowns in Row 7C, see TASKS below)
- Power will be out that night and through the following day, August 10th.
Note: Boston storage will be affected on August 10th.
Boston login and VDI will be affected for the duration of the downtime.
See Boston Data Center note below.
- Maintenance and network upgrades will occur on August 11th.
- Power-up ETA and expected return to service is noon on August 12th.
While this outage impacts all services and resources in the MGHPCC/Holyoke data center, please be aware that this can have a knock-on effect for some Boston services as well.
BOSTON DATA CENTER
Boston storage, login, and VDI WILL be affected on August 10th.
Any additional Boston outages will be noted on our website closer to the date.
TASKS
- Nodes in Row 7C (Note: starts August 9th at 9am): Jobs running on any node in the following racks will be terminated by 9am so that the racks can be shut down for hardware changes and a cooling shutoff: holy7c16, holy7c18, holy7c20, holy7c22, holy7c24, holy7c26
-- This will impact jobs in the following partitions: arguelles_delgado, davies, edwards, fasse, geophysics, giribet, huce_cascade, huce_cascade_priority, imasc, itc_cluster, kovac, cf, ncf_interact, ncf_nrg, ortegahernandez, phelevan, shared (partial outage), test, unrestricted, xlin, zon
-- 36 new bigmem nodes (Intel Ice Lake, 64-core, 512 GB) and 18 GPU nodes (4x NVIDIA A100) will be added in this row. Cooling to these racks must be shut down so that Lenovo can install the new hardware.
- Login and compute OS upgrades from CentOS 7.8.2003 to CentOS 7.9.2009
Note: After the upgrade, SSH keys may change. See: https://docs.rc.fas.harvard.edu/kb/ssh-key-error/
- InfiniBand network upgrades
- SLURM master replacement
- Core and distribution equipment replacement
- Tier 1 (Isilon) storage firmware upgrades
- Network maintenance and upgrades: Major upgrades, replacing the 8-year-old distribution and core switches to support 2 x 100Gbps connectivity to campus and the Internet.
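
For the Row 7C rack shutdown listed above, it may help to check whether a given node falls in one of the affected racks. The following is a minimal sketch, not an official tool. The rack names come from this notice, but the idea that a node's hostname begins with its rack name (for example "holy7c16101") is an assumption, and the function names are hypothetical.

```python
# Rack names are copied from this notice.
AFFECTED_RACKS = (
    "holy7c16", "holy7c18", "holy7c20",
    "holy7c22", "holy7c24", "holy7c26",
)


def in_affected_rack(hostname: str) -> bool:
    """Return True if the hostname starts with an affected rack name.

    ASSUMPTION: node hostnames begin with the rack name they sit in.
    This naming scheme is not stated in the notice.
    """
    # str.startswith accepts a tuple and matches any of its entries.
    return hostname.strip().lower().startswith(AFFECTED_RACKS)


def affected_nodes(hostnames):
    """Return the hostnames (unchanged) that fall in an affected rack."""
    return [h for h in hostnames if in_affected_rack(h)]
```

The input is expected to be fully expanded hostnames, one per node, such as those printed by `scontrol show hostnames` for a job's node list. If the naming assumption does not hold, the partition list above is the more reliable guide.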