UPDATE 6/12 8:50PM: All maintenance is complete. We are load testing scratchlfs for the next hour and will then open partitions in SLURM.
UPDATE 6/12 7:55PM: Maintenance on scratchlfs is nearly complete and we expect to be able to return the cluster to operation very shortly. We will update this page and send an all-clear email as soon as that is the case.
UPDATE 6/12 5PM: We are beginning final compute upgrades and initial power-up. However, we will remain in a holding pattern, as we do not have a firm ETA from the scratchlfs vendor on when their work will be complete. We will post another update at 8PM with either an all-clear or, if scratchlfs is not yet online, an updated ETA, which we will also email to all users.
UPDATE 6/12 2PM: All internal maintenance is progressing as expected and on time. Completion of the vendor's work on scratchlfs is currently the only unknown. Next update at 5PM.
UPDATE 6/11 1:35PM: Lab share moves: dulacfs2, osheafs1, hoekstrafs4, and tzipermanfs2 have been moved and are back online.
Each year our primary data center, MGHPCC (Holyoke), performs a full power shutdown for electrical maintenance. This requires us to power down all FASRC systems at MGHPCC starting the evening before. This includes all compute and many storage systems. Some systems housed at our Boston data center may also be affected.
This period also gives us a window for maintenance that would otherwise require us to shut off various resources during normal operations. Note that this power event means all running jobs will be terminated, as power to the entire facility will be out. Jobs cannot be suspended and resumed because the nodes will be powered off.
NOTE: Shutdown of our systems prior to MGHPCC power-off begins Monday 6/10/19 6PM. Return to normal operation ETA June 12th 8PM.
After June 12th, EasyBuild will be added to all user environments. This requires your attention, as your job scripts may fail if module calls do not use the full module name.
For best interoperability of EasyBuild-based modules with existing software modules, please use complete module names and versions so that the correct software modules are loaded in your user environment. Example: module load intel/17.0.4-fasrc01
If you currently use "module load intel" without a version, it will load intel from the EasyBuild space and break your workflow.
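Since unversioned module loads are what may break after June 12th, it can help to scan job scripts before resubmitting them. The following is a minimal Python sketch, not an FASRC-provided tool. The function name `unversioned_modules` is hypothetical. It assumes that a "full name" is a name/version string containing a `/`, as in the `intel/17.0.4-fasrc01` example above, and it only recognizes literal `module load` lines (not other Lmod shortcuts).

```python
import re
from typing import List, Tuple

# Assumption: only literal "module load ..." lines are checked; a trailing
# "# comment" on the same line is ignored.
MODULE_LOAD = re.compile(r"^\s*module\s+load\s+(.+?)\s*(?:#.*)?$")


def unversioned_modules(script_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, module_name) for each module loaded without a version.

    Assumes a fully qualified module name contains a "/" (name/version),
    e.g. intel/17.0.4-fasrc01.
    """
    problems = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        match = MODULE_LOAD.match(line)
        if not match:
            continue  # not a "module load" line (e.g. #SBATCH directives)
        for name in match.group(1).split():
            if name.startswith("-"):
                continue  # skip option flags
            if "/" not in name:
                problems.append((lineno, name))
    return problems


if __name__ == "__main__":
    # Illustrative job script; only intel/17.0.4-fasrc01 comes from this notice.
    job = """#!/bin/bash
#SBATCH -n 1
module load intel
module load intel/17.0.4-fasrc01  # full name and version
module load gcc intel/17.0.4-fasrc01
"""
    for lineno, name in unversioned_modules(job):
        print(f"line {lineno}: '{name}' is loaded without a version")
```

Run against the sample script, this flags the bare `intel` on line 3 and the bare `gcc` on line 5, while the fully versioned loads pass.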
During this event, once basic power is available to us, we will also upgrade all compute nodes to the latest CentOS release. This is a minor version update, and no impact is expected after the upgrade.
We will notify the community via our users' email list when we are back to normal operations.
Current status will be posted on our status page: https://status.rc.fas.harvard.edu
For details, see: https://www.rc.fas.harvard.edu/mghpcc-shutdown-2019