FAS Research Computing - Monthly Maintenance March 6th, 2023 7am-11am – Maintenance details
All systems operational
Status page for the Harvard FAS Research Computing cluster and other resources.
Cluster Utilization (VPN and FASRC login required): Cannon | FASSE
Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).
Monthly Maintenance March 6th, 2023 7am-11am
Completed
Scheduled for March 06, 2023 at 12:00 PM – 5:06 PM (UTC; 7:00 AM to 12:06 PM Eastern)
Affects (each component under maintenance from 12:00 PM to 5:06 PM):
Cannon Cluster
SLURM Scheduler - Cannon
Cannon Compute Cluster (Holyoke)
Boston Compute Nodes
GPU nodes (Holyoke)
FASSE Cluster
Updates
Completed
March 06, 2023 at 5:06 PM
Maintenance completed successfully at 12:00 PM.
In progress
March 06, 2023 at 4:30 PM
As of 11:30 AM, maintenance is still in progress because one of our vendors needs to complete some hardware work.
Access to the Slurm scheduler and all compute nodes remains paused, but access to storage and other services has been restored.
NOTE: All jobs will be paused during maintenance to reduce heat load and allow data center cooling maintenance to take place.
Login node updates and reboots, VDI reboots
Audience: VDI/OpenOnDemand users
Impact: VDI will be unavailable during this and the above Slurm upgrade
RCSMB (samba) Boston network changes
Audience: RCSMB shares mounted out of Boston
Impact: Could cause brief share disconnects during updates
UPDATE: Nexus control plane supervisor switchover. ETA 5 minutes; expect a short network disconnect while it restarts.
Login node updates/reboot and VDI node reboots
Audience: Anyone logged into a login node or VDI/OOD node
Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting
Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
Audience: Cluster users
Impact: Files older than 90 days will be removed.
Reminder: Scratch 90-day retention purges run regularly, not just during maintenance periods.
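If you want to see which of your scratch files may already be past the 90-day threshold, the short Python sketch below lists them. It assumes age is measured from each file's last modification time (mtime); the linked scratch policy page is the authority on the actual criterion. The helper name `files_older_than` and the directory you pass in are hypothetical, not an FASRC-provided tool.

```python
import os
import sys
import time

# ASSUMPTION: file age is judged by modification time (mtime).
# The 90-day figure comes from the scratch cleanup notice above.
RETENTION_DAYS = 90
SECONDS_PER_DAY = 86400


def files_older_than(root, days=RETENTION_DAYS, now=None):
    """Return a sorted list of file paths under `root` whose mtime is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    old = []
    # os.walk descends through every directory below root
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                old.append(path)
    return sorted(old)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scratch_age.py /path/to/your/scratch/directory
    for p in files_older_than(sys.argv[1]):
        print(p)
```

Listing rather than deleting keeps the sketch read-only, so you can decide what to copy elsewhere before a purge run removes it.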
SECURITY UPDATES
HUIT and the CIO Council have set a goal of reducing risk across all schools within the University. All schools are looking to reduce their outstanding vulnerability count by 75% by June 2023. These numbers are based on HUIT security scans of our infrastructure.
We at FAS Research Computing are responsible for thousands of physical and virtual machines. To make progress in reducing our total open vulnerability count, we're going to update internal and user-facing systems as part of scheduled monthly maintenance windows and on a rolling basis outside of these windows. This will generally mean running OS and security updates as needed and rebooting these nodes when required.
This month, these hosts will get updates as part of our scheduled maintenance:
boslogin01 - boslogin04
holylogin01 - holylogin04
holydtn01 - holydtn04
xdmod4.rc.fas.harvard.edu
rchelp.rc.fas.harvard.edu (our ticket system)
rcsmtp.rc.fas.harvard.edu (our mail system)