Experiencing partially degraded performance

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE | Academic


Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu | Email: rchelp@rc.fas.harvard.edu


The colors shown in the bars below were chosen to increase visibility for color-blind visitors.
For higher contrast, switch to light mode at the bottom of this page if the background is dark and colors are muted.

FASRC Monthly maintenance Tues. Sept. 5th, 2023 7am-1pm *NOTE EXTENDED TIME*

Completed
Scheduled for September 05, 2023 at 11:00 AM – 3:51 PM

Affects

Cannon Cluster
SLURM Scheduler - Cannon
Cannon Compute Cluster (Holyoke)
Boston Compute Nodes
GPU nodes (Holyoke)
SEAS compute partition
Updates
  • Completed
    September 05, 2023 at 3:51 PM

    Maintenance has completed early.
    Both holyscratch01 and holylabs are repaired and back online.
    The scheduler has been resumed.

  • In progress
    September 05, 2023 at 11:00 AM

    Maintenance is now in progress

  • Planned
    September 05, 2023 at 11:00 AM

    September maintenance will run September 5, 2023 from 7am-1pm.

    Please note the extended timeframe and the move to Tuesday due to the Monday holiday.
    See the tasks section below for an explanation. All jobs will be paused during this maintenance.

    NOTICES

    • CentOS 7 Support EOL: As noted previously, we will be dropping support for CentOS 7 as of this maintenance. If your machine or VM runs CentOS 7 and connects to Slurm, please contact FASRC to discuss options.

    • Partition Changes: We have changed the test partitions based on changing needs and increased their maximum time from 8 hours to 12 hours. A reminder that these partitions are not intended for production jobs. A bigmem_intermediate queue has also been added; see the partition table at: https://docs.rc.fas.harvard.edu/kb/running-jobs/#articleTOC_6
      A quick way to confirm the new limits from a login node is sketched after these notices.

    • New user training will take place online on September 12, 2023. All are welcome and encouraged to attend. Non-Harvard users who cannot log into the Training Portal but wish to attend should email rchelp@rc.fas.harvard.edu prior to Sept. 12th for instructions on how to join the Zoom session. For this and future training sessions, see: https://www.rc.fas.harvard.edu/upcoming-training/
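
    To confirm the new partition limits, you can query the scheduler directly. The sketch below is illustrative rather than official FASRC tooling: it assumes the standard Slurm sinfo client is available on your PATH (e.g. on a login node) and uses only the partition names mentioned in the notice above.

      #!/usr/bin/env python3
      """Print the configured time limits of the test and bigmem_intermediate partitions."""
      import subprocess

      # %P = partition name, %l = maximum time limit (standard sinfo format fields)
      result = subprocess.run(
          ["sinfo", "--partition", "test,bigmem_intermediate", "--format", "%P %l"],
          check=True, capture_output=True, text=True,
      )
      print(result.stdout)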

    MAINTENANCE TASKS

    • holyscratch01 Disk Shelf Replacement (All Jobs Will Be Paused)
      -- Audience: All cluster and scratch users - Cannon and FASSE
      -- Impact: Hardware issues with holyscratch01 necessitate the replacement of one of the disk shelves. As a result, all jobs and scratch will need to be paused for the duration. Due to a vendor shipping error, we were not able to complete this task last month.
      -- ETA: This swap is expected to take 3-4 hours, but pausing the cluster, vendor interactions, and allowing a margin for over-run require that we extend maintenance by 2 hours (7am-1pm).

    • Slurm scheduler upgrade to 23.02.4
      -- Audience: All cluster users including VDI - Cannon and FASSE
      -- Impact: All jobs and the scheduler will be paused during the maintenance for both the Lustre hardware swap (see above) and the Slurm upgrade. The scheduler will be unavailable from 7am-1pm.

    • Login node and OOD/VDI reboots
      -- Audience: Anyone logged into a login node or VDI/OOD node
      -- Impact: Login and VDI/OOD nodes will be rebooted during this maintenance window.

    • Scratch cleanup (https://docs.rc.fas.harvard.edu/kb/policy-scratch/)
      -- Audience: Cluster users
      -- Impact: Files older than 90 days will be removed. Please note that retention cleanup can run at any time, not just during the maintenance window.
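
      One way to spot files in that age range before the sweep runs is sketched below. This is a minimal, unofficial example: the scratch path is a placeholder to replace with your own lab directory, and it uses file modification time as an approximation of what the retention policy checks (the policy page linked above is authoritative).

        #!/usr/bin/env python3
        """List files under a scratch directory that are older than 90 days."""
        import os
        import time

        SCRATCH_DIR = "/n/holyscratch01/your_lab/your_user"  # placeholder path -- adjust to your own
        CUTOFF = time.time() - 90 * 24 * 3600                # 90 days ago, in seconds since the epoch

        for root, _dirs, files in os.walk(SCRATCH_DIR):
            for name in files:
                path = os.path.join(root, name)
                try:
                    mtime = os.stat(path).st_mtime   # last modification time
                except OSError:
                    continue                         # file vanished or unreadable; skip it
                if mtime < CUTOFF:
                    print(path)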

    Thanks,
    FAS Research Computing
    Department and Service Catalog: https://www.rc.fas.harvard.edu/
    Documentation: https://docs.rc.fas.harvard.edu/
    Status Page: https://status.rc.fas.harvard.edu/