Experiencing partially degraded performance

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE | Academic


Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu | Email: rchelp@rc.fas.harvard.edu


FASRC Monthly maintenance August 7, 2023 7am-1pm *NOTE EXTENDED TIME*

Completed
Scheduled for August 07, 2023 at 11:00 AM – 1:42 PM

Affects

Cannon Cluster
SLURM Scheduler - Cannon
Cannon Compute Cluster (Holyoke)
Boston Compute Nodes
GPU nodes (Holyoke)
SEAS compute partition
Updates
  • Completed
    August 07, 2023 at 1:42 PM

    Due to a vendor error we were unable to complete holyscratch01 disk shelf replacement. We will work with the vendor to reschedule.

    All other maintenance tasks have completed.

  • In progress
    August 07, 2023 at 11:00 AM

    Maintenance is now in progress

  • Planned
    August 07, 2023 at 11:00 AM

    August maintenance will run August 7, 2023 from 7am-1pm.

    Please note the extended timeframe.
    See tasks section below for explanation.

    NOTICES

    • CentOS 7 Support EOL: We will be dropping support for CentOS 7 in September. If your machine or VM runs CentOS 7 and connects to Slurm, please contact FASRC to discuss options.

    • Test Partition Changes: We are changing the test partitions based on changing needs and increasing the maximum run time from 8 hours to 12 hours. A reminder that this partition is not for running jobs.

    MAINTENANCE TASKS

    • holyscratch01 Disk Shelf Replacement - All Jobs Will Be Paused
      -- Audience: All cluster and scratch users - Cannon and FASSE
      -- Impact: Hardware issues with holyscratch01 necessitate the replacement of one of the disk shelves. As a result, all jobs and scratch will need to be paused for the duration.
      -- ETA: This swap is expected to take 3-4 hours, but pausing the cluster, vendor interactions, and a margin for over-run require that we extend maintenance by 2 hours (7am-1pm).

    • Login node and OOD/VDI reboots
      -- Audience: Anyone logged into a login node or VDI/OOD node
      -- Impact: Login and VDI/OOD nodes will be rebooted during this maintenance window.

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
      -- Audience: Cluster users
      -- Impact: Files older than 90 days will be removed.
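
To check ahead of a cleanup which of your own scratch files might be affected, something like the following Python sketch could be used. It is illustrative only: the function name files_older_than is hypothetical, and it assumes that file age is judged by modification time (mtime). The actual cleanup criterion is defined by the scratch policy linked above and may differ.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def files_older_than(root, days=90, now=None):
    """Return files under `root` whose mtime is more than `days` days before `now`.

    Assumption: age is measured by modification time. The real scratch
    cleanup may use a different criterion; check the policy page.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat so that symlinks are judged by their own timestamp
                mtime = path.lstat().st_mtime
            except OSError:
                # File vanished or is unreadable; skip it
                continue
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling files_older_than with the path to your own scratch directory returns a sorted list of candidate files. The function only reports; it does not delete anything.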

    Thanks,
    FAS Research Computing
    Department and Service Catalog: https://www.rc.fas.harvard.edu/
    Documentation: https://docs.rc.fas.harvard.edu/
    Status Page: https://status.rc.fas.harvard.edu/