FAS Research Computing - Notice history

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE


Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
Documentation: https://docs.rc.fas.harvard.edu | Account Portal: https://portal.rc.fas.harvard.edu
Email: rchelp@rc.fas.harvard.edu | Support Hours


The colors shown in the bars below were chosen to increase visibility for color-blind visitors.
For higher contrast, switch to light mode at the bottom of this page if the background is dark and colors are muted.

SLURM Scheduler - Cannon - Operational

Cannon Compute Cluster (Holyoke) - Operational

Boston Compute Nodes - Operational

GPU nodes (Holyoke) - Operational

seas_compute - Operational

SLURM Scheduler - FASSE - Operational

FASSE Compute Cluster (Holyoke) - Operational

Kempner Cluster CPU - Operational

Kempner Cluster GPU - Operational

FASSE login nodes - Operational

Cannon Open OnDemand/VDI - Operational

FASSE Open OnDemand/VDI - Operational

Netscratch (Global Scratch) - Operational

Home Directory Storage - Boston - Operational

Tape (Tier 3) - Operational

Holylabs - Operational

Isilon Storage Holyoke (Tier 1) - Operational

Holystore01 (Tier 0) - Operational

HolyLFS04 (Tier 0) - Operational

HolyLFS05 (Tier 0) - Operational

HolyLFS06 (Tier 0) - Operational

Holyoke Tier 2 NFS (new) - Operational

Holyoke Specialty Storage - Operational

holECS - Operational

Isilon Storage Boston (Tier 1) - Operational

BosLFS02 (Tier 0) - Operational

Boston Tier 2 NFS (new) - Operational

CEPH Storage Boston (Tier 2) - Operational

Boston Specialty Storage - Operational

bosECS - Operational

Samba Cluster - Operational

Globus Data Transfer - Operational

Notice history

Feb 2026

NESE tape maintenance Feb 19th 2026
Scheduled for February 19, 2026 at 1:00 PM – 10:00 PM about 9 hours
  • Planned
    February 19, 2026 at 1:00 PM

    From our partners at NESE. Details follow:

    We are installing four new tape frames, which will bring the tape system raw storage capacity to 253 petabytes.

    Service Affected: NESE Tape Service

    Maintenance Window: 8:00 AM - 5:00 PM (EST)

    • The tape service will be unavailable.

    • All upgrade activities are expected to be completed on the same day.

    NOTES:

    • Monitor the MGHPCC Slack #nese channel for status updates and announcements

    • Monitor https://nese.instatus.com/ for real-time updates on progress

    • Subscribe to https://nese.instatus.com/subscribe/email for updates and announcements

NESE tape maintenance Feb 9th 2026
Scheduled for February 09, 2026 at 1:00 PM – 10:00 PM about 9 hours
  • Planned
    February 09, 2026 at 1:00 PM

    From our partners at NESE. Details follow:

    As part of the tape front-end file caching system upgrade, we will be installing a new IBM Storage Scale System 6000. We will provide an additional update when the software integration and data transfer from the current IBM Elastic Storage System 5000 are scheduled to be performed.

    Service Affected: NESE Tape Service

    Maintenance Window: No downtime expected

    NOTES:

FASRC monthly maintenance Monday February 2nd, 2026 9am-1pm
  • Completed
    February 02, 2026 at 6:00 PM
    Maintenance has completed successfully
  • In progress
    February 02, 2026 at 2:00 PM
    Maintenance is now in progress
  • Planned
    February 02, 2026 at 2:00 PM

    Monthly maintenance will take place on Monday February 2nd, 2026. Our maintenance tasks should be completed between 9am and 1pm.

    NOTICES:

    MAINTENANCE TASKS

    Cannon cluster will be paused during this maintenance?: YES
    FASSE cluster will be paused during this maintenance?: YES

    • MaxTime change

      • Audience: Cluster users

      • Impact: To improve scheduling efficiency and stability, all partitions that currently have MaxTime set to UNLIMITED will be given a MaxTime of 3 days; the unrestricted partition will be set to 365 days. Partitions that already have a MaxTime set will retain their current setting. Partition owners wishing to set a different MaxTime for their partition should contact FASRC. Note that we do not guarantee uptime, so users should utilize checkpointing to save state in case of node failure (a minimal checkpointing sketch follows the task list below).

    • Slurm upgrade to 25.11.2

      • Audience: All cluster users

      • Impact: Jobs will be paused during maintenance

    • OOD node reboots

      • Audience: All Open OnDemand users

      • Impact: OOD nodes will reboot during the maintenance window

    • Login node reboots

      • Audience: All login node users

      • Impact: Login nodes will reboot during the maintenance window
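
    As a companion to the checkpointing recommendation in the MaxTime task above, here is a minimal, illustrative Python sketch of a save/resume pattern. It is not FASRC-provided tooling: the checkpoint file name and the loop are placeholders, and jobs should prefer whatever checkpoint/restart mechanism their own application or framework already provides.

      # Illustrative only: periodically save state so a job stopped at the
      # new 3-day MaxTime (or by a node failure) can be resubmitted and resume.
      import os
      import pickle

      CHECKPOINT = "checkpoint.pkl"   # placeholder name; point this at lab storage

      # Resume from the last checkpoint if one exists, otherwise start fresh.
      if os.path.exists(CHECKPOINT):
          with open(CHECKPOINT, "rb") as f:
              state = pickle.load(f)
      else:
          state = {"step": 0, "result": 0.0}

      TOTAL_STEPS = 1_000_000
      CHECKPOINT_EVERY = 10_000

      for step in range(state["step"], TOTAL_STEPS):
          state["result"] += step * 1e-6          # stand-in for real work
          state["step"] = step + 1
          if state["step"] % CHECKPOINT_EVERY == 0:
              # Write to a temporary file, then replace atomically, so an
              # interrupted write cannot corrupt the previous checkpoint.
              with open(CHECKPOINT + ".tmp", "wb") as f:
                  pickle.dump(state, f)
              os.replace(CHECKPOINT + ".tmp", CHECKPOINT)

      print("final result:", state["result"])

    If a job is stopped partway through, resubmitting the same script resumes from the last saved step instead of starting over.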

    Thank you,
    FAS Research Computing
    https://docs.rc.fas.harvard.edu/
    https://www.rc.fas.harvard.edu/

Jan 2026

FASRC monthly maintenance Monday January 12th, 2026 9am-1pm
  • Completed
    January 12, 2026 at 6:00 PM
    Maintenance has completed successfully
  • In progress
    January 12, 2026 at 2:00 PM
    Maintenance is now in progress
  • Planned
    January 12, 2026 at 2:00 PM

    Monthly maintenance will take place on January 12th, 2026. Our maintenance tasks should be completed between 9am and 1pm.

    NOTICES:

    • Changes to SEAS partitions; please see the tasks below.

    • Changes to job age priority weighting; please see the tasks below.

    • Status Page: You can subscribe to our status to receive notifications of maintenance, incidents, and their resolution at https://status.rc.fas.harvard.edu/ (click Get Updates for options).

    • We'd love to hear success stories about your or your lab's use of FASRC. Submit your story here.

    MAINTENANCE TASKS

    Cannon cluster will be paused during this maintenance?: YES
    FASSE cluster will be paused during this maintenance?: YES

    • Slurm upgrade to 25.11.1

      • Audience: All cluster users (Cannon and FASSE)

      • Impact: Jobs will be paused during maintenance

    • In conjunction with SEAS, we will modify seas_gpu and seas_compute time limits

      • Audience: SEAS users

      • Impact:
        seas_gpu: will be set to 2 days maximum
        seas_compute: will be set to 3 days maximum

        Existing pending jobs longer than these limits will be set to 2-day or 3-day run times, depending on the partition.

    • Job Age Priority Weight Change

      • Audience: Cluster users

      • Impact: We will be adjusting the weight applied to the priority that jobs earn from their age. Job priority is currently made up of two factors, Fairshare and Job Age. The Job Age factor is set so that jobs gain priority over 3 days, up to a maximum equivalent to a Fairshare of 0.5; this keeps low-fairshare jobs from languishing at the bottom of the queue. With the current settings, however, users with low fairshare can gain a significant advantage over users with higher relative fairshare. To remedy this, we will adjust the Job Age weight to cap out at an equivalent Fairshare of 0.1. Jobs with zero fairshare will still gain priority and thus not languish, while Fairshare will govern a wider range of higher-priority jobs (an illustrative sketch appears after the task list at the end of this notice).

    • Login node reboots

      • Audience: All login node users

      • Impact: Login nodes will reboot during the maintenance window

    • Open OnDemand (OOD) node reboots

      • Audience: All OOD users

      • Impact: OOD nodes will reboot during the maintenance window

    • Netscratch retention will run

      • Audience: All cluster netscratch users

      • Impact: Files older than 90 days will be removed. Please note that retention cleanup can and does run at any time, not just during the maintenance window. See the example just below this list for one way to preview which of your files would be affected.
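
    To preview which of your files would currently fall under the 90-day retention policy, you can check modification times yourself. The sketch below is illustrative only: the directory path is a placeholder for your own netscratch location, it uses modification time (which may not be exactly the criterion the retention job applies), and it only lists candidates rather than deleting anything.

      # Illustrative only: list files not modified in the last 90 days.
      import os
      import time

      SCRATCH_DIR = "/n/netscratch/<your_lab>/<your_user>"   # placeholder path
      CUTOFF = time.time() - 90 * 24 * 3600                   # 90 days ago

      for root, dirs, files in os.walk(SCRATCH_DIR):
          for name in files:
              path = os.path.join(root, name)
              try:
                  mtime = os.lstat(path).st_mtime
              except OSError:
                  continue                 # file vanished or unreadable; skip
              if mtime < CUTOFF:
                  print(path)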

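    For those curious how the Job Age change interacts with Fairshare, here is an illustrative sketch of the weighting described in the Job Age Priority Weight task above. The weights and fairshare values are invented for the example and are not FASRC's actual Slurm configuration; they only show how capping the age contribution at the equivalent of 0.1 Fairshare narrows the advantage that queue age alone can confer.

      # Illustrative only: how capping the Job Age contribution changes priority.
      # All numbers are invented for this example; they are not FASRC's settings.

      FAIRSHARE_WEIGHT = 10000                       # hypothetical fairshare weight
      OLD_AGE_WEIGHT = int(0.5 * FAIRSHARE_WEIGHT)   # old cap: age worth up to 0.5 fairshare
      NEW_AGE_WEIGHT = int(0.1 * FAIRSHARE_WEIGHT)   # new cap: age worth up to 0.1 fairshare
      MAX_AGE_DAYS = 3.0                             # age factor saturates after 3 days queued

      def priority(fairshare, age_days, age_weight):
          """Two-factor priority: a fairshare term plus an age term that
          ramps from 0 to 1 over MAX_AGE_DAYS and then stops growing."""
          age_factor = min(age_days / MAX_AGE_DAYS, 1.0)
          return FAIRSHARE_WEIGHT * fairshare + age_weight * age_factor

      # Compare a zero-fairshare job that has waited 3 days with a freshly
      # submitted job whose fairshare is 0.3, under the old and new weights.
      for label, age_weight in (("old", OLD_AGE_WEIGHT), ("new", NEW_AGE_WEIGHT)):
          waited = priority(fairshare=0.0, age_days=3.0, age_weight=age_weight)
          fresh = priority(fairshare=0.3, age_days=0.0, age_weight=age_weight)
          print(f"{label}: waited 3 days, fairshare 0 -> {waited:.0f}; fresh, fairshare 0.3 -> {fresh:.0f}")

    Under the old weighting the long-waiting zero-fairshare job (5000) outranks the fresh fairshare-0.3 job (3000); under the new weighting it still gains some priority (1000) but no longer leapfrogs it.
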
    Thank you,
    FAS Research Computing
    https://docs.rc.fas.harvard.edu/
    https://www.rc.fas.harvard.edu/

Dec 2025

Scheduling down on Cannon
  • Resolved

    We have found the malformed job which caused the scheduler to wedge and have cancelled it. We will continue to work with SchedMD to prevent future stalls. The scheduler is back in operation. Happy Holidays!

  • Update

    We have filed a ticket with SchedMD to diagnose the issue (https://support.schedmd.com/show_bug.cgi?id=24376). All partitions are set to DOWN as we assess the situation.

  • Investigating

    The Slurm scheduler ground to a halt at 3pm EST. We are investigating the cause.
