FAS Research Computing - Notice History

Experiencing partially degraded performance

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE


Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
Documentation: https://docs.rc.fas.harvard.edu | Account Portal https://portal.rc.fas.harvard.edu
Email: rchelp@rc.fas.harvard.edu | Support Hours


The colors shown in the bars below were chosen to increase visibility for color-blind visitors.
For higher contrast, switch to light mode at the bottom of this page if the background is dark and colors are muted.

Degraded performance

SLURM Scheduler - Cannon - Degraded performance

Cannon Compute Cluster (Holyoke) - Degraded performance

Boston Compute Nodes - Degraded performance

GPU nodes (Holyoke) - Degraded performance

seas_compute - Degraded performance

Operational

SLURM Scheduler - FASSE - Operational

FASSE Compute Cluster (Holyoke) - Operational

Operational

Kempner Cluster CPU - Operational

Kempner Cluster GPU - Operational

Operational

FASSE login nodes - Operational

Operational

Cannon Open OnDemand/VDI - Operational

FASSE Open OnDemand/VDI - Operational

Operational

Netscratch (Global Scratch) - Operational

Home Directory Storage - Boston - Operational

Tape - (Tier 3) - Operational

Holylabs - Operational

Isilon Storage Holyoke (Tier 1) - Operational

Holystore01 (Tier 0) - Operational

HolyLFS04 (Tier 0) - Operational

HolyLFS05 (Tier 0) - Operational

HolyLFS06 (Tier 0) - Operational

Holyoke Tier 2 NFS (new) - Operational

Holyoke Specialty Storage - Operational

holECS - Operational

Isilon Storage Boston (Tier 1) - Operational

BosLFS02 (Tier 0) - Operational

Boston Tier 2 NFS (new) - Operational

CEPH Storage Boston (Tier 2) - Operational

Boston Specialty Storage - Operational

bosECS - Operational

Samba Cluster - Operational

Globus Data Transfer - Operational

Notice History

Mar 2026

Scheduler is degraded
  • Resolved

    This incident has been resolved. The scheduler is running normally.

  • Investigating

    The scheduler is in a degraded state due to thrashing.
    We are actively working to resolve this problem.

Network issues - Cluster degraded
  • Resolved

    This incident has been resolved by draining and rebooting any nodes with stuck mounts.

  • Monitoring

    Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.

    It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.

    At this time we are unaware of any holy-isilon problems other than the effect this had on cluster nodes/running jobs. We will update should we identify any data storage concerns.

  • Identified

    Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.

    It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.

  • Investigating

    A network issue affecting storage critical to the cluster is causing instability. The cluster is currently in a degraded state as a result. We are looking into the problem. Updates to follow.
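
The stale-mount symptom described in the updates above can be probed from a node with a short script. This is a minimal sketch, not the check FASRC rolled out: the function name `mount_responds` and the 5-second timeout are assumptions, and it relies only on the standard `stat` command being present. The probe runs in a child process because a stale network mount can hang the caller.

```python
import subprocess


def mount_responds(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if `stat` on path succeeds within timeout_s seconds.

    Hypothetical helper for illustration. A False result means the path is
    missing, unreadable, or did not answer in time. On a hard-hung mount the
    child process may be unkillable, in which case this call can still block.
    """
    try:
        result = subprocess.run(
            ["stat", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The probe did not finish in time: treat the mount as unhealthy.
        return False
    return result.returncode == 0
```

For example, `mount_responds("/n/sw")` at the start of a job script would let the job exit early instead of failing partway through on a node with a stale mount.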

Feb 2026

Tape outage
  • Resolved

    This incident has been resolved. Normal tape operations are restored.

  • Monitoring

    The tape library outage has been extended further, to Wednesday, March 4th at 9am, while we await a hardware replacement part due today. Data can still be uploaded to lab collections via Globus, but be mindful of the 10 TB buffer file limit. The outage affects storage and recall from tape.

  • Identified

    NESE Tape Service is still working with IBM technical support to restore the inventory. The expected downtime is extended until Tuesday, March 3rd at 9am.
    Apologies for the inconvenience.

  • Investigating

    NESE Tape service will be down or operating with degraded service (no store and recall) Friday from 12 Noon EST until as late as Monday, 2 March at 9 AM.

    SUMMARY OF ISSUE:

    NESE Tape service is currently not able to store or recall files to and from tape due to vendor firmware issues in the IBM TS4500 tape library. The issue is related to the library robotics and cartridge database and we do NOT expect any data loss from this issue.

    The problem appears to lie in the inventory database and to be related to a recent firmware update. This database can be scrubbed and reconstructed by the library, which will scan the bar code labels on all the cartridges to rebuild the inventory. Association of files in Globus to tapes is handled separately from the tape library and is not affected by the firmware update.

NESE tape maintenance Feb 19th 2026
  • Completed
    Feb 19, 2026 at 22:00
    Maintenance has completed successfully.
  • In progress
    Feb 19, 2026 at 13:00
    Maintenance is now in progress.
  • Not started
    Feb 19, 2026 at 13:00

    From our partners at NESE. Details follow:

    We are installing four new tape frames, which will bring the tape system raw storage capacity to 253 petabytes.

    Service Affected: NESE Tape Service

    Maintenance Window: 8:00 AM - 5:00 PM (EST)

    • The tape service will be unavailable.

    • All upgrade activities are expected to be completed on the same day.

    NOTES:

    • Monitor the MGHPCC Slack #nese channel for status updates and announcements

    • Monitor https://nese.instatus.com/ for real-time updates on progress

    Subscribe to https://nese.instatus.com/subscribe/email for updates and announcements

Jan 2026

FASRC monthly maintenance Monday January 12th, 2026 9am-1pm
  • Completed
    Jan 12, 2026 at 18:00
    Maintenance has completed successfully.
  • In progress
    Jan 12, 2026 at 14:00
    Maintenance is now in progress.
  • Not started
    Jan 12, 2026 at 14:00

    Monthly maintenance will take place on January 12th, 2026. Our maintenance tasks should be completed between 9am-1pm.

    NOTICES:

    • Changes to SEAS partitions, please see tasks below.

    • Changes to job age priority weighting, please see tasks below.

    • Status Page: You can subscribe to our status to receive notifications of maintenance, incidents, and their resolution at https://status.rc.fas.harvard.edu/ (click Get Updates for options).

    • We'd love to hear success stories about your or your lab's use of FASRC. Submit your story here.

    MAINTENANCE TASKS

    Cannon cluster will be paused during this maintenance?: YES
    FASSE cluster will be paused during this maintenance?: YES

    • Slurm upgrade to 25.11.1

      • Audience: All cluster users (Cannon and FASSE)

      • Impact: Jobs will be paused during maintenance

    • In conjunction with SEAS, we will modify the seas_gpu and seas_compute time limits

      • Audience: SEAS users

      • Impact:
        seas_gpu: will be set to 2 days maximum
        seas_compute: will be set to 3 days maximum

        Existing pending jobs that request more than these limits will have their run times reduced to 2 days (seas_gpu) or 3 days (seas_compute).

    • Job Age Priority Weight Change

      • Audience: Cluster users

      • Impact: We will be adjusting the weight applied to the priority earned by jobs by virtue of their age. Currently job priority is made up of two factors, Fairshare and Job Age. The Job Age factor is currently set such that jobs gain priority over 3 days with a maximum priority equivalent to jobs with Fairshare of 0.5. This keeps low fairshare jobs from languishing at the bottom of the queue. With the current settings though, users with low fairshare can gain significant advantage over users with higher relative fairshare. To remedy this we will be adjusting the Job Age weight to cap out at an equivalent Fairshare of 0.1. This will still allow jobs with 0 fairshare to gain priority and thus not languish while letting fairshare govern a wider range of higher priority jobs.

    • Login node reboots

      • Audience: All login node users

      • Impact: Login nodes will reboot during the maintenance window

    • Open OnDemand (OOD) node reboots

      • Audience: All OOD users

      • Impact: OOD nodes will reboot during the maintenance window

    • Netscratch retention will run

      • Audience: All cluster netscratch users

      • Impact: Files older than 90 days will be removed. Please note that retention cleanup can and does run at any time, not just during the maintenance window.
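
The Job Age Priority Weight Change listed above can be illustrated with a small calculation. This is a sketch under stated assumptions, not FASRC's actual Slurm configuration: it measures priority in units of "equivalent Fairshare", assumes the age contribution grows linearly over the 3 days named in the notice, and uses a hypothetical function name `priority`. Only the 3-day window and the caps of 0.5 (current) and 0.1 (new) come from the notice.

```python
MAX_AGE_DAYS = 3.0  # from the notice: jobs gain age priority over 3 days


def priority(fairshare: float, age_days: float, age_cap: float) -> float:
    """Job priority in fairshare-equivalent units (assumed linear model).

    fairshare: the job's Fairshare score, between 0 and 1.
    age_days:  how long the job has been pending.
    age_cap:   maximum age contribution, as an equivalent Fairshare score.
    """
    # The age factor rises linearly from 0 to 1 and then stops growing.
    age_factor = min(max(age_days, 0.0) / MAX_AGE_DAYS, 1.0)
    return fairshare + age_cap * age_factor


# A zero-fairshare job that has waited 3 days versus a new job with 0.3 fairshare.
aged_old_cap = priority(0.0, 3.0, age_cap=0.5)
aged_new_cap = priority(0.0, 3.0, age_cap=0.1)
fresh_job = priority(0.3, 0.0, age_cap=0.5)
```

In this model the aged zero-fairshare job outranks the fresh 0.3-fairshare job under the 0.5 cap but not under the 0.1 cap, while still rising above other zero-fairshare jobs that have waited less.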

    Thank you,
    FAS Research Computing
    https://docs.rc.fas.harvard.edu/
    https://www.rc.fas.harvard.edu/

Jan 2026 to Mar 2026
