FAS Research Computing - Notice History

All Systems Operational

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE


Please scroll down to see details on any incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
Documentation: https://docs.rc.fas.harvard.edu | Account Portal: https://portal.rc.fas.harvard.edu
Email: rchelp@rc.fas.harvard.edu | Support Hours


The colors shown in the bars below were chosen to increase visibility for color-blind visitors.
For higher contrast, switch to light mode at the bottom of this page if the background is dark and colors are muted.

Operational

SLURM Scheduler - Cannon - Operational

Cannon Compute Cluster (Holyoke) - Operational

Boston Compute Nodes - Operational

GPU nodes (Holyoke) - Operational

seas_compute - Operational

Operational

SLURM Scheduler - FASSE - Operational

FASSE Compute Cluster (Holyoke) - Operational

Operational

Kempner Cluster CPU - Operational

Kempner Cluster GPU - Operational

Operational

FASSE login nodes - Operational

Operational

Cannon Open OnDemand - Operational

FASSE Open OnDemand - Operational

Operational

Netscratch (Global Scratch) - Operational

Home Directory Storage - Boston - Operational

Tape - (Tier 3) - Operational

Holylabs - Operational

Isilon Storage Holyoke (Tier 1) - Operational

Holystore01 (Tier 0) - Operational

HolyLFS04 (Tier 0) - Operational

HolyLFS05 (Tier 0) - Operational

HolyLFS06 (Tier 0) - Operational

Holyoke Tier 2 NFS (new) - Operational

Holyoke Specialty Storage - Operational

holECS - Operational

Isilon Storage Boston (Tier 1) - Operational

BosLFS02 (Tier 0) - Operational

Boston Tier 2 NFS (new) - Operational

CEPH Storage Boston (Tier 2) - Operational

Boston Specialty Storage - Operational

bosECS - Operational

Samba Cluster - Operational

Globus Data Transfer - Operational

Notice History

Sep 2023

Ceph instability - Affects Boston VMs (Virtual Machines) and Tier2 Ceph shares
  • Resolved

    The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

    If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu

  • Identified

    The infrastructure behind Tier2 Ceph shares and VMs is unstable.
    This also affects VDI/OOD, which relies on virtual machines.

    /net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

    Thanks for your patience.

Ceph instability - Affects Boston VMs (Virtual Machines) and Tier2 Ceph shares
  • Resolved

    The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

    If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu

  • Identified

    The infrastructure behind Tier2 Ceph shares and VMs is unstable.
    This also affects VDI/OOD, which relies on virtual machines.

    /net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

    Thanks for your patience.

Ceph instability - Affects Boston VMs (Virtual Machines) and Tier2 Ceph shares
  • Resolved

    The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

    If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu

  • Identified

    The infrastructure behind Tier2 Ceph shares and VMs is unstable.
    This also affects VDI/OOD, which relies on virtual machines.

    /net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

    Thanks for your patience.

Ceph instability - Affects Boston VMs (Virtual Machines) and Tier2 Ceph shares
  • Resolved

    The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

    If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu

  • Identified

    The infrastructure behind Tier2 Ceph shares and VMs is unstable.
    This also affects VDI/OOD, which relies on virtual machines.

    /net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

    Thanks for your patience.

Ceph instability - Affects Boston VMs (Virtual Machines) and Tier2 Ceph shares
  • Resolved

    The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

    If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu

  • Identified

    The infrastructure behind Tier2 Ceph shares and VMs is unstable.
    This also affects VDI/OOD, which relies on virtual machines.

    /net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

    Thanks for your patience.

Aug 2023

holylabs inaccessible - cluster paused
  • Resolved

    We have reverted the routing systems involved in this issue to their previous version and state. We will continue to investigate why the issue occurred and what is needed to upgrade these systems at a later date.

  • Investigating

    The issue with holylabs (and potentially other Lustre filesystems) has recurred. Jobs and any processes using holylabs, and potentially other Lustre filesystems, may be affected.

    No ETA at this time.

  • Resolved

    We have restored access to holylabs, and the cluster and jobs are no longer paused.

    We have identified a root cause, which we will work to remediate to prevent this issue in the future.

  • Identified

    The scheduler and all jobs have been paused in order to reduce the load on holylabs.

    We are continuing to work on a fix for this incident.

  • Investigating

    The holylabs filesystem is currently down due to high load.

    OOD, software, and modules are all functional, but if your workflow uses holylabs for storage, scripts, or jobs, it may hang or fail. Our engineers are investigating this issue further.

Ceph instability - Affects Boston VMs (Virtual Machines) and Tier2 Ceph shares
  • Resolved

    The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

    If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu

  • Identified

    The infrastructure behind Tier2 Ceph shares and VMs is unstable.
    This also affects VDI/OOD, which relies on virtual machines.

    /net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

    Thanks for your patience.

FASRC Monthly maintenance August 7, 2023 7am-1pm *NOTE EXTENDED TIME*
  • Completed
    August 07, 2023 at 13:42

    Due to a vendor error, we were unable to complete the holyscratch01 disk shelf replacement. We will work with the vendor to reschedule.

    All other maintenance tasks have been completed.

  • In progress
    August 07, 2023 at 11:00

    Maintenance is now in progress.

  • Not yet started
    August 07, 2023 at 11:00

    August maintenance will run August 7, 2023 from 7am-1pm.

    Please note the extended timeframe.
    See tasks section below for explanation.

    NOTICES

    • CentOS 7 Support EOL: We will be dropping support for CentOS 7 in September. If your machine or VM runs CentOS 7 and connects to Slurm, please contact FASRC to discuss options.

    • Test Partition Changes: We are changing the test partitions based on changing needs and increasing the maximum time to 12 hours from 8 hours. A reminder that this partition is not for running jobs.

    MAINTENANCE TASKS

    • holyscratch01 Disk Shelf Replacement - All Jobs Will Be Paused
      -- Audience: All cluster and scratch users - Cannon and FASSE
      -- Impact: Hardware issues with holyscratch01 necessitate the replacement of one of the disk shelves. As a result, all jobs and scratch will need to be paused for the duration.
      -- ETA: This swap is expected to take 3-4 hours, but pausing the cluster, vendor interactions, and a margin for overrun require that we extend maintenance by 2 hours (7am-1pm).

    • Login node and OOD/VDI reboots
      -- Audience: Anyone logged into a login node or VDI/OOD node
      -- Impact: Login and VDI/OOD nodes will be rebooted during this maintenance window.

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
      -- Audience: Cluster users
      -- Impact: Files older than 90 days will be removed.

    Thanks,
    FAS Research Computing
    Department and Service Catalog: https://www.rc.fas.harvard.edu/
    Documentation: https://docs.rc.fas.harvard.edu/
    Status Page: https://status.rc.fas.harvard.edu/

Ceph instability - Affects Boston VMs (Virtual Machines) and Tier2 Ceph shares
  • Resolved

    The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

    If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu

  • Identified

    The infrastructure behind Tier2 Ceph shares and VMs is unstable.
    This also affects VDI/OOD, which relies on virtual machines.

    /net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

    Thanks for your patience.

Jul 2023
