FAS Research Computing - Notice History

All systems operational

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE


Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
Documentation: https://docs.rc.fas.harvard.edu | Account Portal: https://portal.rc.fas.harvard.edu
Email: rchelp@rc.fas.harvard.edu | Support Hours


The colors shown in the bars below were chosen to increase visibility for color-blind visitors.
For higher contrast, switch to light mode at the bottom of this page if the background is dark and colors are muted.

Operational

SLURM Scheduler - Cannon - Operational

Cannon Compute Cluster (Holyoke) - Operational

Boston Compute Nodes - Operational

GPU nodes (Holyoke) - Operational

seas_compute - Operational

Operational

SLURM Scheduler - FASSE - Operational

FASSE Compute Cluster (Holyoke) - Operational

Operational

Kempner Cluster CPU - Operational

Kempner Cluster GPU - Operational

Operational

FASSE login nodes - Operational

Operational

Cannon Open OnDemand - Operational

FASSE Open OnDemand - Operational

Operational

Netscratch (Global Scratch) - Operational

Home Directory Storage - Boston - Operational

Tape - (Tier 3) - Operational

Holylabs - Operational

Isilon Storage Holyoke (Tier 1) - Operational

Holystore01 (Tier 0) - Operational

HolyLFS04 (Tier 0) - Operational

HolyLFS05 (Tier 0) - Operational

HolyLFS06 (Tier 0) - Operational

Holyoke Tier 2 NFS (new) - Operational

Holyoke Specialty Storage - Operational

holECS - Operational

Isilon Storage Boston (Tier 1) - Operational

BosLFS02 (Tier 0) - Operational

Boston Tier 2 NFS (new) - Operational

CEPH Storage Boston (Tier 2) - Operational

Boston Specialty Storage - Operational

bosECS - Operational

Samba Cluster - Operational

Globus Data Transfer - Operational

Notice History

Jan 2023

Monthly Maintenance Jan. 9th, 2023 7am-11am
  • Completed
    January 10, 2023 at 04:00

    Maintenance has completed successfully.

  • In progress
    January 10, 2023 at 00:00

    Maintenance is now in progress.

  • Not started
    January 10, 2023 at 00:00

    NOTICES

    GPU PARTITIONS
    The gputest partition is back in service. Job limits are now 64 cores, 8 GPUs, and 750G of RAM. Users can run up to 2 jobs.
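    As a rough illustration of the limits quoted above, the following Python sketch checks a planned job against them before submission. The `JobRequest` class and `gputest_violations` function are hypothetical helpers, not part of Slurm or any FASRC tooling, and the sketch assumes the core, GPU, and memory limits apply to a single request, which the notice does not spell out.

```python
from dataclasses import dataclass

# Limits as quoted in the notice for the gputest partition.
GPUTEST_MAX_CORES = 64
GPUTEST_MAX_GPUS = 8
GPUTEST_MAX_MEM_G = 750   # "750G" as written in the notice
GPUTEST_MAX_JOBS = 2


@dataclass
class JobRequest:
    """Hypothetical description of a planned job (not a Slurm structure)."""
    cores: int
    gpus: int
    mem_g: int


def gputest_violations(request: JobRequest, running_jobs: int = 0) -> list[str]:
    """Return the reasons a request exceeds the quoted limits (empty if none)."""
    problems = []
    if request.cores > GPUTEST_MAX_CORES:
        problems.append(f"cores: {request.cores} > {GPUTEST_MAX_CORES}")
    if request.gpus > GPUTEST_MAX_GPUS:
        problems.append(f"GPUs: {request.gpus} > {GPUTEST_MAX_GPUS}")
    if request.mem_g > GPUTEST_MAX_MEM_G:
        problems.append(f"memory: {request.mem_g}G > {GPUTEST_MAX_MEM_G}G")
    # Assumption: "up to 2 jobs" counts jobs the user already has running.
    if running_jobs >= GPUTEST_MAX_JOBS:
        problems.append(f"jobs: already running {running_jobs} of {GPUTEST_MAX_JOBS}")
    return problems
```

    An empty list means the request fits within the quoted numbers; the scheduler itself remains the authority on what is accepted.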

    GLOBUS PERSONAL CLIENT - 3.1 Client Deprecated
    If you are using the Globus Connect Personal client on your machine, please ensure you have updated and are running version 3.2 or greater. Version 3.1 and below are deprecated and will not work as of December 17th, 2022. https://docs.globus.org/ca-update-2022/#globus_connect_personal
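    The "3.2 or greater" requirement can be checked mechanically once you know your client's version string. The sketch below is a minimal Python example; `is_supported` is a hypothetical helper, and it assumes the version is a dotted sequence of integers such as `3.2.0`. It compares numeric components rather than raw strings, so that a version like `3.10` is correctly treated as newer than `3.2`.

```python
MINIMUM_SUPPORTED = (3, 2)  # "version 3.2 or greater" from the notice


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '3.2.0' into (3, 2, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported(version: str) -> bool:
    """True if the version meets the minimum quoted in the notice."""
    return parse_version(version) >= MINIMUM_SUPPORTED
```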

    HOLIDAY NOTICE
    January 16th is a university holiday (MLK Day)

    GENERAL MAINTENANCE

    * Slurm upgrade
    Audience: Cluster users
    Impact: Jobs will be paused during upgrade

    * OnDemand Version upgrade to 2.0.29
    Audience: VDI/OpenOnDemand users
    Impact: VDI will be unavailable during this and the above Slurm upgrade

    * Domain controller updates
    Audience: All cluster users
    Impact: Could briefly impact some older systems, otherwise no impact expected

    * Login node and VDI node reboots and firmware updates
    Audience: Anyone logged into a login node or VDI/OOD node
    Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting

    * Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
    Audience: Cluster users
    Impact: Files older than 90 days will be removed.

    Reminder: Scratch 90-day file retention purging runs occur regularly, not just during maintenance periods.
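    To see which of your own files might fall under the 90-day retention window, a script along these lines can help. This is a minimal Python sketch, not FASRC's purge tool: `files_older_than` is a hypothetical helper, and it assumes file age is judged by modification time, which may differ from the criteria in the scratch policy linked above. It only lists files and never deletes anything.

```python
import os
import time
from pathlib import Path

RETENTION_DAYS = 90  # retention window quoted in the notice


def files_older_than(root, days=RETENTION_DAYS, now=None):
    """List files under root whose modification time is more than `days` old."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.lstat().st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

    Calling `files_older_than("/path/to/your/scratch/dir")` with your own scratch path returns the candidate files so they can be copied elsewhere before a purge run.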

    Thanks!
    FAS Research Computing
    Department and Service Catalog: https://www.rc.fas.harvard.edu/
    Documentation: https://docs.rc.fas.harvard.edu/
    Status Page: https://status.rc.fas.harvard.edu/

Dec 2022

Monthly maintenance Dec 5th 2022 7am-11am
  • Completed
    December 05, 2022 at 19:50

    Maintenance has completed successfully.

  • In progress
    December 05, 2022 at 14:55

    Apologies. The maintenance event on the status page did not start automatically. Maintenance is already underway and will complete at 11am.

  • Not started
    December 05, 2022 at 12:00

    NOTICES

    GPUTEST and REMOTEVIZ PARTITIONS
    Due to failed nodes, the gputest partition is down to 2 nodes and the (single node) remoteviz partition is down at the moment. We are working with the vendor to replace hardware, but this is still unresolved and there is no ETA at this time. We will post updates and QoS changes on our status page when we have them: https://status.rc.fas.harvard.edu/cl8a94kcf17664hvoj8oksxanx

    GLOBUS PERSONAL CLIENT - UPDATE BY DEC 17
    If you are using the Globus Connect Personal client on your machine, please ensure you have updated and are running version 3.2 or greater by December 17th, 2022. You will not be able to use version 3.1 or below after that date. https://docs.globus.org/ca-update-2022/#globusconnectpersonal

    HOLIDAY NOTICES NOVEMBER:
    Office Hours will be held on 11/23 prior to the Thanksgiving break, but will run only from 12-2pm. FASRC staff will be unavailable Nov. 16th from 12-3pm for a staff event. Thur/Fri Nov. 24th and 25th are university holidays (Thanksgiving).

    HOLIDAY NOTICES DECEMBER:
    Office Hours will not be held on Dec. 21st and will resume Jan. 4th, 2023. Winter break runs Dec. 23 - Jan. 2nd. FASRC will monitor for emergencies during this time, but general questions/tickets will be held until we return on Jan. 3rd, 2023.

    SLURM SCHEDULER UPDATE NOTES:
    Because of a bug in previous versions of Slurm, this upgrade will leave jobs launched on the previous version stuck in the COMPLETING state until the node is rebooted (see: https://bugs.schedmd.com/show_bug.cgi?id=15078). This means that in the week(s) following the upgrade there will be rolling reboots of the nodes to clear these stuck jobs.

    Users should be aware that any jobs stuck in the COMPLETING state will remain so until the node the job lives on is rebooted, and any node that is labelled COMPLETING will not be able to receive jobs until it is rebooted. This is due to a Slurm bug and has nothing to do with users' code or jobs, so users cannot do anything to clear this state faster. FASRC admins will reboot nodes as soon as they are clear of work to fix this issue.
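    Although users cannot clear the COMPLETING state themselves, they can check which of their jobs are affected. The Python sketch below is one possible way to do that by calling `squeue` and parsing its output; the helper names are hypothetical, the `|` separator is an arbitrary choice, and it assumes `squeue` is on the PATH of a machine with access to the scheduler.

```python
import subprocess


def parse_squeue_lines(output: str) -> list[tuple[str, str]]:
    """Parse 'jobid|nodelist' lines into (job_id, nodes) pairs."""
    jobs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        job_id, _sep, nodes = line.partition("|")
        jobs.append((job_id, nodes))
    return jobs


def completing_jobs(user: str) -> list[tuple[str, str]]:
    """Ask squeue for the user's jobs currently in the COMPLETING state."""
    result = subprocess.run(
        [
            "squeue",
            "--noheader",
            "--states=COMPLETING",
            f"--user={user}",
            "--format=%i|%N",  # %i = job ID, %N = node list
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue_lines(result.stdout)
```

    Each returned pair names a job and the node(s) it is waiting on, which is the node that has to be rebooted by the admins before the job clears.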

    GENERAL MAINTENANCE

    • Slurm scheduler update (22.05.x) - See notes above
      -- Audience: All cluster job users
      -- Impact: See notes above. The scheduler and jobs will be paused during the upgrade.

    • Partition decommissioning
      -- Audience: narayan and holymeissner partition users
      -- Impact: These partitions will no longer be available

    • Domain controller DHCP updates
      -- Audience: All users
      -- Impact: No impact expected

    • Holyscratch01 firmware updates
      -- Audience: All users of scratch
      -- Impact: Scratch will be unavailable for short periods

    • Login node and VDI node reboots and firmware updates
      -- Audience: Anyone logged into a login node or VDI/OOD node
      -- Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
      -- Audience: Cluster users
      -- Impact: Files older than 90 days will be removed.

    Reminder: Scratch 90-day file retention purging runs occur regularly, not just during maintenance periods.

    Updates on our status page: https://status.rc.fas.harvard.edu


Nov 2022

Monthly maintenance Nov 7th 2022 7am-11am
  • Completed
    November 16, 2022 at 17:12

    Maintenance has completed successfully.

  • In progress
    November 07, 2022 at 12:00

    NOTICES

    • GPUTEST and REMOTEVIZ PARTITIONS
      Due to failed nodes, the gputest partition is down to 2 nodes and the (single node) remoteviz partition is down at the moment. We are working with the vendor to replace hardware, but this is still unresolved and there is no ETA at this time. We will post updates and QoS changes on our status page when we have them: https://status.rc.fas.harvard.edu/cl8a94kcf17664hvoj8oksxanx

    • GLOBUS PERSONAL CLIENT - UPDATE BY DEC 17
      If you are using the Globus Connect Personal client on your machine, please ensure you have updated and are running version 3.2 or greater by December 17th, 2022. You will not be able to use version 3.1 or below after that date. https://docs.globus.org/ca-update-2022/#globusconnectpersonal

    • TRAINING
      New training sessions, including monthly new user training, are available. You can find a list and links to sign up here: https://www.rc.fas.harvard.edu/upcoming-training/

    • NERC informs us that they will be performing maintenance on Nov 2nd from 8am-12pm. If you use NERC's services, you can find more information on their status page: https://nerc.instatus.com/

    FASRC REGULAR MAINTENANCE

    • Network maintenance - HIGH IMPACT POSSIBLE
      -- Audience: All users
      -- Impact: Network maintenance will take place during this time and is expected to run the full 4 hours. Work on the firewalls and fibre links will be involved, so impact may be felt across both data centers.

    • Samba admin node reboots
      -- Audience: All users of Samba (desktop) mounts
      -- Impact: Impact is expected to be transparent or minimal. Active Samba mounts could be affected, but only briefly, and should recover on their own.

    • Globus update
      -- Audience: All users of Globus
      -- Impact: Globus may be unavailable for short periods.

    • Domain Controller Updates
      -- Audience: All users
      -- Impact: Minimal impact is expected.

    • Login node and VDI node reboots
      -- Audience: Anyone logged into a login node or VDI/OOD node
      -- Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
      -- Audience: Cluster users
      -- Impact: Files older than 90 days will be removed.
      -- Reminder: Scratch 90-day file retention purging runs occur regularly, not just during maintenance periods.


Nov 2022 to Jan 2023
