FAS Research Computing - Notice history

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE


Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu | Email: rchelp@rc.fas.harvard.edu


The colors in the bars below were chosen to increase visibility for color-blind visitors.
If the background is dark and the colors appear muted, switch to light mode at the bottom of this page for higher contrast.


SLURM Scheduler - Cannon - Operational

Cannon Compute Cluster (Holyoke) - Operational

Boston Compute Nodes - Operational

GPU nodes (Holyoke) - Operational

seas_compute - Operational


SLURM Scheduler - FASSE - Operational

FASSE Compute Cluster (Holyoke) - Operational


Kempner Cluster CPU - Operational

Kempner Cluster GPU - Operational


Login Nodes - Boston - Operational

Login Nodes - Holyoke - Operational

FASSE login nodes - Operational


Cannon Open OnDemand/VDI - Operational

FASSE Open OnDemand/VDI - Operational


Netscratch (Global Scratch) - Operational

Home Directory Storage - Boston - Operational

Holylabs - Operational

HolyLFS06 (Tier 0) - Operational

HolyLFS04 (Tier 0) - Operational

HolyLFS05 (Tier 0) - Operational

Holystore01 (Tier 0) - Operational

Isilon Storage Holyoke (Tier 1) - Operational

Holyoke Tier 2 NFS (new) - Operational

Uptime: Jan 2023 · 100.0% | Feb 2023 · 100.0% | Mar 2023 · 100.0%

Holyoke Specialty Storage - Operational

holECS - Operational

BosLFS02 (Tier 0) - Operational

Isilon Storage Boston (Tier 1) - Operational

Boston Specialty Storage - Operational

Boston Tier 2 NFS (new) - Operational

Uptime: Jan 2023 · 100.0% | Feb 2023 · 100.0% | Mar 2023 · 100.0%

CEPH Storage Boston (Tier 2) - Operational

bosECS - Operational

Tape - (Tier 3) - Operational

Samba Cluster - Operational

Globus Data Transfer - Operational

Notice history

Mar 2023

Monthly Maintenance March 6th, 2023 7am-11am
  • Completed
    March 06, 2023 at 5:06 PM

    Maintenance has completed successfully at 12:00PM.

  • In progress
    March 06, 2023 at 4:30 PM

    Maintenance is still in progress as of 11:30 AM, as one of our vendors needs to complete some hardware work.

    Access to the Slurm scheduler and all compute nodes remains paused, but access to storage and other services has been restored.

    We appreciate your patience.

  • Completed
    March 06, 2023 at 4:00 PM

    Maintenance has completed successfully

  • In progress
    March 06, 2023 at 12:00 PM

    Maintenance is now in progress

  • Planned
    March 06, 2023 at 12:00 PM

    NOTICES

    The annual MGHPCC power downtime will take place June 5th-8th, 2023
    Calendar Event: https://www.rc.fas.harvard.edu/events/mghpcc-power-shutdown-2023/
    Blog Post: https://www.rc.fas.harvard.edu/blog/2023-downtime/

    GENERAL MAINTENANCE

    • NOTE: All jobs will be paused during maintenance to reduce heat load and allow data center cooling maintenance to take place.

    • Login node updates and reboots, VDI reboots
      Audience: VDI/OpenOnDemand users
      Impact: VDI will be unavailable during these updates and reboots

    • RCSMB (samba) Boston network changes
      Audience: RCSMB shares mounted out of Boston
      Impact: Could cause brief share disconnects during updates

    • UPDATE: Nexus control plane supervisor switchover - ETA 5 minutes, short network disconnect while restarting

    • Login node updates/reboot and VDI node reboots
      Audience: Anyone logged into a login node or VDI/OOD node
      Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
      Audience: Cluster users
      Impact: Files older than 90 days will be removed.
      Reminder: Scratch 90-day retention purges run regularly, not just during maintenance periods.
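As a reference for the retention policy above, users can preview which of their files fall outside the 90-day window with standard tools. A minimal sketch follows; the temporary directory only simulates a scratch tree (the actual cleanup is run by FASRC, and real paths such as those under Netscratch will differ):

```shell
#!/bin/sh
# Simulate a scratch tree and list purge candidates: files not modified
# within the last 90 days. The mktemp directory stands in for a real
# scratch path (illustrative only).
demo=$(mktemp -d)
touch "$demo/recent.txt"                    # modified just now -- kept
touch -d "120 days ago" "$demo/stale.txt"   # 120 days old -- purge candidate

find "$demo" -type f -mtime +90 -print      # prints only stale.txt's path
rm -rf "$demo"
```

On the cluster, one would point `find` at a real scratch directory instead of the demo tree; `-mtime +90` matches files whose last modification is more than 90 days in the past.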

    SECURITY UPDATES
    HUIT and the CIO Council have set a goal of reducing risk across all schools within the University: each school aims to reduce its outstanding vulnerability count by 75% by June 2023. These numbers are based on HUIT security scans of our infrastructure.
    We at FAS Research Computing are responsible for thousands of physical and virtual machines. To make progress in reducing our total open vulnerability count, we're going to update internal and user-facing systems as part of scheduled monthly maintenance windows and on a rolling basis outside of these windows. This will generally mean running OS and security updates as needed and rebooting these nodes when required.

    This month, these hosts will get updates as part of our scheduled maintenance:
    boslogin01 - boslogin04
    holylogin01 - holylogin04
    holydtn01 - holydtn04
    xdmod4.rc.fas.harvard.edu
    rchelp.rc.fas.harvard.edu (our ticket system)
    rcsmtp.rc.fas.harvard.edu (our mail system)

    Thanks!
    FAS Research Computing
    Department and Service Catalog: https://www.rc.fas.harvard.edu/
    Documentation: https://docs.rc.fas.harvard.edu/
    Status Page: https://status.rc.fas.harvard.edu/

Feb 2023

Monthly Maintenance Feb. 6th, 2023 7am-11am
  • Completed
    February 06, 2023 at 4:00 PM

    Maintenance has completed successfully

  • In progress
    February 06, 2023 at 12:00 PM

    Maintenance is now in progress

  • Planned
    February 06, 2023 at 12:00 PM

    NOTICES

    GPU PARTITIONS
    The gpu_test partition is back in service. Job limits are now 64 cores, 8 GPUs, and 750 GB of RAM, and users can run up to 2 jobs at a time.
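To illustrate those limits, a batch script requesting resources inside them might look like the sketch below. The partition name and limits come from this notice; the time limit and test command are assumptions, not FASRC recommendations:

```shell
#!/bin/bash
# Sketch: a gpu_test job request within the stated per-job limits.
#SBATCH --partition=gpu_test   # test partition named in the notice
#SBATCH --cpus-per-task=8      # job limit: 64 cores
#SBATCH --gres=gpu:1           # job limit: 8 GPUs
#SBATCH --mem=16G              # job limit: 750 GB of RAM
#SBATCH --time=0-00:30         # short test run (assumed value)

nvidia-smi                     # confirm the GPU allocation
```

A request exceeding any of these limits (for example `--gres=gpu:9`) would be rejected or held by the scheduler.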

    HOLIDAY NOTICE
    February 20th is a university holiday (Presidents' Day)

    GENERAL MAINTENANCE

    • OnDemand Version upgrade to 2.0.29
      Audience: VDI/OpenOnDemand users
      Impact: VDI will be unavailable during this and the above Slurm upgrade

    • Domain controller updates
      Audience: All cluster
      Impact: Could briefly impact some older systems, otherwise no impact expected

    • Login node and VDI node reboots and firmware updates
      Audience: Anyone logged into a login node or VDI/OOD node
      Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
      Audience: Cluster users
      Impact: Files older than 90 days will be removed.

    Reminder: Scratch 90-day retention purges run regularly, not just during maintenance periods.

    Thanks!
    FAS Research Computing
    Department and Service Catalog: https://www.rc.fas.harvard.edu/
    Documentation: https://docs.rc.fas.harvard.edu/
    Status Page: https://status.rc.fas.harvard.edu/

Jan 2023

Monthly Maintenance Jan. 9th, 2023 7am-11am
  • Completed
    January 10, 2023 at 4:00 AM

    Maintenance has completed successfully

  • In progress
    January 10, 2023 at 12:00 AM

    Maintenance is now in progress

  • Planned
    January 10, 2023 at 12:00 AM

    NOTICES

    GPU PARTITIONS
    The gpu_test partition is back in service. Job limits are now 64 cores, 8 GPUs, and 750 GB of RAM, and users can run up to 2 jobs at a time.

    GLOBUS PERSONAL CLIENT - 3.1 Client Deprecated
    If you are using the Globus Connect Personal client on your machine, please ensure you have updated to version 3.2 or greater. Versions 3.1 and below are deprecated and no longer work as of December 17th, 2022. https://docs.globus.org/ca-update-2022/#globus_connect_personal

    HOLIDAY NOTICE
    January 16th is a university holiday (MLK Day)

    GENERAL MAINTENANCE

    • Slurm upgrade
      Audience: Cluster users
      Impact: Jobs will be paused during the upgrade

    • OnDemand version upgrade to 2.0.29
      Audience: VDI/OpenOnDemand users
      Impact: VDI will be unavailable during this and the above Slurm upgrade

    • Domain controller updates
      Audience: All cluster
      Impact: Could briefly impact some older systems, otherwise no impact expected

    • Login node and VDI node reboots and firmware updates
      Audience: Anyone logged into a login node or VDI/OOD node
      Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
      Audience: Cluster users
      Impact: Files older than 90 days will be removed.

    Reminder: Scratch 90-day retention purges run regularly, not just during maintenance periods.

    Thanks!
    FAS Research Computing
    Department and Service Catalog: https://www.rc.fas.harvard.edu/
    Documentation: https://docs.rc.fas.harvard.edu/
    Status Page: https://status.rc.fas.harvard.edu/

Jan 2023 to Mar 2023
