FAS Research Computing - MGHPCC Pod 8A Power Upgrade June 17 will idle some Cannon nodes

Status page for the Harvard FAS Research Computing cluster and other resources.
WINTER BREAK: Harvard and FASRC will be closed for winter break as of Sat. Dec 21st, 2024. We will return on Jan. 2nd, 2025. We will monitor for critical issues. All other work will be deferred until we return.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE


Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu | Email: rchelp@rc.fas.harvard.edu



MGHPCC Pod 8A Power Upgrade June 17 will idle some Cannon nodes

Completed
Scheduled for June 17, 2024 at 4:01 AM – 5:28 PM

Affects

Cannon Cluster
SLURM Scheduler - Cannon
Cannon Compute Cluster (Holyoke)
Boston Compute Nodes
GPU nodes (Holyoke)
seas_compute
Updates
  • Completed
    June 17, 2024 at 5:28 PM
    Maintenance has completed successfully
  • In progress
    June 17, 2024 at 4:01 AM
    Maintenance is now in progress
  • Planned
    June 17, 2024 at 4:01 AM

    MGHPCC will be performing power upgrades on Pod 8A to increase power density and allow more nodes to be added to that pod's rows. As with the May 13th work, this means we will be idling half of the nodes in 8A on two dates: June 17th and June 24th.

    These are all-day events: the affected nodes will be unavailable for the full 24 hours of each date. The idling is accomplished via Slurm reservations, so no jobs will be canceled, but the nodes will be drained and jobs may pend longer than normal while the scheduler holds them idle.
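
    If you want to check the node and reservation state yourself, the standard Slurm commands below will show it from any login node. These are generic examples rather than FASRC-specific tooling; the partition name is only illustrative, and the reservation names will be whatever the scheduler reports.

        # List active reservations (including the ones used to idle Pod 8A nodes)
        scontrol show reservation

        # Show nodes in a partition that are drained, reserved, or in maintenance
        sinfo -p sapphire -t drain,resv,maint -o "%N %T"

        # See why your pending jobs are waiting; the reason column may show
        # something like "ReqNodeNotAvail, Reserved for maintenance"
        squeue -u $USER -o "%.18i %.9P %.8T %.10M %R"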

    Where possible, please include additional partitions in your job scripts, and plan accordingly for any new or long-running jobs during that period: https://docs.rc.fas.harvard.edu/kb/running-jobs/#Slurm_partitions
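
    For example, a batch script can list more than one partition, and Slurm will start the job on whichever listed partition can run it soonest. The partition names and resource requests below are only illustrative; substitute the partitions your group has access to.

        #!/bin/bash
        #SBATCH --job-name=example_job          # placeholder name
        #SBATCH --partition=sapphire,shared     # comma-separated list of acceptable partitions
        #SBATCH --time=12:00:00
        #SBATCH --mem=16G
        #SBATCH --cpus-per-task=4

        # Replace with your actual workload
        srun ./my_program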

    This affects the Cannon cluster. FASSE is not affected.

    Impacted partitions are:

    arguelles_delgado_gpu
    bigmem_intermediate
    bigmem
    blackhole_gpu
    eddy
    enos
    gershman
    gpu
    hejazi
    hernquist_ice
    hoekstra
    hsph
    huce_ice
    iaifi_gpu
    iaifi_gpu_priority
    iaifi_gpu_requeue
    intermediate
    itc_gpu
    itc_gpu_requeue
    joonholee
    jshapiro
    jshapiro_priority
    jshapiro_sapphire
    kempner
    kempner_dev
    kempner_h100
    kempner_requeue
    kempner_reservation
    kovac
    kozinsky
    kozinsky_gpu
    kozinsky_priority
    kozinsky_requeue
    murphy_ice
    ortegahernandez_ice
    sapphire
    seas_compute
    seas_gpu
    siag
    siag_combo
    siag_gpu
    sur
    test
    yao
    yao_priority
    zhuang