FAS Research Computing - InfiniBand networking down - Incident details

Status page for the Harvard FAS Research Computing cluster and other resources.

Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu | Email: rchelp@rc.fas.harvard.edu


InfiniBand networking down

Resolved (current status: Operational)
Started 5 months ago. Lasted 14 minutes.

Affected

Cannon Cluster

Major outage from 2:33 PM to 2:47 PM

Cannon Compute Cluster (Holyoke)

Major outage from 2:33 PM to 2:47 PM

GPU nodes (Holyoke)

Major outage from 2:33 PM to 2:47 PM

FASSE Cluster

Major outage from 2:33 PM to 2:47 PM

FASSE Compute Cluster (Holyoke)

Major outage from 2:33 PM to 2:47 PM

Kempner Cluster

Major outage from 2:33 PM to 2:47 PM

Updates
  • Update
    This incident has been resolved.
  • Resolved
    InfiniBand has been restored.
  • Identified
    The InfiniBand fabric at MGHPCC, which connects nodes over high-speed fiber, is down. We are investigating. This _will_ cause performance issues, and jobs may stall. We will post updates as we learn more.