Notice history

Operational

Oct 2024

Resolved
October 28, 2024 at 12:40 PM UTC
This incident has been resolved.
Investigating
October 28, 2024 at 12:36 PM UTC
OST2b (one of the bricks that make up holyscratch01) has wedged again. We are in the process of troubleshooting.

Resolved
October 27, 2024 at 6:20 PM UTC
This incident has been resolved.
Investigating
October 27, 2024 at 6:15 PM UTC
OST2b (one of the bricks that make up holyscratch01) is hung. We are failing it over to rectify scratch performance issues.

Resolved
October 25, 2024 at 8:18 PM UTC
The instability has been resolved.
Identified
October 25, 2024 at 7:40 PM UTC
We are continuing to work on a fix for this incident.
Resolved
October 25, 2024 at 6:35 PM UTC
This incident has been resolved.
Investigating
October 25, 2024 at 6:10 PM UTC
We are currently investigating this incident.

Resolved
October 10, 2024 at 2:21 PM UTC
This incident has been resolved.
Investigating
October 10, 2024 at 2:11 PM UTC
We are rebooting holyscratch01 to clear a stuck state.

Resolved
October 09, 2024 at 7:31 PM UTC
OST2b failed over
Monitoring
October 09, 2024 at 7:30 PM UTC
holyscratch01 - OST2b failing over

Sep 2024

Resolved
September 25, 2024 at 2:18 PM UTC
Resolving. The hypervisor and all but one VM, which has a separate issue, are operational.
Update
September 25, 2024 at 1:41 AM UTC
FASSE Open OnDemand and FASSE login services should be operational now.
Monitoring
September 24, 2024 at 9:09 PM UTC
FASSE OOD is back up
FASSE login nodes are still down
Identified
September 24, 2024 at 7:59 PM UTC
One of the hypervisors managing virtual machines is down. We are working to bring it back up. This affects the FASSE login and FASSE OOD nodes and may also degrade OpenAuth (two-factor).
Affected hosts are:
HOST -- STATUS
dataverse-backup UNKNOWN
demo2-l3-fs UNKNOWN
enos-vote-l3-fs UNKNOWN
fasselogin01 UNKNOWN
fasselogin02 UNKNOWN
frontier-squid02 UNKNOWN
frontier-squid03 UNKNOWN
frontier-squid04 UNKNOWN
goel-adm24-l3-fs UNKNOWN
goel-blind-l3-fs UNKNOWN
goel-l3-fs UNKNOWN
h-dev-fasseooda-01 UNKNOWN
h-dev-fasseooda-lb01 UNKNOWN
h-dev-fasseoodb-lb11 UNKNOWN
h-fasseooda-01 UNKNOWN
h-fasseooda-lb02 UNKNOWN
h-fasseoodb-lb11 UNKNOWN
h-fasseoodb-lb12 UNKNOWN
h-fasseoodc-lb21 UNKNOWN
h-fasseoodc-lb22 UNKNOWN
h-qa-fasseooda-01 UNKNOWN
h-qa-fasseooda-lb02 UNKNOWN
holy-es-master01 UNKNOWN
holy-es-master02 UNKNOWN
holy-es-master03 UNKNOWN
holynagios UNKNOWN
kreindlerl3-fs UNKNOWN
martin-su-l3-fs UNKNOWN
mcconnell-l3-fs UNKNOWN
openauth02 jtriley UNKNOWN
shleifer-dsl3-fs UNKNOWN
stock-solar-l3-fs UNKNOWN
stopsack-l3-fs UNKNOWN
xcat UNKNOWN

Resolved
September 23, 2024 at 8:00 PM UTC
The failover is complete.
Identified
September 23, 2024 at 7:54 PM UTC
The object storage target OST2b on holyscratch01 is again causing degraded performance. We are failing it over to the backup. We're aware that this issue is a concern, but please know that an entire new scratch filesystem is forthcoming. Thanks for your understanding.

Resolved
September 22, 2024 at 1:00 AM UTC
holyscratch01 was at times degraded over the weekend. The OST causing the issues was restarted and the filesystem should be back to normal.

Resolved
September 19, 2024 at 1:25 PM UTC
holyscratch01 was found to be in a degraded state around 9:15 AM and returned to operation at 9:25 AM.

Update
September 19, 2024 at 1:45 PM UTC
This incident has been resolved.
Resolved
September 19, 2024 at 1:38 PM UTC
holyscratch01 and affected nodes are reopened
Identified
September 19, 2024 at 9:18 AM UTC
holyscratch01 is seeing degraded performance and many nodes are closed off.

Aug 2024

Resolved
August 29, 2024 at 4:24 PM UTC
holylfs04 is back up
Investigating
August 29, 2024 at 4:01 PM UTC
holylfs04 needs to be rebooted to address performance issues.

Resolved
August 27, 2024 at 7:19 PM UTC
The power issue has been resolved. All nodes that were drained are now being re-opened.
Investigating
August 27, 2024 at 2:22 PM UTC
Due to temporary power availability issues in MGHPCC, half the nodes in 8A have been set to drain. This affects all partitions. Jobs will take longer to schedule but will not be terminated.
We will reopen these nodes once the power issue has been resolved at the datacenter. There is no ETA at this time.

Resolved
August 26, 2024 at 2:29 PM UTC
boslfs02 is back up
Investigating
August 26, 2024 at 2:20 PM UTC
boslfs02 needs to be rebalanced. Access may be inconsistent.

Starfish upgrade

Completed
August 27, 2024 at 12:00 PM UTC
Starfish is back up
Update
August 26, 2024 at 2:35 PM UTC
Starfish maintenance is still ongoing; there is no ETA at this time.
In progress
August 24, 2024 at 12:00 AM UTC
Maintenance is now in progress
Planned
August 24, 2024 at 12:00 AM UTC
The Starfish Zones Dashboard will be undergoing a few upgrades and maintenance this weekend from Friday, August 23rd at 8AM until Monday, August 26th at 8AM. The dashboard will not be accessible during this time. Further details will be provided, if needed. Please email rchelp@rc.fas.harvard.edu if you have any questions or concerns.

Resolved
August 15, 2024 at 2:06 PM UTC
This incident has been resolved.
Monitoring
August 15, 2024 at 1:54 PM UTC
We are working on rebalancing boslfs02; there may be a short period of degraded performance.

Aug 2024 to Oct 2024

FAS Research Computing - Notice history