Experiencing partially degraded performance

Status page for the Harvard FAS Research Computing cluster and other resources.

Cluster Utilization (VPN and FASRC login required): Cannon | FASSE | Academic


Please scroll down to see details on any Incidents or maintenance notices.
Monthly maintenance occurs on the first Monday of the month (except holidays).

GETTING HELP
https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu | Email: rchelp@rc.fas.harvard.edu



Cannon Cluster

Operational

SLURM Scheduler - Cannon

Operational

Cannon Compute Cluster (Holyoke)

Operational

Boston Compute Nodes

Operational

GPU nodes (Holyoke)

Operational

SEAS compute partition

Operational

FASSE Cluster

Operational

SLURM Scheduler - FASSE

Operational

FASSE Compute Cluster (Holyoke)

Operational

Kempner Cluster

Operational

Kempner Cluster CPU

Operational

Kempner Cluster GPU

Operational

Login Nodes

Operational

Login Nodes - Boston

Operational

Login Nodes - Holyoke

Operational

FASSE login nodes

Operational

VDI/OpenOnDemand

Operational

Cannon VDI (Open OnDemand)

Operational

FASSE VDI (Open OnDemand)

Operational

Storage

Degraded performance

Holyscratch01 (Global Scratch)

Degraded performance

Home Directory Storage - Boston

Operational

HolyLFS03 (Tier 0)

Operational

HolyLFS04 (Tier 0)

Operational

HolyLFS05 (Tier 0)

Operational

Holystore01 (Tier 0)

Operational

Holylabs

Operational

BosLFS02 (Tier 0)

Operational

Isilon Storage Boston (Tier 1)

Operational

Isilon Storage Holyoke (Tier 1)

Operational

CEPH Storage Boston (Tier 2)

Operational

Tape (Tier 3)

Operational

Boston Specialty Storage

Operational

Holyoke Specialty Storage

Degraded performance

Samba Cluster

Operational

Globus Data Transfer

Operational

bosECS

Operational

holECS

Operational

Notice history

Aug 2023

Ceph instability - Affects Boston VMs (Virtual Machines) and Tier2 Ceph shares
  • Resolved

    The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

    If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu

  • Identified

    The infrastructure behind Tier2 Ceph shares and VMs is unstable.
    This also affects VDI/OOD which relies on virtual machines.

    /net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

    Thanks for your patience.

FASRC Monthly maintenance August 7, 2023 7am-1pm *NOTE EXTENDED TIME*
  • Completed
    August 07, 2023 at 1:42 PM

    Due to a vendor error, we were unable to complete the holyscratch01 disk shelf replacement. We will work with the vendor to reschedule.

    All other maintenance tasks have completed.

  • In progress
    August 07, 2023 at 11:00 AM

    Maintenance is now in progress

  • Planned
    August 07, 2023 at 11:00 AM

    August maintenance will run August 7, 2023 from 7am-1pm.

    Please note the extended timeframe.
    See tasks section below for explanation.

    NOTICES

    • CentOS 7 Support EOL: We will be dropping support for CentOS 7 in September. If your machine or VM runs CentOS 7 and connects to Slurm, please contact FASRC to discuss options.

    • Test Partition Changes: We are adjusting the test partitions based on changing needs and increasing the maximum time limit from 8 to 12 hours. A reminder that this partition is not for running production jobs.
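
    As an illustration of the new limit, here is a minimal sbatch sketch requesting 12 hours on a test partition. The partition name "test", the resource amounts, and the program name are placeholder assumptions, not FASRC-prescribed values; check sinfo and the documentation for the actual partition names and limits.

      #!/bin/bash
      #SBATCH --partition=test       # placeholder partition name; verify with: sinfo -s
      #SBATCH --time=12:00:00        # new 12-hour maximum (previously 8 hours)
      #SBATCH --ntasks=1
      #SBATCH --mem=4G
      #SBATCH --job-name=short_test

      # Short test or debug runs only; production work belongs on the regular partitions.
      hostname
      ./my_test_program              # placeholder for the code you are testing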

    MAINTENANCE TASKS

    • holyscratch01 Disk Shelf Replacement (All Jobs Will Be Paused)
      -- Audience: All cluster and scratch users (Cannon and FASSE)
      -- Impact: Hardware issues with holyscratch01 necessitate the replacement of one of the disk shelves. As a result, all jobs and scratch will need to be paused for the duration.
      -- ETA: The swap itself is expected to take 3-4 hours, but pausing the cluster, vendor interactions, and allowing a margin for over-run require that we extend maintenance by 2 hours (7am-1pm).

    • Login node and OOD/VDI reboots
      -- Audience: Anyone logged into a login node or VDI/OOD node
      -- Impact: Login and VDI/OOD nodes will be rebooted during this maintenance window.

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
      -- Audience: Cluster users
      -- Impact: Files older than 90 days will be removed.
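
    A hedged example of how you might identify files at risk before the cleanup runs; the path below is a placeholder pattern, not your actual scratch directory, so substitute your own lab and user directories.

      # List files under your scratch space that have not been modified in 90+ days.
      # /n/holyscratch01/<lab>/<user> is a placeholder; adjust to your actual layout.
      find /n/holyscratch01/<lab>/<user> -type f -mtime +90 -printf '%TY-%Tm-%Td %p\n' | sort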

    Thanks,
    FAS Research Computing
    Department and Service Catalog: https://www.rc.fas.harvard.edu/
    Documentation: https://docs.rc.fas.harvard.edu/
    Status Page: https://status.rc.fas.harvard.edu/

Jul 2023

Jun 2023

Emergency maintenance 6/20/23 - Cannon Slurm Scheduler
  • Planned
    June 20, 2023 at 11:00 AM

    WHAT
    Emergency Cannon cluster scheduler maintenance Tuesday June 20th from 7am - 11am (actual end time estimated). Running jobs will be paused during maintenance.

    Note: FASSE is unaffected by this.

    WHY
    As most of you know, there are ongoing issues with the Cannon scheduler following the downtime and OS upgrades. This behavior did not show up in our earlier testing and only manifested once we were under load from user jobs again.

    After investigation, we found that the scheduler is oscillating between two states every hour. During even hours the scheduler enters a high-thread state where all the traffic piles up, causing queries to either take a long time or time out. During odd hours the scheduler enters a low-thread state and operates normally. During periods when the scheduler is responding, jobs can be submitted and should then run normally.

    Unfortunately the root cause is still not known, and we are working closely with the vendor, SchedMD, to find a solution and return the cluster to stability. The version of Slurm we are running, 22.05.7, has been very stable (we have run it for the past 5 months with no problems) but appears to be having issues at scale on Rocky 8.

    There is a newer version of the scheduler available, 23.02.3. This new version has a host of improvements. While the specific issue described above is not listed as one of the items fixed in this version, we anticipate that the upgrade may fix this issue owing to the various code improvements contained therein.

    Barring any solutions found over the long weekend, we intend to hold an emergency maintenance period on Tuesday June 20th starting at 7am. Running jobs will be paused. We will return Cannon to service as soon as we've completed work. Status of this maintenance will be tracked here on our status page.

    We thank you for your patience and understanding. Our team is hard at work answering your tickets and making sure that the cluster is back to full health. Also, as a reminder, we hold additional Office Hours each week in June. See: https://www.rc.fas.harvard.edu/training/office-hours/

    FAS Research Computing
    https://www.rc.fas.harvard.edu/
    https://docs.rc.fas.harvard.edu/
    https://status.rc.fas.harvard.edu/

  • Completed
    June 20, 2023 at 1:16 AM

    CANCELLED

    Emergency maintenance has been cancelled. A solution to the scheduler issue has been found and Slurm is once again operating normally.

    We thank you for your patience and understanding.

Slurm scheduler slowness/delays
  • Resolved

    This incident has been resolved.

    A solution to the Slurm scheduler issue has been found and implemented. The scheduler is now operating normally.

    Thank you for your patience and understanding.

  • Monitoring
    Update

    PLEASE SEE EMERGENCY MAINTENANCE NOTICE BELOW
    Emergency maintenance Tuesday June 20th 7am-11am. Running jobs will be paused.


    We are still working on finding the root cause of the scheduler slowness and timeouts. We are working with SchedMD and have sent diagnostic information for analysis.

    At this time the scheduler is up but may become unresponsive to your commands at times. You can wait and retry them.

    Jobs, once launched, should run as expected.

    OOD/VDI are working, but job submission may also be affected.

    If you receive "salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specified", waiting a moment and re-submitting your job should result in a successful submission.
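
    If you prefer not to retry by hand, a minimal retry sketch is shown below; the script name, retry count, and wait time are placeholder assumptions, not an FASRC-recommended procedure.

      #!/bin/bash
      # Retry a batch submission a few times if the scheduler is momentarily unresponsive.
      for attempt in 1 2 3 4 5; do
          if sbatch my_job.sbatch; then          # my_job.sbatch is a placeholder script
              echo "Submitted on attempt ${attempt}"
              break
          fi
          echo "Submission failed (attempt ${attempt}); waiting 60s before retrying..."
          sleep 60
      done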

    Addendum: It was determined that the issue is cyclical. You will have the best luck during odd hours (e.g., after 1pm and before 2pm) and less success interfacing with the scheduler during even hours (e.g., after 2pm and before 3pm).

    We regret the impact this is having on your work. Updates will follow as we have them.

  • Monitoring
    Update

    We are still working on finding the root cause of the slowness and are working with the vendor to troubleshoot.

    At this time the scheduler is up but may become unresponsive to your commands at times. Jobs, once launched, should run as expected.

  • Monitoring
    Update

    Slurm is still being overwhelmed with large numbers of requests. VDI timeouts/502 errors, interactive job connections, and job submissions are all impacted.

    We are in contact with the Slurm developers to find a better resolution.

  • Monitoring

    We have implemented updates and the scheduler appears stable and responsive.
    We will continue to monitor and update this incident.

  • Investigating

    We are currently investigating performance issues with the scheduler. This manifests as slowness when submitting jobs, job submissions failing some percentage of the time, and timeout errors (including OOD/VDI).

    Also, some jobs may not exit cleanly, putting nodes into drain status.

    We are working to resolve this issue. Updates to follow.
