Notification history

Operating normally

Apr 2026

Open OnDemand maintenance

Completed
April 30, 2026 at 14:00 UTC
Maintenance has completed successfully
In progress
April 30, 2026 at 12:00 UTC
Maintenance is now in progress
Scheduled
April 30, 2026 at 12:00 UTC
At 8am on Thursday April 30th we will be upgrading from Open OnDemand version 4.0.7 to 4.1.4 on both the Cannon and FASSE clusters.
This is not expected to impact running jobs.
This upgrade adds the Jobs->Project Manager menu item and fixes an issue that affected access to the Clusters->Shell Access menu item when using Firefox.

Resolved
April 30, 2026 at 15:43 UTC
login.rc.fas.harvard.edu is responding normally. This incident was automatically resolved.
Investigating
April 30, 2026 at 02:31 UTC
login.rc.fas.harvard.edu is not responding normally. This incident was automatically created.

Resolved
April 30, 2026 at 18:12 UTC
The cluster has been rebooted and all nodes, including login and OOD, have been patched.
The scheduler is re-opened and jobs which were preempted/requeued have priority for re-scheduling.
Some non-standard, lab-owned nodes may still require patching. The owners of these machines may be contacted about this.
Thank you for your patience. This is a global issue and is being addressed at centers everywhere.
Update
April 30, 2026 at 15:40 UTC
To mitigate this exploit we will need to restart -all nodes- on the cluster.
This will begin at 1PM and run until all nodes have restarted (no ETA).
This means any unfinished jobs will be terminated. There is no way to avoid this.

We will then validate the fix before re-opening the login nodes, OOD nodes, and scheduler.
Next steps and updates will be posted here.
Update
April 30, 2026 at 14:44 UTC
We are developing a plan of attack to mitigate this exploit. Please know that this is a very serious issue and so we are treating it as such. Thank you for your understanding.

We are currently awaiting further information from the Red Hat/Fedora/Rocky community, but we are building a plan in the meantime with the information we have. More details to follow as we can share them.

If you need to access storage (except scratch and home directories), Globus is still online and available. But again, login nodes and OOD are not available.
Identified
April 30, 2026 at 02:11 UTC
Due to a serious in-the-wild exploit which can compromise Fedora-based Linux distributions including Rocky, which is used on the cluster, we need to restrict access. All login and OOD nodes are shut down until a fix can be put in place. Jobs running on the cluster will continue running.
There is no ETA and no fix at this time. We will update our status page in the morning once we have more information or a fix to roll out.
This is a serious exploit and we do not take this measure lightly. Please follow this status page for updates and eventual resolution.

Resolved
April 29, 2026 at 16:32 UTC
Holylfs06 is accessible again.
This incident has been resolved.
Identified
April 29, 2026 at 16:00 UTC
Holylfs06 storage is down. We are investigating. More details as they are known.

Website security maintenance (www.rc and docs.rc) 4-28-26 1pm

Completed
April 28, 2026 at 17:16 UTC
Website maintenance has completed successfully.
In progress
April 28, 2026 at 17:00 UTC
Maintenance is now in progress
Scheduled
April 28, 2026 at 17:00 UTC
Security updates are required for www.rc.fas.harvard.edu and docs.rc.fas.harvard.edu
This work will take place today between 1pm and 2pm
Both sites will be down for very short periods during the updates.

Mar 2026

Resolved
April 01, 2026 at 12:11 UTC
This incident has been resolved. The scheduler is running normally.
Investigating
March 31, 2026 at 21:15 UTC
The scheduler is in a degraded state due to thrashing.
We are actively working to resolve this problem.

Resolved
March 31, 2026 at 16:44 UTC
This incident has been resolved. two-factor.rc.fas.harvard.edu is working normally again.
Investigating
March 31, 2026 at 15:32 UTC
We are currently investigating this incident. Requesting a new token or re-requesting your token from two-factor is not currently working.

Resolved
March 31, 2026 at 15:00 UTC
This incident has been resolved.
Investigating
March 25, 2026 at 14:30 UTC
We are currently investigating this incident.

Resolved
March 30, 2026 at 20:41 UTC
This incident has been resolved by draining and rebooting any nodes with stuck mounts.
Monitoring
March 25, 2026 at 14:31 UTC
Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.
It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.
At this time we are unaware of any holy-isilon problems other than the effect this had on cluster nodes/running jobs. We will update should we identify any data storage concerns.
Identified
March 25, 2026 at 14:10 UTC
Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.
It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.
Investigating
March 25, 2026 at 13:34 UTC
A network issue affecting storage critical to the cluster is causing instability. The cluster is currently in a degraded state as a result. We are looking into the problem. Updates to follow.

Resolved
March 19, 2026 at 14:52 UTC
ColdFront is back up. Thank you for your patience.
Identified
March 19, 2026 at 12:58 UTC
ColdFront is down. We are working to bring it back up. The instance got replaced last night, but it had trouble configuring itself on the way up again.

Feb 2026

Resolved
March 04, 2026 at 15:09 UTC
This incident has been resolved. Normal tape operations are restored.
Monitoring
March 03, 2026 at 14:04 UTC
The tape library outage is further extended to Wednesday March 4th at 9am awaiting a hardware replacement part due today. Data can still be uploaded to lab collections via Globus, but be mindful of the 10 TB buffer file limit. The outage affects storage and recall from tape.
Identified
March 02, 2026 at 14:03 UTC
NESE Tape Service is still working with IBM technical support to restore the inventory. The expected downtime is extended until Tuesday March 3rd, 9am.
Apologies for the inconvenience.
Investigating
February 27, 2026 at 21:27 UTC
NESE Tape service will be down or operating with degraded service (no store and recall) Friday from 12 Noon EST until as late as Monday, 2 March at 9 AM.

SUMMARY OF ISSUE:

NESE Tape service is currently not able to store or recall files to and from tape due to vendor firmware issues in the IBM TS4500 tape library. The issue is related to the library robotics and cartridge database and we do NOT expect any data loss from this issue.

The problem is apparently due to an issue with the inventory database related to a recent firmware update. This database can be scrubbed and reconstructed by the library, which will scan the bar code labels on all the cartridges to rebuild the inventory. Association of files in Globus to tapes is handled separately from the tape library and is not affected by the firmware update.

Resolved
February 27, 2026 at 22:04 UTC
This incident has been resolved. The Starfish dashboard is available.
Investigating
February 26, 2026 at 14:13 UTC
The Starfish dashboard is unavailable. We are currently investigating this issue with Starfish.

Starfish maintenance Feb 25, 2026 all day

Completed
February 26, 2026 at 14:00 UTC
Maintenance has completed successfully
In progress
February 25, 2026 at 14:00 UTC
Maintenance is now in progress
Scheduled
February 25, 2026 at 14:00 UTC
Starfish will be unavailable starting Wednesday, February 25th at 9AM until Thursday, February 26th at 9AM, for routine maintenance. The online dashboard will be inaccessible during this time.

Resolved
February 24, 2026 at 15:44 UTC
Openauth/radius is now operational. This update was created by an automated monitoring service.
Investigating
February 24, 2026 at 15:39 UTC
Authentication issues with openauth/radius. This incident was created by an automated monitoring service.

NESE tape maintenance Feb 19th 2026

Completed
February 19, 2026 at 22:00 UTC
Maintenance has completed successfully
In progress
February 19, 2026 at 13:00 UTC
Maintenance is now in progress
Scheduled
February 19, 2026 at 13:00 UTC
From our partners at NESE. Details follow:
We are installing four new tape frames, which will bring the tape system raw storage capacity to 253 petabytes.
Service Affected: NESE Tape Service
Maintenance Window: 8:00 AM - 5:00 PM (EST)
- The tape service will be unavailable.
- All upgrade activities are expected to be completed on the same day.
NOTES:
- Monitor the MGHPCC Slack #nese channel for status updates and announcements
- Monitor https://nese.instatus.com/ for real-time updates on progress
- Subscribe to https://nese.instatus.com/subscribe/email for updates and announcements

Feb 2026 to Apr 2026

FAS Research Computing - Notification history
