Incident History

Operational

April 2026

Open OnDemand maintenance

Completed
April 30, 2026 at 14:00 UTC
Maintenance has completed successfully
In progress
April 30, 2026 at 12:00 UTC
Maintenance is now in progress
Scheduled
April 30, 2026 at 12:00 UTC
At 8am on Thursday April 30th we will be upgrading from Open OnDemand version 4.0.7 to 4.1.4 on both the Cannon and FASSE clusters.
This is not expected to impact running jobs.
This upgrade adds the Jobs->Project Manager menu item and fixes an issue that affected access to the Clusters->Shell Access menu item when using Firefox.

Resolved
April 30, 2026 at 15:43 UTC
login.rc.fas.harvard.edu is responding normally. This incident was automatically resolved.
Investigating
April 30, 2026 at 02:31 UTC
login.rc.fas.harvard.edu is not responding normally. This incident was automatically created.

Resolved
April 30, 2026 at 18:12 UTC
The cluster has been rebooted and all nodes, including login and OOD, have been patched.
The scheduler is re-opened and jobs which were preempted/requeued have priority for re-scheduling.
Some non-standard, lab-owned nodes may still require patching. The owners of these machines may be contacted about this.
Thank you for your patience. This is a global issue and is being addressed at centers everywhere.
Update
April 30, 2026 at 15:40 UTC
To mitigate this exploit we will need to restart ALL nodes on the cluster.
This will begin at 1PM and run until all nodes have restarted (no ETA).
This means any unfinished jobs will be terminated. There is no way to avoid this.

We will then validate the fix before re-opening the login nodes, OOD nodes, and scheduler.
Next steps and updates will be posted here.
Update
April 30, 2026 at 14:44 UTC
We are developing a plan of attack to mitigate this exploit. Please know that this is a very serious issue and so we are treating it as such. Thank you for your understanding.

We are currently awaiting further information from the Red Hat/Fedora/Rocky community, but are building a plan in the meantime with the information we have. More details will follow as we can share them.

If you need to access storage (except scratch and home directories), Globus is still online and available. Login nodes and OOD, however, remain unavailable.
Identified
April 30, 2026 at 02:11 UTC
Due to a serious in-the-wild exploit which can compromise Fedora-based Linux distributions including Rocky, which is used on the cluster, we need to restrict access. All login and OOD nodes are shut down until a fix can be put in place. Jobs running on the cluster will continue running.
No ETA: there is no fix at this time. We will update our status page in the morning once we have more information or a fix to roll out.
This is a serious exploit and we do not take this measure lightly. Please follow this status page for updates and eventual resolution.

Resolved
April 29, 2026 at 16:32 UTC
Holylfs06 is accessible again.
This incident has been resolved.
Identified
April 29, 2026 at 16:00 UTC
Holylfs06 storage is down. We are investigating. More details as they are known.

Website security maintenance (www.rc and docs.rc) 4-28-26 1pm

Completed
April 28, 2026 at 17:16 UTC
Website maintenance has completed successfully.
In progress
April 28, 2026 at 17:00 UTC
Maintenance is now in progress
Scheduled
April 28, 2026 at 17:00 UTC
Security updates are required for www.rc.fas.harvard.edu and docs.rc.fas.harvard.edu.
This work will take place today between 1pm and 2pm.
Both sites will be down for very short periods during the updates.

March 2026

Resolved
April 1, 2026 at 12:11 UTC
This incident has been resolved. The scheduler is running normally.
Investigating
March 31, 2026 at 21:15 UTC
The scheduler is in a degraded state due to thrashing.
We are actively working to resolve this problem.

Resolved
March 31, 2026 at 16:44 UTC
This incident has been resolved. two-factor.rc.fas.harvard.edu is working normally again.
Investigating
March 31, 2026 at 15:32 UTC
We are currently investigating this incident. Requesting a new token or re-requesting your token from two-factor is not currently working.

Resolved
March 31, 2026 at 15:00 UTC
This incident has been resolved.
Investigating
March 25, 2026 at 14:30 UTC
We are currently investigating this incident.

Resolved
March 30, 2026 at 20:41 UTC
This incident has been resolved by draining and rebooting any nodes with stuck mounts.
Monitoring
March 25, 2026 at 14:31 UTC
Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.
It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.
At this time we are unaware of any holy-isilon problems other than the effect this had on cluster nodes/running jobs. We will update should we identify any data storage concerns.
Identified
March 25, 2026 at 14:10 UTC
Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.
It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.
Investigating
March 25, 2026 at 13:34 UTC
A network issue affecting storage critical to the cluster is causing instability. The cluster is currently in a degraded state as a result. We are looking into the problem. Updates to follow.

Resolved
March 19, 2026 at 14:52 UTC
ColdFront is back up. Thank you for your patience.
Identified
March 19, 2026 at 12:58 UTC
ColdFront is down. We are working to bring it back up. The instance got replaced last night, but it had trouble configuring itself on the way up again.

February 2026

Resolved
March 4, 2026 at 15:09 UTC
This incident has been resolved. Normal tape operations are restored.
Monitoring
March 3, 2026 at 14:04 UTC
The tape library outage is further extended until Wednesday, March 4th at 9am, while we await a hardware replacement part due today. Data can still be uploaded to lab collections via Globus, but be mindful of the 10 TB buffer file limit. The outage affects storing files to tape and recalling them from tape.
Identified
March 2, 2026 at 14:03 UTC
NESE Tape Service is still working with IBM technical support to restore the inventory. The expected downtime is extended until Tuesday, March 3rd at 9am.
Apologies for the inconvenience.
Investigating
February 27, 2026 at 21:27 UTC
NESE Tape service will be down or operating with degraded service (no store and recall) Friday from 12 Noon EST until as late as Monday, 2 March at 9 AM.

SUMMARY OF ISSUE:

NESE Tape service is currently not able to store or recall files to and from tape due to vendor firmware issues in the IBM TS4500 tape library. The issue is related to the library robotics and cartridge database and we do NOT expect any data loss from this issue.

The problem appears to lie in the inventory database and is related to a recent firmware update. This database can be scrubbed and reconstructed by the library, which will scan the bar code labels on all the cartridges to rebuild the inventory. Association of files in Globus to tapes is handled separately from the tape library and is not affected by the firmware update.

Resolved
February 27, 2026 at 22:04 UTC
This incident has been resolved. The Starfish dashboard is available.
Investigating
February 26, 2026 at 14:13 UTC
The Starfish dashboard is unavailable. We are currently investigating this issue with Starfish.

Starfish maintenance Feb 25, 2026 all day

Completed
February 26, 2026 at 14:00 UTC
Maintenance has completed successfully
In progress
February 25, 2026 at 14:00 UTC
Maintenance is now in progress
Scheduled
February 25, 2026 at 14:00 UTC
Starfish will be unavailable starting Wednesday, February 25th at 9AM until Thursday, February 26th at 9AM, for routine maintenance. The online dashboard will be inaccessible during this time.

Resolved
February 24, 2026 at 15:44 UTC
Openauth/radius is now operational. This update was created by an automated monitoring service.
Investigating
February 24, 2026 at 15:39 UTC
Authentication issues with openauth/radius. This incident was created by an automated monitoring service.

NESE tape maintenance Feb 19th 2026

Completed
February 19, 2026 at 22:00 UTC
Maintenance has completed successfully
In progress
February 19, 2026 at 13:00 UTC
Maintenance is now in progress
Scheduled
February 19, 2026 at 13:00 UTC
From our partners at NESE. Details follow:
We are installing four new tape frames, which will bring the tape system raw storage capacity to 253 petabytes.
Service Affected: NESE Tape Service
Maintenance Window: 8:00 AM - 5:00 PM (EST)
- The tape service will be unavailable.
- All upgrade activities are expected to be completed on the same day.
NOTES:
- Monitor the MGHPCC Slack #nese channel for status updates and announcements
- Monitor https://nese.instatus.com/ for real-time updates on progress
- Subscribe to https://nese.instatus.com/subscribe/email for updates and announcements


FAS Research Computing - Incident History
