Notice history

Operational

Apr 2026

OpenOnDemand maintenance

Completed
April 30, 2026 at 14:00 UTC
Maintenance has completed successfully
In progress
April 30, 2026 at 12:00 UTC
Maintenance is now in progress
Not yet started
April 30, 2026 at 12:00 UTC
At 8am on Thursday April 30th we will be upgrading from Open OnDemand version 4.0.7 to 4.1.4 on both the Cannon and FASSE clusters.
This is not expected to impact running jobs.
This upgrade adds the Jobs->Project Manager menu item and fixes an issue that affected access to the Clusters->Shell Access menu item when using Firefox.

Resolved
April 30, 2026 at 15:43 UTC
login.rc.fas.harvard.edu is responding normally. This incident was automatically resolved.
Investigating
April 30, 2026 at 02:31 UTC
login.rc.fas.harvard.edu is not responding normally. This incident was automatically created.

Resolved
April 30, 2026 at 18:12 UTC
The cluster has been rebooted and all nodes, including login and OOD, have been patched.
The scheduler is re-opened and jobs which were preempted/requeued have priority for re-scheduling.
Some non-standard, lab-owned nodes may still require patching. The owners of these machines may be contacted about this.
Thank you for your patience. This is a global issue and is being addressed at centers everywhere.
Update
April 30, 2026 at 15:40 UTC
To mitigate this exploit we will need to restart -all nodes- on the cluster.
This will begin at 1PM and run until all nodes have restarted (no ETA).
This means any unfinished jobs will be terminated. There is no way to avoid this.

We will then validate the fix before re-opening the login nodes, OOD nodes, and scheduler.
Next steps and updates will be posted here.
Update
April 30, 2026 at 14:44 UTC
We are developing a plan of attack to mitigate this exploit. Please know that this is a very serious issue and so we are treating it as such. Thank you for your understanding.

We are currently awaiting further information from the Red Hat/Fedora/Rocky community, but are building a plan in the meantime with the information we have. More details to follow as we can share them.

If you need to access storage (except scratch and home directories), Globus is still online and available. But again, login nodes and OOD are not available.
Identified
April 30, 2026 at 02:11 UTC
Due to a serious in-the-wild exploit which can compromise Fedora-based Linux distributions including Rocky, which is used on the cluster, we need to restrict access. All login and OOD nodes are shut down until a fix can be put in place. Jobs running on the cluster will continue running.
There is no ETA and no fix at this time. We will update our status page in the morning once we have more information or a fix to roll out.
This is a serious exploit and we do not take this measure lightly. Please follow this status page for updates and eventual resolution.

Resolved
April 29, 2026 at 16:32 UTC
Holylfs06 is accessible again.
This incident has been resolved.
Identified
April 29, 2026 at 16:00 UTC
Holylfs06 storage is down. We are investigating. More details as they are known.

Website security maintenance (www.rc and docs.rc) 4-28-26 1pm

Completed
April 28, 2026 at 17:16 UTC
Website maintenance has completed successfully.
In progress
April 28, 2026 at 17:00 UTC
Maintenance is now in progress
Not yet started
April 28, 2026 at 17:00 UTC
Security updates are required for www.rc.fas.harvard.edu and docs.rc.fas.harvard.edu.
This work will take place today between 1pm and 2pm.
Both sites will be down for very short periods during the updates.

Mar 2026

Resolved
April 1, 2026 at 12:11 UTC
This incident has been resolved. The scheduler is running normally.
Investigating
March 31, 2026 at 21:15 UTC
The scheduler is in a degraded state due to thrashing.
We are actively working to resolve this problem.

Resolved
March 31, 2026 at 16:44 UTC
This incident has been resolved. two-factor.rc.fas.harvard.edu is working normally again.
Investigating
March 31, 2026 at 15:32 UTC
We are currently investigating this incident. Requesting a new token or re-requesting your token from two-factor is not currently working.

Resolved
March 31, 2026 at 15:00 UTC
This incident has been resolved.
Investigating
March 25, 2026 at 14:30 UTC
We are currently investigating this incident.

Resolved
March 30, 2026 at 20:41 UTC
This incident has been resolved by draining and rebooting any nodes with stuck mounts.
Monitoring
March 25, 2026 at 14:31 UTC
Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.
It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.
At this time we are unaware of any holy-isilon problems other than the effect this had on cluster nodes/running jobs. We will update should we identify any data storage concerns.
Identified
March 25, 2026 at 14:10 UTC
Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.
It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.
Investigating
March 25, 2026 at 13:34 UTC
A network issue affecting storage critical to the cluster is causing instability. The cluster is currently in a degraded state as a result. We are looking into the problem. Updates to follow.

Resolved
March 19, 2026 at 14:52 UTC
ColdFront is back up. Thank you for your patience.
Identified
March 19, 2026 at 12:58 UTC
ColdFront is down. We are working to bring it back up. The instance got replaced last night, but it had trouble configuring itself on the way up again.

Feb 2026

Resolved
March 4, 2026 at 15:09 UTC
This incident has been resolved. Normal tape operations are restored.
Monitoring
March 3, 2026 at 14:04 UTC
The tape library outage is further extended to Wednesday March 4th at 9am awaiting a hardware replacement part due today. Data can still be uploaded to lab collections via Globus, but be mindful of the 10 TB buffer file limit. The outage affects storage and recall from tape.
Identified
March 2, 2026 at 14:03 UTC
NESE Tape Service is still working with IBM technical support to restore the inventory. The expected downtime is extended until Tuesday March 3rd, 9am.
Apologies for the inconvenience.
Investigating
February 27, 2026 at 21:27 UTC
NESE Tape service will be down or operating with degraded service (no store and recall) Friday from 12 Noon EST until as late as Monday, 2 March at 9 AM.

SUMMARY OF ISSUE:

NESE Tape service is currently not able to store or recall files to and from tape due to vendor firmware issues in the IBM TS4500 tape library. The issue is related to the library robotics and cartridge database and we do NOT expect any data loss from this issue.

The problem apparently stems from the inventory database and is related to a recent firmware update. This database can be scrubbed and reconstructed by the library, which will scan the bar code labels on all the cartridges to rebuild the inventory. Association of files in Globus to tapes is handled separately from the tape library and is not affected by the firmware update.

Resolved
February 27, 2026 at 22:04 UTC
This incident has been resolved. The Starfish dashboard is available.
Investigating
February 26, 2026 at 14:13 UTC
The Starfish dashboard is unavailable. We are currently investigating this issue with Starfish.

Starfish maintenance Feb 25, 2026 all day

Completed
February 26, 2026 at 14:00 UTC
Maintenance has completed successfully
In progress
February 25, 2026 at 14:00 UTC
Maintenance is now in progress
Not yet started
February 25, 2026 at 14:00 UTC
Starfish will be unavailable starting Wednesday, February 25th at 9AM until Thursday, February 26th at 9AM, for routine maintenance. The online dashboard will be inaccessible during this time.

Resolved
February 24, 2026 at 15:44 UTC
Openauth/radius is now operational. This update was created by an automated monitoring service.
Investigating
February 24, 2026 at 15:39 UTC
Authentication issues with openauth/radius. This incident was created by an automated monitoring service.

NESE tape maintenance Feb 19th 2026

Completed
February 19, 2026 at 22:00 UTC
Maintenance has completed successfully
In progress
February 19, 2026 at 13:00 UTC
Maintenance is now in progress
Not yet started
February 19, 2026 at 13:00 UTC
From our partners at NESE. Details follow:
We are installing four new tape frames, which will bring the tape system raw storage capacity to 253 petabytes.
Service Affected: NESE Tape Service
Maintenance Window: 8:00 AM - 5:00 PM (EST)
- The tape service will be unavailable.
- All upgrade activities are expected to be completed on the same day.
NOTES:
- Monitor the MGHPCC Slack #nese channel for status updates and announcements
- Monitor https://nese.instatus.com/ for real-time updates on progress
Subscribe to https://nese.instatus.com/subscribe/email for updates and announcements

Feb 2026 to Apr 2026

FAS Research Computing - Notice history