History

Degraded performance

Operational

March 2026

Resolved
April 01, 2026 at 12:11
This incident has been resolved. The scheduler is running normally.
Investigating
March 31, 2026 at 21:15
The scheduler is in a degraded state due to thrashing.
We are actively working to resolve this problem.

Resolved
March 31, 2026 at 16:44
This incident has been resolved. two-factor.rc.fas.harvard.edu is working normally again.
Investigating
March 31, 2026 at 15:32
We are currently investigating this incident. Requesting a new token or re-requesting your token from two-factor is not currently working.

Resolved
March 31, 2026 at 15:00
This incident has been resolved.
Investigating
March 25, 2026 at 14:30
We are currently investigating this incident.

Resolved
March 30, 2026 at 20:41
This incident has been resolved by draining and rebooting any nodes with stuck mounts.
Monitoring
March 25, 2026 at 14:31
Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.
It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.
At this time we are unaware of any holy-isilon problems other than the effect this had on cluster nodes/running jobs. We will update should we identify any data storage concerns.
Identified
March 25, 2026 at 14:10
Mounts to Holyoke Isilon (specifically /n/sw) are broken on numerous nodes across the cluster. We have a check rolling out to find these nodes so we can remediate them individually. Until remediated the cluster will be in a degraded state. Running jobs may randomly die or fail as they hit nodes that have stale mounts.
It will be risky to run jobs for the next hour and then, after that point, the cluster will have a large number of nodes closed waiting for them to drain so we can reboot them and fix the mounts.
Investigating
March 25, 2026 at 13:34
A network issue affecting storage critical to the cluster is causing instability. The cluster is currently in a degraded state as a result. We are looking into the problem. Updates to follow.
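The updates in this incident mention a check that finds nodes with stale mounts. The actual check is not published here; the following is only a minimal sketch of one common approach, assuming that a stale mount shows up as a stat() call that hangs or fails. The function name mount_responds and the 5-second timeout are hypothetical choices for illustration.

```python
import subprocess
import sys


def mount_responds(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if stat() on `path` succeeds within `timeout_s` seconds.

    The stat() call runs in a child process so that a hung mount cannot
    block the caller indefinitely.
    """
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means stat() returned without raising an error.
        return probe.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in uninterruptible I/O may not
        # exit until the mount recovers, so we do not wait for it here.
        probe.kill()
        return False


if __name__ == "__main__":
    # /n/sw is the mount point named in the incident updates.
    for mount_point in ["/n/sw"]:
        state = "ok" if mount_responds(mount_point) else "stale or missing"
        print(f"{mount_point}: {state}")
```

A node that reports "stale or missing" would then be a candidate for draining and rebooting, as described in the resolution note.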

Resolved
March 19, 2026 at 14:52
ColdFront is back up. Thank you for your patience.
Identified
March 19, 2026 at 12:58
ColdFront is down. We are working to bring it back up. The instance got replaced last night, but it had trouble configuring itself on the way up again.

February 2026

Resolved
March 04, 2026 at 15:09
This incident has been resolved. Normal tape operations are restored.
Monitoring
March 03, 2026 at 14:04
The tape library outage is further extended to Wednesday, March 4th at 9am, pending a hardware replacement part due today. Data can still be uploaded to lab collections via Globus, but be mindful of the 10 TB buffer file limit. The outage affects storage and recall from tape.
Identified
March 02, 2026 at 14:03
NESE Tape Service is still working with IBM technical support to restore the inventory. The expected downtime is extended until Tuesday, March 3rd, 9am.
Apologies for the inconvenience.
Investigating
February 27, 2026 at 21:27
NESE Tape service will be down or operating with degraded service (no store and recall) Friday from 12 Noon EST until as late as Monday, 2 March at 9 AM.

SUMMARY OF ISSUE:

NESE Tape service is currently not able to store or recall files to and from tape due to vendor firmware issues in the IBM TS4500 tape library. The issue is related to the library robotics and cartridge database and we do NOT expect any data loss from this issue.

The problem appears to lie in the inventory database and is related to a recent firmware update. This database can be scrubbed and reconstructed by the library, which will scan the bar code labels on all the cartridges to rebuild the inventory. Association of files in Globus to tapes is handled separately from the tape library and is not affected by the firmware update.

Resolved
February 27, 2026 at 22:04
This incident has been resolved. The Starfish dashboard is available.
Investigating
February 26, 2026 at 14:13
The Starfish dashboard is unavailable. We are currently investigating this issue with Starfish.

Starfish maintenance Feb 25, 2026 all day

Completed
February 26, 2026 at 14:00
Maintenance has completed successfully
In progress
February 25, 2026 at 14:00
Maintenance is now in progress
Planned
February 25, 2026 at 14:00
Starfish will be unavailable starting Wednesday, February 25th at 9AM until Thursday, February 26th at 9AM, for routine maintenance. The online dashboard will be inaccessible during this time.

Resolved
February 24, 2026 at 15:44
Openauth/radius is now operational. This update was created by an automated monitoring service.
Investigating
February 24, 2026 at 15:39
Authentication issues with openauth/radius. This incident was created by an automated monitoring service.

NESE tape maintenance Feb 19th 2026

Completed
February 19, 2026 at 22:00
Maintenance has completed successfully
In progress
February 19, 2026 at 13:00
Maintenance is now in progress
Planned
February 19, 2026 at 13:00
From our partners at NESE. Details follow:
We are installing four new tape frames, which will bring the tape system raw storage capacity to 253 petabytes.
Service Affected: NESE Tape Service
Maintenance Window: 8:00 AM - 5:00 PM (EST)
- The tape service will be unavailable.
- All upgrade activities are expected to be completed on the same day.
NOTES:
- Monitor the MGHPCC Slack #nese channel for status updates and announcements
- Monitor https://nese.instatus.com/ for real-time updates on progress
- Subscribe to https://nese.instatus.com/subscribe/email for updates and announcements

January 2026

Resolved
January 22, 2026 at 16:04
ColdFront is operational. Thank you for your patience.
Investigating
January 22, 2026 at 15:06
We are currently investigating an issue with ColdFront. No ETA.

Resolved
January 21, 2026 at 17:29
holystore01 is back up and usable.
This incident has been resolved.
Investigating
January 21, 2026 at 16:28
The filesystem holystore01 is experiencing a network failure and is in a bad state.
Some files on holystore01 may not be accessible while this is ongoing. We are working to restore access, and apologize for the inconvenience.

Resolved
January 15, 2026 at 15:46
is back up. This incident was automatically resolved by Instatus monitoring.
Investigating
January 15, 2026 at 15:34
is down at the moment. This incident was automatically created by Instatus monitoring.

Resolved
January 13, 2026 at 15:12
This incident has been resolved.
Identified
January 13, 2026 at 14:57
holystore01 is wedging. We are rebooting.

FASRC monthly maintenance Monday January 12th, 2026 9am-1pm

Completed
January 12, 2026 at 18:00
Maintenance has completed successfully
In progress
January 12, 2026 at 14:00
Maintenance is now in progress
Planned
January 12, 2026 at 14:00
Monthly maintenance will take place on January 12th, 2026. Our maintenance tasks should be completed between 9am and 1pm.
NOTICES:
- Changes to SEAS partitions, please see tasks below.
- Changes to job age priority weighting, please see tasks below.
- Status Page: You can subscribe to our status to receive notifications of maintenance, incidents, and their resolution at https://status.rc.fas.harvard.edu/ (click Get Updates for options).
- We'd love to hear success stories about your or your lab's use of FASRC. Submit your story here.
MAINTENANCE TASKS
Cannon cluster will be paused during this maintenance?: YES
FASSE cluster will be paused during this maintenance?: YES
- Slurm upgrade to 25.11.1
  Audience: All cluster users (Cannon and FASSE)
  Impact: Jobs will be paused during maintenance
- In conjunction with SEAS we will modify seas_gpu and seas_compute time limits
  Audience: SEAS users
  Impact:
  seas_gpu: will be set to 2 days maximum
  seas_compute: will be set to 3 days maximum
  Existing pending jobs longer than these limits will be set to 2-day or 3-day run times, depending on partition.
- Job Age Priority Weight Change
  Audience: Cluster users
  Impact: We will be adjusting the weight applied to the priority earned by jobs by virtue of their age. Currently job priority is made up of two factors, Fairshare and Job Age. The Job Age factor is currently set such that jobs gain priority over 3 days with a maximum priority equivalent to jobs with Fairshare of 0.5. This keeps low fairshare jobs from languishing at the bottom of the queue. With the current settings though, users with low fairshare can gain significant advantage over users with higher relative fairshare. To remedy this we will be adjusting the Job Age weight to cap out at an equivalent Fairshare of 0.1. This will still allow jobs with 0 fairshare to gain priority and thus not languish while letting fairshare govern a wider range of higher priority jobs.
- Login node reboots
  Audience: All login node users
  Impact: Login nodes will reboot during the maintenance window
- Open OnDemand (OOD) node reboots
  Audience: All OOD users
  Impact: OOD nodes will reboot during the maintenance window
- Netscratch retention will run
  Audience: All cluster netscratch users
  Impact: Files older than 90 days will be removed. Please note that retention cleanup can and does run at any time, not just during the maintenance window.
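
To illustrate the Job Age priority change described in the task list above, here is a minimal sketch in Python. It is a simplified model and not Slurm's actual priority calculation: priorities are expressed in "equivalent Fairshare" units, the function name job_priority and the constants are hypothetical, and it assumes the 3-day ramp stays the same while only the cap changes from 0.5 to 0.1.

```python
MAX_AGE_DAYS = 3.0  # from the notice: jobs gain age priority over 3 days
OLD_AGE_CAP = 0.5   # current cap, in equivalent-Fairshare units
NEW_AGE_CAP = 0.1   # cap after the maintenance


def job_priority(fairshare: float, age_days: float, age_cap: float) -> float:
    """Simplified priority: Fairshare plus a capped, linearly growing age term."""
    # The age factor ramps from 0 to 1 over MAX_AGE_DAYS, then stays at 1.
    age_factor = min(max(age_days, 0.0) / MAX_AGE_DAYS, 1.0)
    return fairshare + age_cap * age_factor


if __name__ == "__main__":
    # Hypothetical jobs: one with zero Fairshare that has waited 3 days,
    # and one with Fairshare 0.3 that was just submitted.
    for label, cap in [("old cap", OLD_AGE_CAP), ("new cap", NEW_AGE_CAP)]:
        waited = job_priority(fairshare=0.0, age_days=3.0, age_cap=cap)
        fresh = job_priority(fairshare=0.3, age_days=0.0, age_cap=cap)
        print(f"{label}: waited job = {waited:.2f}, fresh job = {fresh:.2f}")
```

Under this model, the job with zero Fairshare that has waited the full 3 days outranks the freshly submitted Fairshare 0.3 job with the old cap, but not with the new one. It still gains some priority with age, so it does not languish at the bottom of the queue.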
Thank you,
FAS Research Computing
https://docs.rc.fas.harvard.edu/
https://www.rc.fas.harvard.edu/

January 2026 to March 2026

FAS Research Computing - History
