Notice History

Oct 2023

Resolved
October 27, 2023 at 12:00 AM UTC
A VPN certificate for vpn.rc.fas.harvard.edu expired, which led to "Untrusted Server Blocked" messages when attempting to connect. A new certificate has already been added, and you should no longer be getting any errors when connecting to the VPN.

This incident has been resolved.

Resolved
October 16, 2023 at 2:16 PM UTC
FASSE login and OOD have been returned to service.
Monitoring
October 15, 2023 at 11:11 PM UTC
Most resources are once again available. The Cannon (including Kempner), FASSE, and Academic clusters are open for jobs. Please note that FASSE login and OpenOnDemand (OOD) nodes are not yet available. ETA Monday morning.

Thanks for your patience through this unexpected event.
Update
October 15, 2023 at 7:17 PM UTC
Power-up is progressing with, so far, only minor issues which we are addressing according to their impact on returning the cluster to service.

Expect some remaining effects on less-essential services into tomorrow.

Please note that login nodes will remain down until we return the cluster and scheduler to service.
Update
October 15, 2023 at 3:02 PM UTC
MGHPCC has isolated the cause of the generator failure and will continue to look into the grid failure.

At this time they will begin re-energizing the facility. Once that is complete and we have confirmed the networking is stable we can begin powering up our resources.

Please bear with us as this is a long process given the number of systems we maintain and it must be done in stages. Watch this page for updates.
Identified
October 14, 2023 at 10:47 PM UTC
Out of an abundance of caution, FASRC and other MGHPCC occupants will not rush restoration but will wait until the facility has restored primary power and confirmed stable operation before attempting to resume normal operations.

As such, we expect to begin restoring FASRC services tomorrow (Sunday). Since all Holyoke services and resources are down, this is a lengthy process similar to the startup process after the annual power-down.

Updates will be posted here. Please consider subscribing to our status page (see 'Get Updates' up top).
Investigating
October 14, 2023 at 9:17 PM UTC
There has been a major power event at MGHPCC, our Holyoke data center.
We are awaiting further details.

This likely affects all Holyoke resources, including the cluster and storage housed in Holyoke.

More details as we learn them.

Resolved
October 12, 2023 at 4:26 PM UTC
The security patch has been applied, and all clusters are accepting jobs at this time.
Investigating
October 12, 2023 at 3:02 PM UTC
SchedMD (the maintainers of Slurm) has discovered a critical security flaw in Slurm. Due to the nature and severity of the issue, we will be applying the patch immediately.

Cannon and FASSE schedulers will remain down for the duration of the patching. All running jobs will be paused, and new jobs will not be accepted until the scheduler is back up.

The patching is expected to take approximately one hour.

Resolved
October 3, 2023 at 6:43 PM UTC
This incident has been resolved.
Investigating
October 3, 2023 at 6:26 PM UTC
We are currently investigating this incident.

FASRC monthly maintenance Monday October 2nd, 2023 7am-11am

Completed
October 2, 2023 at 3:00 PM UTC
Maintenance has completed successfully.
In progress
October 2, 2023 at 11:00 AM UTC
Maintenance is now in progress.
Planned
October 2, 2023 at 11:00 AM UTC
FASRC monthly maintenance will take place Monday October 2nd, 2023 from 7am-11am.

NOTICES

New training sessions are available. Topics include New User Training, Getting Started on FASRC with CLI, Getting Started on FASRC with OpenOnDemand, GPU Computing, Parallel Job Workflows, and Singularity. To see current and future training sessions, see our calendar at: https://www.rc.fas.harvard.edu/upcoming-training/

MAINTENANCE TASKS

Cannon cluster will be paused during this maintenance?: Yes
FASSE cluster will be paused during this maintenance?: No

Cannon UFM updates
-- Audience: Cluster users
-- Impact: The cluster will be paused while this update takes place.

Login node and OOD/VDI reboots
-- Audience: Anyone logged into a login node or VDI/OOD node
-- Impact: Login and VDI/OOD nodes will be rebooted during this maintenance window

Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
-- Audience: Cluster users
-- Impact: Files older than 90 days will be removed. Please note that retention cleanup can run at any time, not just during the maintenance window.
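
To see which of your files might fall under the 90-day rule above, a short script along these lines can help. This is only a sketch: it assumes file age is judged by modification time, which may not match the actual retention criteria (see the scratch policy page linked above), and the function name and arguments are illustrative, not part of any FASRC tool.

```python
import os
import time
from pathlib import Path

RETENTION_DAYS = 90  # retention period stated in the notice above


def files_older_than(root, days=RETENTION_DAYS, now=None):
    """Return regular files under root whose modification time is more
    than `days` days before `now` (defaults to the current time).

    Assumption: age is measured by mtime; the real cleanup may differ.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat does not follow symlinks, so a link is judged
                # by its own timestamp rather than its target's
                if path.lstat().st_mtime < cutoff:
                    old.append(path)
            except OSError:
                # file vanished or is unreadable; skip it
                continue
    return sorted(old)
```

Calling `files_older_than("/path/to/your/scratch/directory")` (a placeholder path) returns a sorted list of candidate files. The script only reports; it does not delete anything.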

Thanks,
FAS Research Computing
Department and Service Catalog: https://www.rc.fas.harvard.edu/
Documentation: https://docs.rc.fas.harvard.edu/
Status Page: https://status.rc.fas.harvard.edu/

Sep 2023

Resolved
September 29, 2023 at 4:53 PM UTC
The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu
Identified
September 29, 2023 at 3:16 PM UTC
The infrastructure behind Tier2 Ceph shares and VMs is unstable.
This also affects VDI/OOD, which relies on virtual machines.

/net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

Thanks for your patience.

Resolved
September 25, 2023 at 11:36 AM UTC
The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu
Identified
September 25, 2023 at 5:19 AM UTC
The infrastructure behind Tier2 Ceph shares and VMs is unstable.
This also affects VDI/OOD, which relies on virtual machines.

/net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

Thanks for your patience.

Resolved
September 25, 2023 at 12:53 AM UTC
The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu
Identified
September 24, 2023 at 8:49 PM UTC
The infrastructure behind Tier2 Ceph shares and VMs is unstable.
This also affects VDI/OOD, which relies on virtual machines.

/net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

Thanks for your patience.

Resolved
September 23, 2023 at 1:12 PM UTC
The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu
Identified
September 23, 2023 at 5:31 AM UTC
The infrastructure behind Tier2 Ceph shares and VMs is unstable.
This also affects VDI/OOD, which relies on virtual machines.

/net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

Thanks for your patience.

Resolved
September 22, 2023 at 2:31 PM UTC
The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu
Identified
September 22, 2023 at 12:28 PM UTC
The infrastructure behind Tier2 Ceph shares and VMs is unstable.
This also affects VDI/OOD, which relies on virtual machines.

/net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

Thanks for your patience.

Aug 2023

Resolved
September 1, 2023 at 4:02 PM UTC
We have reverted the routing systems involved in causing this issue to their previous version/state. We will continue to investigate why this issue occurred and what will allow us to upgrade these systems at a later date.
Investigating
September 1, 2023 at 3:56 AM UTC
The issue with holylabs (and potentially other Lustre filesystems) has recurred. This may affect jobs and any processes using holylabs and potentially other Lustre filesystems.

No ETA at this time.
Resolved
September 1, 2023 at 12:59 AM UTC
We have restored access to holylabs and the cluster/jobs are no longer paused.

We have identified a root cause which we will be working to remediate to prevent this issue in future.
Identified
August 31, 2023 at 11:30 PM UTC
The scheduler and all jobs have been paused in order to reduce the load on holylabs.

We are continuing to work on a fix for this incident.
Investigating
August 31, 2023 at 6:00 PM UTC
The holylabs filesystem is currently down due to high load.

OOD, software, and modules are all functional, but if your workflow uses holylabs for storage, scripts, or jobs, it may hang or fail. Our engineers are investigating this issue further.

Resolved
August 26, 2023 at 1:59 PM UTC
The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu
Identified
August 26, 2023 at 2:02 AM UTC
The infrastructure behind Tier2 Ceph shares and VMs is unstable.
This also affects VDI/OOD, which relies on virtual machines.

/net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

Thanks for your patience.

Resolved
August 14, 2023 at 2:56 PM UTC
This incident has been resolved.
Investigating
August 11, 2023 at 3:48 PM UTC
holyscratch01 is currently in a degraded state. A group of improperly architected jobs is hammering the filesystem, which is impeding access for other users. We are in the process of identifying and stopping these jobs.

Until then, we recommend starting an interactive session and working from there as those will have the lowest impact. However, performance will still be slow until we are able to stop the problematic jobs.

FASRC Monthly maintenance August 7, 2023 7am-1pm *NOTE EXTENDED TIME*

Completed
August 7, 2023 at 1:42 PM UTC
Due to a vendor error, we were unable to complete the holyscratch01 disk shelf replacement. We will work with the vendor to reschedule.

All other maintenance tasks have completed.
In progress
August 7, 2023 at 11:00 AM UTC
Maintenance is now in progress.
Planned
August 7, 2023 at 11:00 AM UTC
August maintenance will run August 7, 2023 from 7am-1pm.

Please note the extended timeframe.
See tasks section below for explanation.

NOTICES
- CentOS 7 Support EOL: We will be dropping support for CentOS 7 in September. If your machine or VM runs CentOS 7 and connects with Slurm, please contact FASRC to discuss options.
- Test Partition Changes: We are changing test partitions based on changing needs and increasing max time to 12 hours instead of 8 hours. A reminder that this partition is not for running jobs.
MAINTENANCE TASKS
- holyscratch01 Disk Shelf Replacement (All Jobs Will Be Paused)
  -- Audience: All cluster and scratch users - Cannon and FASSE
  -- Impact: Hardware issues with holyscratch01 necessitate the replacement of one of the disk shelves. As a result all jobs and scratch will need to be paused for the duration.
  -- ETA: This swap is expected to take 3-4 hours, but pausing the cluster, vendor interactions, and allowing a margin for over-run requires that we extend maintenance by 2 hours (7am-1pm)
- Login node and OOD/VDI reboots
  -- Audience: Anyone logged into a login node or VDI/OOD node
  -- Impact: Login and VDI/OOD nodes will be rebooted during this maintenance window
- Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
  -- Audience: Cluster users
  -- Impact: Files older than 90 days will be removed.
Thanks,
FAS Research Computing
Department and Service Catalog: https://www.rc.fas.harvard.edu/
Documentation: https://docs.rc.fas.harvard.edu/
Status Page: https://status.rc.fas.harvard.edu/

Resolved
August 6, 2023 at 3:05 PM UTC
The Ceph instability has been resolved. Ceph Tier2 shares, VDI, and VMs should be back to their normal state.

If your VM, /net/fs-[labname] share, or VDI session is still impacted, please contact rchelp@rc.fas.harvard.edu
Identified
August 4, 2023 at 2:56 PM UTC
The infrastructure behind Tier2 Ceph shares and VMs is unstable.
This also affects VDI/OOD, which relies on virtual machines.

/net/fs-[labname] shares, new OOD/VDI sessions, and VMs are affected and may be inaccessible until this is resolved.

Thanks for your patience.
