Notice History

Operational

Apr 2022

Resolved
April 19, 2022 at 15:56
boslogin04 is back up
Identified
April 19, 2022 at 15:45
boslogin04 needs to be rebooted in order to address an underlying issue.

Resolved
April 25, 2022 at 14:15
Please contact us if you see any lingering node issues.
Identified
April 14, 2022 at 12:00
Following the MGHPCC datacenter outage on 4/12/22, some labs still have lingering filesystem mount issues on some nodes. We are actively draining and rebooting these nodes to bring them back into service.

Resolved
April 12, 2022 at 22:33
All cooling, including the water cooling for water-cooled compute nodes, is back online. All partitions are open for jobs. Some compute nodes in various partitions may still require individual attention, so not every compute node is back online, but we will work to bring them all online in the coming hours.
Monitoring
April 12, 2022 at 21:35
Most storage in Holyoke is back up.

The Slurm scheduler is back up and accepting jobs. However, most public partitions are down as the water cooling systems for those compute racks require in-person attention. RC staff are already en route to the datacenter to address this.

The Academic Cluster is back up.
Investigating
April 12, 2022 at 19:13
A cooling failure caused temperatures in the MGHPCC datacenter to exceed the safe range of operation for many systems, causing them to power down to prevent permanent damage.

The cooling issue has been resolved and we are beginning to power systems back on. Expect outages on various systems until recovery is complete.
Monitoring
April 12, 2022 at 19:13
A cooling failure caused temperatures in the MGHPCC datacenter to exceed the safe range of operation for many systems, causing them to power down to prevent permanent damage.

The cooling issue has been resolved and we are beginning to power systems back on. Expect outages on various systems until recovery is complete.

Resolved
April 09, 2022 at 10:00
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
April 09, 2022 at 09:17
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
April 09, 2022 at 10:04
Websites & Tools - SPINAL (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
April 09, 2022 at 08:37
Websites & Tools - SPINAL (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Mar 2022

Resolved
March 23, 2022 at 03:12
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
March 23, 2022 at 02:59
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
March 16, 2022 at 16:04
The long path has returned to operation and initial checks indicate no substantial job or cluster issues as a result.

The short path remains down, so latency between Boston and Holyoke will still be higher than normal. See the short path incident for updates as the repair progresses (no ETA at this time).
Identified
March 16, 2022 at 15:31
The short path fibre connection to Holyoke/MGHPCC suffered a failure late yesterday (3/15/22). The short path is still down as of this morning.

However, we have just seen the long path also drop connection and are awaiting more information on this incident. It is our hope this is a minor disconnect related to their repair work.

Resolved
March 16, 2022 at 17:44
The short path repair is complete. Normal network operation and speed/latency between Boston and Holyoke have been restored.
Identified
March 16, 2022 at 16:31
Additional note: The long path was briefly down but was quickly restored. No knock-on effects were observed, and the long path is fully operational.

No ETA on short path repair.
Investigating
March 15, 2022 at 21:33
We've been informed that the 'short path' fibre connection to Holyoke is down. No ETA.

The 'long path' remains in operation, but this means slightly more latency when performing operations between Boston and Holyoke.

We will relay updates when we receive them.

Resolved
March 15, 2022 at 05:23
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
March 15, 2022 at 04:55
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
March 14, 2022 at 16:29
A fix was implemented earlier, and after a period of monitoring the issue is deemed resolved.
Investigating
March 14, 2022 at 14:12
We are currently working on this issue and expect the ticket system back in operation shortly.

New tickets and replies should be queued and will get processed as soon as the system is back to normal.

Feb 2022

Resolved
February 25, 2022 at 13:46
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
February 25, 2022 at 13:18
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
February 20, 2022 at 21:08
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
February 20, 2022 at 20:40
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
February 12, 2022 at 18:07
This issue has been resolved.
Identified
February 12, 2022 at 17:32
The home directory storage for the academic cluster has quickly filled to capacity. We are working to mitigate this issue.

Resolved
February 11, 2022 at 16:24
Websites & Tools - Portal (portal.rc) is now operational! This update was created by an automated monitoring service.
Investigating
February 11, 2022 at 16:14
Websites & Tools - Portal (portal.rc) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
February 08, 2022 at 18:33
This issue has been resolved and holystore01 is back to normal.
Monitoring
February 07, 2022 at 21:30
boslogin nodes are up but are unable to mount holystore01 at this time. Replacement parts for holystore01 will be installed at the earliest possible time to rectify this issue.
Identified
February 07, 2022 at 20:42
holystore01 may be inaccessible on some cluster nodes.
Investigating
February 07, 2022 at 17:28
holystore01 is not currently accessible from boslogin nodes. These nodes will be rebooted again at 2:30pm to fix this.

Please log in to holylogin nodes if you need to access holystore01.

Feb 2022 to Apr 2022

FAS Research Computing - Notice History