History

Operational

April 2022

Resolved
April 19, 2022 at 3:56 PM
boslogin04 is back up
Identified
April 19, 2022 at 3:45 PM
boslogin04 needs to be rebooted in order to address an underlying issue.

Resolved
April 25, 2022 at 2:15 PM
Please contact us if you see any lingering node issues.
Identified
April 14, 2022 at 12:00 PM
Following the MGHPCC datacenter outage on 4/12/22, some nodes have lingering filesystem mount issues affecting some labs. We are actively draining and rebooting these nodes to bring them back into service.

Resolved
April 12, 2022 at 10:33 PM
All cooling, including the water cooling for water-cooled compute nodes, is back online. All partitions are open for jobs. Some compute nodes in various partitions may still require individual attention, so not every compute node is back online, but we will work to bring them all online in the coming hours.
Monitoring
April 12, 2022 at 9:35 PM
Most storage in Holyoke is back up.

The Slurm scheduler is back up and accepting jobs. However, most public partitions are down as the water cooling systems for those compute racks require in-person attention. RC staff are already en route to the datacenter to address this.

The Academic Cluster is back up.
Investigating
April 12, 2022 at 7:13 PM
A cooling failure caused temperatures in the MGHPCC datacenter to exceed the safe range of operation for many systems, causing them to power down to prevent permanent damage.

The cooling issue has been resolved and we are beginning to power systems back on. Expect outages on various systems until they are all restored.
Monitoring
April 12, 2022 at 7:13 PM
A cooling failure caused temperatures in the MGHPCC datacenter to exceed the safe range of operation for many systems, causing them to power down to prevent permanent damage.

The cooling issue has been resolved and we are beginning to power systems back on. Expect outages on various systems until they are all restored.

Resolved
April 9, 2022 at 10:00 AM
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
April 9, 2022 at 9:17 AM
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
April 9, 2022 at 10:04 AM
Websites & Tools - SPINAL (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
April 9, 2022 at 8:37 AM
Websites & Tools - SPINAL (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

March 2022

Resolved
March 23, 2022 at 3:12 AM
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
March 23, 2022 at 2:59 AM
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
March 16, 2022 at 4:04 PM
The long path has returned to operation and initial checks indicate no substantial job or cluster issues as a result.

The short path remains down, so latency between Boston and Holyoke will still be higher than normal. See the short path incident for updates as repairs progress (no ETA at this time).
Identified
March 16, 2022 at 3:31 PM
The short path fibre connection to Holyoke/MGHPCC suffered a failure late yesterday (3/15/22). The short path is still down as of this morning.

However, the long path has just dropped its connection as well, and we are awaiting more information on this incident. We hope this is a minor disconnection related to the ongoing repair work.

Resolved
March 16, 2022 at 5:44 PM
The short path repair is complete. Normal network operation and speed/latency between Boston and Holyoke have been restored.
Identified
March 16, 2022 at 4:31 PM
Additional note: The long path was briefly down but was quickly restored. No knock-on effects were observed, and the long path is fully operational.

There is no ETA for the short path repair.
Investigating
March 15, 2022 at 9:33 PM
We've been informed that the 'short path' fibre connection to Holyoke is down. No ETA.

The 'long path' remains in operation, so expect slightly higher latency for operations between Boston and Holyoke.

We will relay updates when we receive them.

Resolved
March 15, 2022 at 5:23 AM
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
March 15, 2022 at 4:55 AM
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
March 14, 2022 at 4:29 PM
A fix was implemented earlier and, after monitoring, this issue is considered resolved.
Investigating
March 14, 2022 at 2:12 PM
We are currently working on this issue and expect the ticket system back in operation shortly.

New tickets and replies should be queued and will be processed as soon as the system is back to normal.

February 2022

Resolved
February 25, 2022 at 1:46 PM
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
February 25, 2022 at 1:18 PM
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
February 20, 2022 at 9:08 PM
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
February 20, 2022 at 8:40 PM
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
February 12, 2022 at 6:07 PM
This issue has been resolved.
Identified
February 12, 2022 at 5:32 PM
The home directory storage for the academic cluster has quickly filled to capacity. We are working to mitigate this issue.

Resolved
February 11, 2022 at 4:24 PM
Websites & Tools - Portal (portal.rc) is now operational! This update was created by an automated monitoring service.
Investigating
February 11, 2022 at 4:14 PM
Websites & Tools - Portal (portal.rc) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
February 8, 2022 at 6:33 PM
This issue has been resolved and holystore01 is back to normal.
Monitoring
February 7, 2022 at 9:30 PM
boslogin nodes are up but are unable to mount holystore01 at this time. Replacement parts for holystore01 will be installed at the earliest possible time to rectify this issue.
Identified
February 7, 2022 at 8:42 PM
holystore01 may be inaccessible on some cluster nodes.
Investigating
February 7, 2022 at 5:28 PM
holystore01 is not currently accessible from boslogin nodes. These nodes will be rebooted again at 2:30pm to fix the issue.

If you need to access holystore01, please log in to the holylogin nodes.

February 2022 to April 2022

FAS Research Computing - History