All Cluster nodes down or draining

Updates

Resolved
Feb 08, 2024 at 20:56 UTC
Clusters are back in production status. We will continue to monitor for any aberrant behavior, but this incident has been resolved.
Identified
Feb 08, 2024 at 20:07 UTC
I spoke too soon: this is only a partial recovery. I'll update here when we are sure everything is back in production. Apologies.
Monitoring
Feb 08, 2024 at 19:48 UTC
All Cannon nodes are back in service and Slurm is resuming jobs. FASSE is coming back up as well. We will monitor the situation, but anticipate a return to full normal operations shortly.
Update
Feb 08, 2024 at 19:35 UTC
The nodes are coming back into normal service; we anticipate this to be fairly quick.
Identified
Feb 08, 2024 at 19:20 UTC
Slurm is operational; jobs are idled and should resume as normal.
Investigating
Feb 08, 2024 at 19:18 UTC
We are currently investigating this incident. We will update here as we can.

FAS Research Computing - All Cluster nodes down or draining – Incident Details