PLEASE SEE EMERGENCY MAINTENANCE NOTICE BELOW
Emergency maintenance Tuesday June 20th 7am-11am. Running jobs will be paused.
We are still working on finding the root cause of the scheduler slowness and timeouts. We are working with SchedMD and have sent diagnostic information for analysis.
At this time the scheduler is up but may become unresponsive to your commands at times. You can wait and retry them.
Jobs, once launched, should run as expected.
OOD/VDI are working, but job submission through them may also be affected.
If you receive the error
salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specified
wait a moment and resubmit your job; the resubmission should succeed.
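If you would rather not retry by hand, the wait-and-retry advice above could be sketched as a small bash helper. This is only an illustration: retry_cmd is a hypothetical function, not a Slurm or site-provided tool, and the attempt count and delay are assumed values you should adjust.

```shell
#!/usr/bin/env bash
# retry_cmd is a hypothetical helper for illustration only.
# Usage: retry_cmd MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry_cmd() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        # Run the command as given; stop as soon as it succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, retry_cmd 5 60 squeue -u "$USER" would try the query up to five times, one minute apart. Keep the delay generous so retries do not add load to the scheduler. If a submission command times out rather than failing outright, check squeue before retrying, since the job may already have been accepted.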
Addendum: We have determined that the issue is cyclical. You will have the best luck during odd-numbered hours (e.g., between 1pm and 2pm) and less success interacting with the scheduler during even-numbered hours (e.g., between 2pm and 3pm).
We regret the impact this is having on your work. Updates will follow as we have them.