UPDATE: A combination of regal being unresponsive and the cancellation of a large job array made Slurm very slow or unresponsive. This also put some nodes into the drain state. Cleanup is largely complete, and any remaining stuck nodes are being restarted.
The job scheduler (Slurm) is experiencing performance issues. We are investigating. Updates to follow.