Some systems are experiencing issues

About This Site

GETTING HELP

https://docs.rc.fas.harvard.edu | https://portal.rc.fas.harvard.edu/rcrt/submit_ticket | Email: rchelp@rc.fas.harvard.edu


Status page for the Harvard FAS Research Computing cluster and other resources.

Please scroll down to see details on any Incidents or maintenance notices.

Stickied Incidents

13th January 2021

Samba Cluster: SAMBA Cluster Performance Issues

Hello,

FASRC has identified an issue with our SAMBA cluster. This is affecting multiple users and shares that utilize mounted storage from different types of hosts.

Users may currently experience issues when accessing mounted shares or when re-mounting network shares, including an error stating that a domain controller is not available.

Keep in mind that there are other ways to access the data on these shares; we recommend FileZilla as an alternative. More information here: https://docs.rc.fas.harvard.edu/kb/sftp-file-transfer/

We recognize that this may disrupt current workflows. FASRC engineers are working toward a solution, but there is no ETA. Feel free to take a look at our status page here: https://status.rc.fas.harvard.edu/

Thank you,

FASRC

  • We are still troubleshooting this issue. Staff are working on it, but there is no ETA for having the whole cluster working at this time. Users should be able to connect, but the service is less resilient than intended. Thanks for your patience.

  • Updates pending...

  • Maintenance
    Monthly maintenance March 1st, 2021 7am-11am

    NOTICES

    SLURM INTERACTIVE SESSIONS: IMPORTANT: Please share this with your peers, especially if your department/group has internal documentation on interactive sessions.

    • SchedMD is recommending that you use salloc instead of srun to start an interactive session.

    • Doing this also avoids the problem of invoking MPI inside an srun session, which amounts to running srun inside srun. If you simply use salloc to get the allocation, you can then use srun to hook into that allocation if/as needed.

    • srun is still available, but it is no longer recommended as the command to start interactive sessions.

    • See examples in our docs: https://docs.rc.fas.harvard.edu/kb/running-jobs/#Interactive_jobs_and_salloc
      SchedMD command documentation: https://slurm.schedmd.com/salloc.html

    • New training sessions starting in April: https://www.rc.fas.harvard.edu/upcoming-training/

    • Please note that our office hours (Wednesdays 12pm-3pm) have permanently moved online. Details: https://www.rc.fas.harvard.edu/training/office-hours/
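
The salloc recommendation in the notices above can be sketched roughly as follows. This is an illustration only, not an official FASRC recipe: the partition name, memory, and time limit are assumed values, and the MPI program name in the comments is hypothetical. The sketch prints the command by default and only runs it when RUN=1 is set.

```shell
# Assumed values; adjust for your own account and workload.
partition="test"     # hypothetical partition name
mem="4G"             # memory for the allocation
walltime="0-01:00"   # days-hours:minutes

# Request the allocation with salloc rather than srun.
alloc_cmd="salloc -p $partition --mem $mem -t $walltime"

# Print by default so the sketch is safe to run anywhere;
# set RUN=1 on a login node to actually request the allocation.
if [ "${RUN:-0}" = "1" ]; then
  $alloc_cmd
else
  echo "$alloc_cmd"
fi

# Once the allocation is granted, srun can hook into it as needed,
# for example to launch an MPI program (hypothetical name):
#   srun -n 4 ./my_mpi_program
```

Because the allocation comes from salloc, a later srun runs inside it instead of nesting one srun inside another.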

    GENERAL MAINTENANCE

    • Slurm scheduler upgrade to 20.11.4 -- Audience: Cluster users -- Impact: Minor update. The scheduler will be paused during the upgrade.

    • Login/VDI node reboots -- Audience: Anyone logged into a login node or VDI/OOD node -- Impact: Login and VDI/OOD nodes will be unavailable while rebooting.

    • Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ ) -- Audience: Cluster users -- Impact: Files older than 90 days will be removed.

    Reminder: Scratch 90-day file retention purging runs occur regularly, not just during maintenance periods.
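
To see which of your files might fall under the 90-day retention window, something like the sketch below may help. It assumes age is judged by modification time, which may not match the criteria the actual purge uses, and the function name is hypothetical. It only lists files and deletes nothing.

```shell
# Hypothetical helper: list regular files under a directory whose
# modification time is more than N days old (default 90).
# Assumption: the real purge may use different criteria than mtime.
list_purge_candidates() {
  dir="$1"
  days="${2:-90}"
  find "$dir" -type f -mtime +"$days" -print
}

# Example usage (the path is a placeholder for your own scratch directory):
#   list_purge_candidates /path/to/your/scratch
```

Note that find's -mtime +90 counts whole 24-hour periods, so treat the output as an approximation of what a purge run would select.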

    Thanks,

    FAS Research Computing
    https://www.rc.fas.harvard.edu
    https://docs.rc.fas.harvard.edu
    https://status.rc.fas.harvard.edu

    Past Incidents

    23rd February 2021

    FASRC Ticket System: Ticket system down

    Our ticketing system is down as of 11:50am. We are working on bringing it back up.

    If you do not receive a confirmation message that we have received your ticket, you may need to resubmit it once the ticketing system is operational again.

  • Ticketing system is up as of 12:15pm.

  • 18th February 2021

    BosLFS: boslfs/boslfs02 hanging on login nodes

    boslfs and boslfs02 are currently hanging on login nodes and on Globus. Performance on compute nodes is fine. If you need to do work on boslfs or boslfs02, please open an interactive session.

  • A networking issue was identified and corrected. Access to these filesystems is restored on login nodes and Globus.

  • For issues not shown here, please contact FASRC via
    https://portal.rc.fas.harvard.edu or email rchelp@rc.fas.harvard.edu