Notice history

Operational

Degraded performance

Operational

Under maintenance

Mar 2023

Resolved
March 16, 2023 at 9:07 PM
Resolved
March 16, 2023 at 9:07 PM
The SSL cert issue has been fixed. Portal is accessible again. You may need to force-reload the page if you browser still complains (hold dorn Ctrl or Cmd while clicking reload).
Investigating
March 16, 2023 at 8:30 PM
Investigating
March 16, 2023 at 8:30 PM
Users are unable to access Portal.rc.fas.harvard.edu
We are currently investigating this problem.

Monthly Maintenance March 6th, 2023 7am-11am

Completed
March 06, 2023 at 5:06 PM
Completed
March 06, 2023 at 5:06 PM
Maintenance has completed successfully at 12:00PM.
In progress
March 06, 2023 at 4:30 PM
In progress
March 06, 2023 at 4:30 PM
Maintenance is still in progress as of 11:30AM as one of our vendors needs to complete some hardware work.

Access to the Slurm scheduler and all compute are still paused, but access to storage and other services has been restored.

We appreciate your patience.
Completed
March 06, 2023 at 4:00 PM
Completed
March 06, 2023 at 4:00 PM
Maintenance has completed successfully
In progress
March 06, 2023 at 12:00 PM
In progress
March 06, 2023 at 12:00 PM
Maintenance is now in progress
Planned
March 06, 2023 at 12:00 PM
Planned
March 06, 2023 at 12:00 PM
NOTICES

The annual MGHPCC power downtime will take place June 5th-8th, 2023
Calendar Event: https://www.rc.fas.harvard.edu/events/mghpcc-power-shutdown-2023/
Blog Post: https://www.rc.fas.harvard.edu/blog/2023-downtime/

GENERAL MAINTENANCE
- NOTE: All jobs will be paused during maintenance to reduce heat load and allow data center cooling maintenance to take place.
- Login node updates and reboots, VDI reboots
  Audience: VDI/OpenOnDemand users
  Impact: VDI will be unavailable during this and the above Slurm upgrade
- RCSMB (samba) Boston network changes
  Audience: RCSMB shares mounted out of Boston
  Impact: Could cause brief share disconnects during updates
- UPDATE: Nexus control plane supervisor switchover - ETA 5 minutes, short network disconnect while restarting
- Login node updates/reboot and VDI node reboots
  Audience: Anyone logged into a a login node or VDI/OOD node
  Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting
- Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
  Audience: Cluster users
  Impact: Files older than 90 days will be removed.
  Reminder: Scratch 90-day file retention purging runs occur regularly not just during maintenance periods.
SECURITY UPDATES
HUIT and the CIO Council have set a goal of reducing risk across all schools within the University. All schools are looking to reduce their outstanding vulnerability count 75% by June 2023. These numbers are based on HUIT security scans of our infrastructure.
We at FAS Research Computing are responsible for thousands of physical and virtual machines. To make progress in reducing our total open vulnerabilty count, we’re going to update internal and user facing systems as part of scheduled monthly maintenance windows and on a rolling basis outside of these windows. This will generally mean running OS and security updates as needed and rebooting these nodes when required.

This month, these hosts will get updates as part of our scheduled maintenance:
boslogin01 - boslogin04
holylogin01 - holylogin04
holydtn01 - holydtn04
xdmod4.rc.fas.harvard.edu
rchelp.rc.fas.harvard.edu (our ticket system)
rcsmtp.rc.fas.harvard.edu (our mail system)

Thanks!
FAS Research Computing
Department and Service Catalog: https://www.rc.fas.harvard.edu/
Documentation: https://docs.rc.fas.harvard.edu/
Status Page: https://status.rc.fas.harvard.edu/

Resolved
March 05, 2023 at 10:21 PM
Resolved
March 05, 2023 at 10:21 PM
Hardware maintenance was completed.
Identified
March 03, 2023 at 5:00 PM
Identified
March 03, 2023 at 5:00 PM
Due to an immediate need, we are doing some work on holylfs04 right now to replace some batteries in the controllers. Files that are on the impacted controllers will be inaccessible until they are back on line. The work should be completed later this afternoon.

Feb 2023

Resolved
February 16, 2023 at 2:44 PM
Resolved
February 16, 2023 at 2:44 PM
Normal VDI operation has resumed.
Investigating
February 16, 2023 at 2:42 PM
Investigating
February 16, 2023 at 2:42 PM
VDI is unresponsive or results in an error. We are currently investigating this issue.

Monthly Maintenance Feb. 6th, 2023 7am-11am

Completed
February 06, 2023 at 4:00 PM
Completed
February 06, 2023 at 4:00 PM
Maintenance has completed successfully
In progress
February 06, 2023 at 12:00 PM
In progress
February 06, 2023 at 12:00 PM
Maintenance is now in progress
Planned
February 06, 2023 at 12:00 PM
Planned
February 06, 2023 at 12:00 PM
NOTICES

GPU PARTITIONS
The gpu_test partition is back in service. Job limits are now 64 cores, 8 GPU's, and 750G of RAM. Users can run up to 2 jobs.

HOLIDAY NOTICE
February 20th is a university holiday (Presidents' Day)

GENERAL MAINTENANCE
- OnDemand Version upgrade to 2.0.29
  Audience: VDI/OpenOnDemand users
  Impact: VDI will be unavailable during this and the above Slurm upgrade
- Domain controller updates
  Audience: All cluster
  Impact: Could briefly impact some older systems, otherwise no impact expected
- Login node and VDI node reboots and firmware updates
  Audience: Anyone logged into a a login node or VDI/OOD node
  Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting
- Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
  Audience: Cluster users
  Impact: Files older than 90 days will be removed.
Reminder: Scratch 90-day file retention purging runs occur regularly not just during maintenance periods.

Thanks!
FAS Research Computing
Department and Service Catalog: https://www.rc.fas.harvard.edu/
Documentation: https://docs.rc.fas.harvard.edu/
Status Page: https://status.rc.fas.harvard.edu/

Jan 2023

Update
January 24, 2023 at 7:41 PM
Update
January 24, 2023 at 7:41 PM
The Ceph/VM cluster issues have subsided and performance should be back to normal.
Resolved
January 19, 2023 at 4:55 PM
Resolved
January 19, 2023 at 4:55 PM
The virtual infrastructure is responding normally again.
This is a known issue that we are working to eliminate, but which requires new infrastructure (and time and resources). Thanks for your patience.
Identified
January 19, 2023 at 3:00 PM
Identified
January 19, 2023 at 3:00 PM
An issue with our virtual infrastructure is affecting virtual machines and services which rely on that including Ceph storage, VDI, the FASRC ticket system, and hosted virtual machines.

Updates will be posted here.

Monthly Maintenance Jan. 9th, 2023 7am-11am

Completed
January 10, 2023 at 4:00 AM
Completed
January 10, 2023 at 4:00 AM
Maintenance has completed successfully
In progress
January 10, 2023 at 12:00 AM
In progress
January 10, 2023 at 12:00 AM
Maintenance is now in progress
Planned
January 10, 2023 at 12:00 AM
Planned
January 10, 2023 at 12:00 AM
NOTICES

GPU PARTITIONS
The gputest partition is back in service. Job limits are now 64 cores, 8 GPU's, and 750G of RAM. Users can run up to 2 jobs.

GLOBUS PERSONAL CLIENT - 3.1 Client Deprecated
If you are using the Globus Connect Personal client on your machine, please ensure you have updated and are running version 3.2 or greater. Version 3.1 and below are deprecated and will not work as of December 17th, 2022. https://docs.globus.org/ca-update-2022/#globusconnect_personal

HOLIDAY NOTICE
January 16th is a university holiday (MLK Day)

GENERAL MAINTENANCE

* Slurm upgrade
Audience: Cluster users
Impact: Jobs will be paused during upgrade

* OnDemand Version upgrade to 2.0.29
Audience: VDI/OpenOnDemand users
Impact: VDI will be unavailable during this and the above Slurm upgrade

* Domain controller updates
Audience: All cluster
Impact: Could briefly impact some older systems, otherwise no impact expected

* Login node and VDI node reboots and firmware updates
Audience: Anyone logged into a a login node or VDI/OOD node
Impact: Login and VDI/OOD nodes will be unavailable while updating and rebooting

* Scratch cleanup ( https://docs.rc.fas.harvard.edu/kb/policy-scratch/ )
Audience: Cluster users
Impact: Files older than 90 days will be removed.

Reminder: Scratch 90-day file retention purging runs occur regularly not just during maintenance periods.

Thanks!
FAS Research Computing
Department and Service Catalog: https://www.rc.fas.harvard.edu/
Documentation: https://docs.rc.fas.harvard.edu/
Status Page: https://status.rc.fas.harvard.edu/

Jan 2023 to Mar 2023

FAS Research Computing - Notice history

Notice history

Mar 2023

Feb 2023

Jan 2023