Notice history

Operational

Aug 2022

Resolved
August 31, 2022 at 06:53 UTC
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
August 31, 2022 at 06:40 UTC
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
August 31, 2022 at 06:17 UTC
Websites & Tools - Portal (portal.rc) is now operational! This update was created by an automated monitoring service.
Investigating
August 31, 2022 at 05:49 UTC
Websites & Tools - Portal (portal.rc) cannot be accessed at the moment. This incident was created by an automated monitoring service.

NESE tape (Tier 3) upgrades

Completed
August 26, 2022 at 20:43 UTC
Maintenance has completed successfully.
In progress
August 22, 2022 at 10:00 UTC
Maintenance is now in progress
Not yet started
August 22, 2022 at 10:00 UTC
Our tier 3 tape system is part of and run by NESE (the NorthEast Storage Exchange).

We have been informed that they will be performing a significant upgrade of their Spectrum Scale archive system starting August 15th. This is a multi-day upgrade that is expected to take approximately 3 days, possibly longer. Tier 3 tape allocations will be unavailable during this upgrade.

NESE has informed us that this maintenance will be deferred to next week (8/22/22).

Resolved
August 10, 2022 at 02:59 UTC
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
August 10, 2022 at 02:46 UTC
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
August 04, 2022 at 14:05 UTC
The offending jobs have been cancelled. holyscratch01 should now be usable again.
Investigating
August 03, 2022 at 15:00 UTC
holyscratch01 is currently in a degraded state. A group of improperly architected jobs is hammering the filesystem, which is impeding access for other users. We are in the process of identifying and stopping these jobs.

Until then, we recommend starting an interactive session and working from there, as that will have the lowest impact. However, performance will remain slow until we are able to stop the problematic jobs.

Jul 2022

Resolved
July 30, 2022 at 18:15 UTC
Websites & Tools - SPINAL (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
July 30, 2022 at 17:48 UTC
Websites & Tools - SPINAL (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
July 28, 2022 at 01:45 UTC
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
July 28, 2022 at 01:32 UTC
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

NESE tape (Tier 3) hardware maintenance/install

Completed
July 21, 2022 at 22:00 UTC
Maintenance has completed successfully
In progress
July 21, 2022 at 10:00 UTC
Maintenance is now in progress
Not yet started
July 21, 2022 at 10:00 UTC
Our tier 3 tape system is part of and run by NESE (the NorthEast Storage Exchange).

We have been informed that they will be installing additional tape drives in the system on July 21st, which will require a full day of downtime. Tier 3 tape archives will not be available to our users on that date.

Resolved
July 20, 2022 at 20:11 UTC
Websites & Tools - MiniLIMs (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
July 20, 2022 at 19:58 UTC
Websites & Tools - MiniLIMs (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
August 17, 2022 at 14:14 UTC
The unrepairable volume on holylfs02 is isolated to two labs and they have been informed of next steps. This issue does not affect other areas on holylfs02, so we are closing this incident.
Update
July 28, 2022 at 21:35 UTC
Recently we noticed an uptick in bad blocks on a RAID 6 disk volume that is part of the overall filesystem. Generally speaking, the operating system will vector these out so no data is written there. During this period, we had to replace a number of drives due to failures; they are part of a RAID 6 multi-disk set with dual parity, which normally rebuilds with minimal impact on performance.

What we believe happened is that, during the rebuild process, bad data was copied to the replacement disks and the filesystem got corrupted. One of the staff ran a read-only, non-destructive repair on the volume in question and noted quite a few errors.

So far, we have:
1) contacted the vendor (who gave us a command to clear additional bad blocks)
2) NOT run the actual repair command to "fix" the filesystem (which would delete data)
3) contacted the vendor to see if they had any partners that might be able to assist

We are planning to meet with one of those partners early next week, but are not confident they will have a solution. At this point (and depending on how that meeting goes), the next step would be to run the repair, note the extent of loss, and attempt to get the volume remounted.

Updates to follow when we have more information.

Note that these shelves are covered under warranty support for another year. We see no need at present to replace the 3PB allocation with new hardware (this is planned to take place in Q3-Q4 FY23).
Identified
July 14, 2022 at 15:00 UTC
Users may experience issues connecting to holylfs02, or slow Samba performance.

This is due to a hardware issue with holylfs02, and we have opened a ticket with the vendor. Updates to follow.

No ETA.

Jun 2022

Globus.org maintenance Saturday June 18, 9am-10am

Completed
June 18, 2022 at 14:00 UTC
Maintenance has completed successfully
Update
June 18, 2022 at 13:00 UTC
Maintenance is now in progress
In progress
June 18, 2022 at 13:00 UTC
Maintenance is now in progress
Not yet started
June 18, 2022 at 13:00 UTC
Globus has informed us that they will be performing maintenance on Saturday, June 18th, from 9am to 10am.

```
During the downtime, the following will be impacted:
- All services in the Globus ecosystem that use Auth, including Flows, Search, and Timer, will be unavailable.
- Users will be unable to initiate new transfers. Transfers that are inflight when the downtime begins will resume at the last checkpoint when the services are restored at the end of the maintenance period.
- Users will not be able to log into applications that rely on Globus Auth, including Globus webapp and command line interface.
- All third party applications and services that rely on Globus Auth will be impacted, and users will not be able to login/authenticate.
- Users will be unable to initiate new flow runs. Flow runs that attempt to start a new step during the downtime will fail. We recommend that users check the state of their flow runs after services have resumed.
```

Resolved
June 30, 2022 at 14:45 UTC
This issue is currently not fixable on the login nodes. CVMFS can still be used on compute nodes, but not on login nodes. We are in contact with OSG, and if a workaround becomes available later we will implement it. As a result, this incident is now closed.
Investigating
June 17, 2022 at 15:01 UTC
CVMFS is available from compute nodes, but not from login nodes. This will only impact users who need this filesystem (OpenScienceGrid).

We will investigate this incident further next week.

Resolved
June 08, 2022 at 01:31 UTC
Websites & Tools - SPINAL (FAS Informatics) is now operational! This update was created by an automated monitoring service.
Investigating
June 08, 2022 at 01:19 UTC
Websites & Tools - SPINAL (FAS Informatics) cannot be accessed at the moment. This incident was created by an automated monitoring service.

Resolved
June 07, 2022 at 18:10 UTC
A problem with the outgoing mail system the Portal uses to send approval and account emails has been found and fixed.

You may be receiving late notices now. If you receive approval emails and find nothing to approve when following the link, this is why. Apologies for the inconvenience. Additional monitoring is being put into place.

Resolved
June 01, 2022 at 14:51 UTC
This issue with the ECS has been resolved.
Identified
June 01, 2022 at 14:45 UTC
We are looking at a certificate issue with bosecs and holyecs. Replacement certs should be available to us shortly.

Jun 2022 to Aug 2022

FAS Research Computing - Notice history
