» Published on
The incident was triggered by a bug in a caching mechanism within a component upstream of our platform. This component, which is critical for routing HTTP requests to your applications, was not being properly supplied with the metadata needed to direct traffic to the appropriate containers. As a result, traffic may have been routed to containers that were no longer operational, either because of a redeployment or a change in the topology of your applications, such as a scale-down operation.
Although the issue was preexisting, it manifested as a side effect of an update to our infrastructure servers.
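To illustrate the failure mode described above, here is a minimal sketch of a routing cache that is not refreshed when application topology changes. All names (`Router`, the container identifiers, the discovery table) are hypothetical and chosen for illustration; they do not reflect our actual implementation.

```python
class Router:
    """Routes requests to an app's containers, caching the container list."""

    def __init__(self, discovery):
        self.discovery = discovery  # source of truth: app -> live containers
        self.cache = {}             # stale copy: app -> cached container list

    def route(self, app):
        # Bug: the cache is consulted without any freshness check, so
        # entries survive redeployments and scale-down operations.
        if app not in self.cache:
            self.cache[app] = list(self.discovery[app])
        target = self.cache[app][0]
        # If the cached container is no longer running, the upstream
        # connection fails and the client receives a 502.
        if target not in self.discovery[app]:
            return "502 Bad Gateway"
        return target


discovery = {"my-app": ["container-1", "container-2"]}
router = Router(discovery)

# First request: cache is populated from live data, routing succeeds.
print(router.route("my-app"))  # container-1

# A redeployment replaces the containers, but the stale cache still
# points at the old ones, so requests now hit a dead container.
discovery["my-app"] = ["container-3"]
print(router.route("my-app"))  # 502 Bad Gateway
```

In this sketch the fix would be to invalidate or refresh the cache whenever the discovery data changes, which corresponds to properly supplying the routing component with up-to-date metadata.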
Our development teams have pinpointed the exact origin of the problem and a fix is currently being developed.
We have also initiated enhancements to the configuration of our monitoring probes to increase our ability to detect and respond more promptly to such incidents.
A comprehensive retrospective of this incident is also scheduled to explore additional potential improvements.
» Updated
The situation has been stable for 30 minutes. We are still investigating, but the incident is considered closed.
If you are still experiencing issues, please contact our support.
» Updated
The 502 errors should have disappeared following our operators' intervention.
Investigation is still ongoing to identify the root cause.
» Updated
We detected an unusual rate of 502 errors when reaching apps hosted on the osc-fr1 region.
Our operators are currently investigating the issue.
We'll keep you updated.