Event tracking unavailable

Major incident Web Personalization Tracker
Nov 12, 2024 07:44 -03 · 3 hours, 4 minutes

Updates

Resolved

We’ve identified the root cause of the problem: an internal signing certificate was not persisted after it was renewed. When the previous certificate expired, the application could no longer sign tracked events, and the system became unresponsive.
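One way to guard against this class of failure is to write a renewed certificate atomically and read it back before treating the renewal as complete. The Python sketch below is illustrative only: the function name `persist_renewed_cert`, the file-based storage, and the raw PEM payload are assumptions, not a description of the affected system.

```python
import os
import tempfile


def persist_renewed_cert(path: str, pem_bytes: bytes) -> None:
    """Atomically write a renewed certificate and verify it was persisted.

    Hypothetical helper: file-based storage is assumed for illustration.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so os.replace is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(pem_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

    # Read back and compare: fail loudly instead of silently keeping the old cert.
    with open(path, "rb") as stored:
        if stored.read() != pem_bytes:
            raise RuntimeError(f"renewed certificate was not persisted to {path}")
```

Raising on a failed read-back turns a silent persistence problem into an error at renewal time, rather than at the moment the old certificate expires.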

Timeline

  • At 05:00 UTC, the first certificate expired.
    • A misconfigured alert rule failed to alert us at this point.
    • Event tracking was not immediately affected.
  • At 10:44 UTC, the last issued certificate expired.
    • At this point, event tracking became unresponsive.
  • At 11:30 UTC, we identified the problem and notified affected services and customers.
  • At 13:12 UTC, we identified the root cause of the service unavailability: renewed certificates were not being persisted.
  • At 13:26 UTC, we rolled out the fixed configuration for all instances.
    • At this point, event tracking was back online.
    • Pending events from the incident period began to be resent to our tracking system for all end users who were still online.
  • At 13:37 UTC, tracking of pending events for online end-users was completed.
  • At 13:48 UTC, all instances were back online with renewed certificates.
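The replay of pending events described in the timeline can be pictured as a client-side buffer that holds events until the tracker accepts them. The Python sketch below is a minimal illustration, assuming an in-memory queue and a hypothetical `send` callable that returns `True` on success; the actual client is not described in this report and may work differently.

```python
from collections import deque
from typing import Callable, Deque, Dict


class PendingEventBuffer:
    """Holds events in memory until the tracker accepts them (hypothetical)."""

    def __init__(self, send: Callable[[Dict], bool]) -> None:
        # `send` is assumed to return True when the tracker accepts the event.
        self._send = send
        self._pending: Deque[Dict] = deque()

    def track(self, event: Dict) -> None:
        """Queue an event and try to deliver everything that is pending."""
        self._pending.append(event)
        self.flush()

    def flush(self) -> int:
        """Send pending events in order, stopping at the first failure.

        Returns the number of events delivered.
        """
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # tracker still unavailable: keep the event for a later retry
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Under this assumption, a buffer that lives only in memory is discarded when the session ends before a successful flush, which would be consistent with the data loss described under Notes.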

Notes

  • The alerting rule that should have informed us of the problem earlier was fixed.
  • Event data from users who ended their sessions while the service was unresponsive (10:44 to 13:26 UTC) was lost. The proportion of data lost during the incident varies with user behavior in each customer application.
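An expiry alert of the kind mentioned in the first note can be reduced to a comparison between each certificate's expiry time and a warning window. The Python sketch below is an illustration under stated assumptions: the function name `certs_needing_alert`, the 24-hour default window, the certificate labels, and the evaluation time are hypothetical and do not reflect the actual alerting rule. Only the two expiry times come from the timeline.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def certs_needing_alert(
    expiries: Dict[str, datetime],
    now: datetime,
    warn_window: timedelta = timedelta(hours=24),  # hypothetical threshold
) -> List[str]:
    """Return names of certificates that are expired or expire within warn_window."""
    return sorted(
        name for name, not_after in expiries.items() if not_after - now <= warn_window
    )


# Expiry times from the timeline; the labels and `now` are made up for the example.
expiries = {
    "first": datetime(2024, 11, 12, 5, 0, tzinfo=timezone.utc),
    "last": datetime(2024, 11, 12, 10, 44, tzinfo=timezone.utc),
}
now = datetime(2024, 11, 11, 8, 0, tzinfo=timezone.utc)
print(certs_needing_alert(expiries, now))  # prints ['first']
```

Evaluated 21 hours before the first expiry, the check flags only the first certificate; the last one falls outside the 24-hour window until later that morning.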
November 12, 2024 · 11:05 -03
Monitoring

We’ve identified the root cause of the problem, and all systems are back to normal operation.

A timeline of the incident will be sent shortly.

November 12, 2024 · 10:46 -03
Issue

We are experiencing an issue with our event tracker, which is temporarily unavailable. Our team is investigating the cause of this disruption and working towards a resolution.

Personalization services remain responsive utilizing all information tracked up to the start of this incident.

We are committed to keeping you informed throughout this process and will provide updates as they become available. Thank you for your understanding and patience.

November 12, 2024 · 09:06 -03
