Automation Workflow Trigger Delays

Incident Report for Lever

Postmortem

We want to share an update on a recent automation issue, including what happened, how it was resolved, and the steps we’ve taken to prevent it from happening again.

Customer Impact

Between January 21 and January 23, 2026, some customers experienced delays in automated workflow execution. Automation workflows entered a “waiting” state and were processed behind schedule rather than in real time.

While no data was lost, workflow execution was slower than expected, which may have delayed downstream actions that depend on automation completion.

Service levels were fully restored by January 23, 2026 at 08:54, and workflows have been operating in real time since that point.

Root Cause

A single instance had accumulated a very large number of automation workflows configured to trigger on the same event type. Each time an application event occurred, all of these workflows were evaluated simultaneously.

This configuration resulted in tens of thousands of unnecessary database queries per event, which significantly increased system load. As volume grew, the automation system could no longer process workflows as fast as they were being generated, causing a backlog and delays for all automation tasks.

In short:

  • The automation system encountered an unusually high and inefficient workload pattern.
  • Existing safeguards and scaling assumptions were insufficient for this edge case.
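
The backlog dynamic described above can be illustrated with a small queue model. This is a hypothetical sketch, not Lever's implementation: the function name and every number below are assumptions chosen only to show how a queue grows once work arrives faster than it can be processed.

```python
def backlog_over_time(arrivals_per_min, capacity_per_min, minutes):
    """Return the queue depth at the end of each minute for a single-queue model."""
    depth = 0
    history = []
    for _ in range(minutes):
        # New work arrives, then the system drains as much as its capacity allows.
        depth = max(0, depth + arrivals_per_min - capacity_per_min)
        history.append(depth)
    return history


# Illustrative numbers only. Capacity above the arrival rate keeps the queue empty:
print(backlog_over_time(arrivals_per_min=100, capacity_per_min=120, minutes=5))
# -> [0, 0, 0, 0, 0]

# Capacity below the arrival rate makes the backlog grow every minute:
print(backlog_over_time(arrivals_per_min=100, capacity_per_min=80, minutes=5))
# -> [20, 40, 60, 80, 100]
```

In this simplified model, extra per-event work (such as unnecessary queries) lowers effective capacity, so reducing that work is what lets the queue drain again.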

Resolution

Once the issue was identified, we implemented multiple performance optimizations to reduce unnecessary processing and improve throughput, including:

  • Skipping workflows that did not apply to the triggering event before running expensive queries.
  • Reducing redundant queries where only a single result was required.
  • Improving the overall efficiency of automation workflow processing.

These changes significantly reduced system load, allowed processing queues to catch up, and restored real‑time workflow execution. Service levels returned to normal by January 23, 2026, and the system has remained stable since.
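
As a rough illustration of the first two optimizations listed above, the sketch below filters workflows with a cheap in-memory check before touching the database, and asks the database for at most one row when only existence matters. Everything here is an assumption made for the example: the table, the workflow fields, and the function names are hypothetical, and `sqlite3` from the Python standard library stands in for the real data store.

```python
import sqlite3


def setup_db():
    """Create a tiny in-memory table standing in for real application data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE candidates (id INTEGER PRIMARY KEY, stage TEXT)")
    conn.executemany(
        "INSERT INTO candidates (stage) VALUES (?)",
        [("applied",), ("onsite",), ("applied",)],
    )
    return conn


def has_candidate_in_stage(conn, stage):
    # Only existence matters, so request at most one row instead of all matches.
    row = conn.execute(
        "SELECT 1 FROM candidates WHERE stage = ? LIMIT 1", (stage,)
    ).fetchone()
    return row is not None


def workflows_to_run(conn, workflows, event):
    """Return the names of workflows that should run for this event."""
    selected = []
    for wf in workflows:
        # Cheap in-memory check first: skip workflows that cannot apply to the event.
        if wf["stage"] != event["stage"]:
            continue
        # The database query runs only for workflows that passed the cheap check.
        if has_candidate_in_stage(conn, wf["stage"]):
            selected.append(wf["name"])
    return selected


conn = setup_db()
workflows = [
    {"name": "notify-recruiter", "stage": "applied"},
    {"name": "schedule-debrief", "stage": "onsite"},
    {"name": "send-offer", "stage": "offer"},
]
event = {"type": "stage_change", "stage": "applied"}
print(workflows_to_run(conn, workflows, event))  # -> ['notify-recruiter']
```

With this ordering, the number of queries per event scales with the workflows that can actually apply rather than with every workflow configured on the instance.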

Preventative Actions

To prevent similar incidents, we have taken the following actions, some of which are ongoing:

  • Improved monitoring and alerting for automation workflow latency, enabling earlier detection before customer impact.
  • Hardened automation workflow performance, ensuring the system can safely handle large numbers of workflows without unnecessary processing.
  • Reviewed and updated testing strategies to better reflect real‑world production workloads and edge cases.
  • Continuing to optimize the automation system to further improve its scalability and resilience.
  • Evaluating capacity and scaling behavior to ensure consistent performance during peak and non‑peak hours.

These measures are already in place or actively underway and are designed to ensure automation workflows continue to operate reliably as usage grows.
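
One way to picture the latency alerting mentioned above is a check on how long the oldest waiting workflow has been queued. This is a minimal sketch under stated assumptions: the function names and the five-minute threshold are hypothetical and do not describe the actual monitoring configuration.

```python
from datetime import datetime, timedelta, timezone


def oldest_wait(enqueued_at, now):
    """Return how long the oldest waiting workflow has been queued (zero if none)."""
    if not enqueued_at:
        return timedelta(0)
    return now - min(enqueued_at)


def should_alert(enqueued_at, now, threshold=timedelta(minutes=5)):
    # Alert once the oldest waiting workflow exceeds the latency budget.
    return oldest_wait(enqueued_at, now) > threshold


# Fixed timestamps keep the example deterministic.
now = datetime(2026, 1, 22, 12, 0, tzinfo=timezone.utc)
waiting = [now - timedelta(minutes=2), now - timedelta(minutes=9)]

print(should_alert(waiting, now))  # -> True (oldest item has waited 9 minutes)
print(should_alert([], now))       # -> False (nothing is waiting)
```

Alerting on the oldest waiting item, rather than on an average, surfaces a growing backlog even when most workflows are still being processed quickly.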

We appreciate your trust and patience, and we’re committed to continuously improving the reliability and performance of the automation experience.

Posted Feb 02, 2026 - 11:54 PST

Resolved

This incident has been resolved. All systems related to Automation Workflows continue to operate as expected.
Posted Jan 23, 2026 - 17:39 PST

Update

We are continuing to observe good performance and stability across all processes. At this time, all systems are operating as expected. Our team will maintain active monitoring throughout the day to ensure continued reliability.
We will provide additional updates as needed.
Posted Jan 23, 2026 - 09:31 PST

Monitoring

Workflow trigger delays have been resolved and we are actively monitoring to ensure continued performance and stability.
Posted Jan 22, 2026 - 13:22 PST

Identified

This issue with automation workflow trigger delays has resurfaced. See the original report of this issue here: https://status.lever.co/incidents/h5ypwy1cllrl

Our Engineering team has identified the issue and is actively working on a fix.
Posted Jan 22, 2026 - 10:57 PST
This incident affected: Global Data Center - LeverTRM (Hire).