Make WordPress Core

Opened 5 months ago

Last modified 6 weeks ago

#61701 new defect (bug)

Rate limiting can occur when sending Slack notifications out through GitHub Actions

Reported by: desrosj
Owned by:
Milestone: 6.8
Priority: normal
Severity: normal
Version:
Component: Build/Test Tools
Keywords: needs-patch
Focuses:
Cc:

Description

In some scenarios, Slack notifications sent through GitHub Actions are being rate limited. This mainly happens when the scheduled testing of old branches that can still receive security updates kicks off, but it could happen any time a large enough volume of events sends notifications.

This seems to happen because of how GHA jobs are queued and picked up. Since the Slack notification step is a separate job, many are queued while waiting for more time-intensive jobs, such as PHPUnit or E2E testing, to complete. While Slack does not publish exact rate-limiting parameters, the documented limit for webhooks is 1 per second, with short bursts above that allowed.

There's currently no built-in way for the Slack action to handle this, though retrying failed calls is being discussed as a possible feature for V2 of the action.

It would be great to find a way to retry or space out these attempts until then to prevent workflow failures from occurring solely because the notification is unsuccessful.
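As a rough illustration of the "retry or space out" idea, here is a minimal Python sketch of retrying a rate-limited send with exponential backoff. This is not how the Slack action works. The `send_with_retry` name, the `send` callable, and the default values are all hypothetical. It assumes a rate-limited request is answered with HTTP 429 and, optionally, a `Retry-After` value in seconds, which the caller parses and passes back.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical shape for one send attempt: returns the HTTP status code and
# the Retry-After value in seconds (None when the header is absent).
SendFn = Callable[[], Tuple[int, Optional[float]]]


def send_with_retry(
    send: SendFn,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it succeeds or max_attempts is used up.

    Returns True on a 2xx response, False otherwise.
    """
    for attempt in range(max_attempts):
        status, retry_after = send()
        if 200 <= status < 300:
            return True
        if status != 429:
            # Not a rate limit, so waiting and retrying is unlikely to help.
            return False
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep after the final try
        # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
        if retry_after is not None:
            delay = retry_after
        else:
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    return False
```

The `sleep` parameter is only there so the waiting can be swapped out when testing. In a workflow, something like this would need to wrap the actual request to the webhook, and a notification that still fails after the last attempt could be logged rather than failing the run.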

Change History (3)

#1 @desrosj
2 months ago

In 59209:

Build/Test Tools: Temporarily ignore Slack failures.

When many workflows are initiated at the same time, there are often instances where the requests to Slack providing updates are rate limited. This usually happens when the Test Old Branches workflow runs and initiates testing for all workflows in branches that could potentially receive a security update.

Even though everything was successful in the workflow except the message, the workflow run is marked as failed. The next time the same workflow runs for that branch, a “fixed” message will be sent to #core in Slack. The result is a burst of messages that is quite noisy and unnecessary.

This temporarily adds continue-on-error to the jobs responsible for sending the messages until a better solution can be decided on.

See #61701.

#2 @desrosj
2 months ago

  • Keywords needs-patch added
  • Milestone changed from 6.7 to Future Release

While [59209] addresses the stampede of failures/messages that can occur, the heart of the issue is not addressed. Going to punt this out to follow up on.

#3 @desrosj
6 weeks ago

  • Milestone changed from Future Release to 6.8

Looks like version 2.0 of the action will include retrying failed requests. It's currently in RC state, though, and does have breaking changes for all methods of posting Slack messages.

Moving to 6.8 to follow progress.
