Make WordPress Core

Opened 8 years ago

Last modified 4 months ago

#37763 assigned defect (bug)

Target server overload due to invalid RSS feed URL in RSS widget

Reported by: bstovall's profile bstovall Owned by: stevenkword's profile stevenkword
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.6
Component: Feeds Keywords: needs-patch
Focuses: performance Cc:

Description

Was roped into helping a friend figure out why a Wordpress site he managed was constantly registering 100% CPU usage. Turns out that they used the RSS widget and had it pointed to an RSS feed for their site. At some point the RSS feed had been deactivated and began generating a 404, and the server began getting four or five calls per second to the feed URL, effectively DoSing their own server. Although the widget was displayed on every page, they only average a few hundred unique visitors a day, so the number of requests from the widget far exceeded the number of page views.

I took some time trying to figure out why it might be doing this, but decided to stop looking through the trash WordPress code when I got to the fetch_feed() function. I just don't care enough. But I do care if there is a bug that causes that many requests when the feed URL returns a 404. I don't know how often feed names are changed or removed, but this could cause a huge number of unwanted requests.

My initial guess is that the SimplePie class is like "oh hey a 404 how about I try again. oh hey a 404 how about I try again. oh hey a 404 how about I try again. oh hey a 404 how about I try again." until it tires itself out. But like I said, too much trash code to care since I'm not getting paid to fix it.

Summary:

  1. RSS Widget on page.
  2. Invalid RSS feed URL, pointed to same server as page, returned a 404.
  3. Generated a number of requests for the RSS feed that was substantially higher than the number of page views.
  4. Caused 100% CPU usage on server.

Change History (14)

#1 follow-up: @westonruter
8 years ago

@bstovall does the RSS widget appear on the 404 template? If so, then this could result in infinite recursive HTTP requests and cause the 100% CPU usage you're describing.

#2 @SergeyBiryukov
8 years ago

  • Component changed from Widgets to Feeds

#3 @swissspidy
8 years ago

  • Keywords reporter-feedback added

#4 @stevenkword
8 years ago

Relates to #28134 and may get stuck at #30210

Last edited 8 years ago by stevenkword (previous) (diff)

#5 in reply to: ↑ 1 @bstovall
8 years ago

Replying to westonruter:

@bstovall does the RSS widget appear on the 404 template? If so, then this could result in infinite recursive HTTP requests and cause the 100% CPU usage you're describing.

Sorry, totally forgot about this.

Although I cannot confirm for sure since the template was updated to remove the RSS feed (before I was brought in), the 404 page does appear to use the same template as the other pages, so there is no reason we could come up with as to why the 404 page would not have the RSS feed. In other words: most likely.

That's a great suggestion.

#6 follow-up: @westonruter
8 years ago

One way to fix this is to update WP_Widget_RSS::widget() to short-circuit if is_404().

#7 in reply to: ↑ 6 @stevenkword
8 years ago

Replying to westonruter:

One way to fix this is to update WP_Widget_RSS::widget() to short-circuit if is_404().

#30210 should solve the problem, if we write test for it.

#8 @stevenkword
8 years ago

  • Keywords needs-testing added; reporter-feedback removed
  • Owner set to stevenkword
  • Status changed from new to assigned

#30210 landed in 4.7 and this ticket can probably be closed pending testing.

#9 @stevenkword
8 years ago

On second thought, that would only prevent the problem if you hit a /feed/ or ?feed URL. If you actually hit any other page belonging to the same site as the widget, the problem would still occur. Short-circuiting is_404() wouldn't prevent the problem either if say the URL of example.com/hello-world was entered as the feed source. You will get the notice in the widget that notifies you the feed is invalid, but if it were currently valid and then not valid in the future, the problem would persist.

#10 follow-up: @boonebgorges
8 years ago

  • Keywords needs-patch added; needs-testing removed

I just ran into a similar problem on a client site. In this case, the URL entered into the RSS widget was the site URL itself. As suggested by @stevenkword, 404-related tricks wouldn't address this issue.

SimplePie has internal caching for successfully parsed RSS feeds. Perhaps in the WP RSS widget - or even lower in the stack, perhaps in fetch_feed() - we can add our own caching (say, 10 minutes) for "invalid" feeds (where "invalid" covers both 404s and documents that can't be parsed as RSS). While not perfect, this would minimize the potential for self-DoSing. What do others think?

#12 in reply to: ↑ 10 @stevenkword
8 years ago

Replying to boonebgorges:

SimplePie has internal caching for successfully parsed RSS feeds. Perhaps in the WP RSS widget - or even lower in the stack, perhaps in fetch_feed() - we can add our own caching (say, 10 minutes) for "invalid" feeds (where "invalid" covers both 404s and documents that can't be parsed as RSS). While not perfect, this would minimize the potential for self-DoSing. What do others think?

This seems like a good iterative approach. @peterchester has a good start on it over in #11585, but it needs a refresh.

#13 @jrchamp
7 years ago

Ran into the same problem as @boonebgorges so for now I'm just going to turn autodiscovery off until there's a real fix here. One idea would be to ask for a lock before a thread takes on the task of discovering and parsing the RSS feed so that the other threads can move on with their lives instead of duplicating work. The lock would expire of course, but at least only one of the threads would be asked to do the work for that period of time.

#14 @pbearne
4 months ago

#11585 got a refresh so lets get that in

Note: See TracTickets for help on using tickets.