Make WordPress Core

Opened 13 years ago

Closed 13 years ago

#17560 closed defect (bug) (fixed)

Severe bug in XMLRPC and Windows Live Writer integration related to to_ping

Reported by: archon810's profile archon810 Owned by: ryan's profile ryan
Milestone: 3.4 Priority: normal
Severity: major Version: 3.1.2
Component: XML-RPC Keywords: has-patch needs-testing 2nd-opinion early
Focuses: Cc:

Description

I found a pretty serious performance bug with XMLRPC and Windows Live Writer, which is a really popular post software made by Microsoft in use by many large blogs.

In my continuous fight to improve performance, or rather figure out what the heck is badly slowing down my server, I noticed that I get a lot of stuck doing_cron apache processes after posting and the load shoots up.

I ran strace on some apache processes and found that a lot of them are stuck doing do_pings(), rolling through tons of posts using something like:

SELECT to_ping FROM wp_posts WHERE ID = 27823
SELECT pinged FROM wp_posts WHERE ID = 27823

To figure out why a bunch of my wp cron processes are rolling through random posts trying to ping them all the time, I ran a db query and found that 800+ out of 2900 posts have to_ping field set to

(comma
separated)

Exactly as above, 2 lines.

So, what I think happens is a bunch of wp cron processes roll through 800 posts all the time and try to ping non-existent servers, which fails. Then the field is not cleared and this happens again and again.

At first, I wasn't able to figure out where this value is coming from, but then it hit me - I was able to trace it back to Windows Live Writer's interface. Take a look: http://farm6.static.flickr.com/5029/5760026936_50895cd36e_b.jpg.

The value here is grayed out, meaning it disappears as soon as you click into it. However, it seems like WLW sends it along anyway, even if the user doesn't fill it in. Then it gets put in the database exactly as above and here we are.

I realize it's not a Wordpress bug per se, but I believe it affects a lot of Wordpress installations that use Windows Live Writer and it's already too late even if MS fixes it. Getting MS to fix it is a whole different business.

So, here are my questions:

  • is do_all_pings() really run every time cron runs?
  • is it possible that multiple doing_cron processes end up running at the same time due to concurrency issues on very busy servers like mine? I see a ton of them during high load, all running simultaneously. They should really be using a semaphore of some sort.
  • can this bug be worked around in Wordpress?

Thank you.

Attachments (5)

17560.first.diff (1.4 KB) - added by xknown 13 years ago.
Introduce sanitize_trackback_urls function. I only consider http and https as the protocols that allow trackbacks.
17560.second.diff (1.8 KB) - added by xknown 13 years ago.
Second try. Clean also the urls returned by get_to_ping. Not tested.
17560.3.diff (1.8 KB) - added by SergeyBiryukov 13 years ago.
17560.4.diff (1.8 KB) - added by SergeyBiryukov 13 years ago.
17560.5.diff (1.8 KB) - added by SergeyBiryukov 13 years ago.

Download all attachments as: .zip

Change History (24)

#1 @archon810
13 years ago

  • Cc admin@… added
  • Summary changed from Severe bug in XMLRPC and Windows Live Writer integration to Severe bug in XMLRPC and Windows Live Writer integration related to to_ping

#2 @archon810
13 years ago

I narrowed this down to updates done via Windows Live Writer. Upon publishing, the to_ping value is actually blank, but updates are resulting in to_ping changing to

(comma
separated)

#3 @xknown
13 years ago

I think WP should clean all the trackback urls that do not start with a protocol when saving the posts.
If nobody proposes a patch, I'll try to make one by tomorrow.

#4 @scribu
13 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release

What xknown said.

#5 @archon810
13 years ago

Sounds like a good solution, which will be easy to verify using the steps above (post, then update using WLW).

@xknown
13 years ago

Introduce sanitize_trackback_urls function. I only consider http and https as the protocols that allow trackbacks.

#7 @archon810
13 years ago

Bug #17210 seems to be caused directly by this bug. I've left a comment describing my findings there.

#8 @archon810
13 years ago

17560.first.diff works, except it doesn't alleviate the problem of re-selecting all the same rows every time. Bad data should be cleaned in the posts table after the ping, shouldn't it?

@xknown
13 years ago

Second try. Clean also the urls returned by get_to_ping. Not tested.

#9 @xknown
13 years ago

  • Keywords has-patch needs-testing 2nd-opinion added; needs-patch removed

#10 @archon810
13 years ago

I've confirmed that fixing this bug and #17210 that it causes solved my massive load after each post is published problem that's been getting worse and worse over the past months and made me lose a lot of sleep. Wordpress makes sense again!

#11 @SergeyBiryukov
13 years ago

  • Milestone changed from Future Release to 3.3

#12 @nacin
13 years ago

I like this. It probably has hardening benefits. Perhaps esc_url_raw() could be used in some way?

We should probably remove the trim(), though, and do that in the function.

#13 @archon810
13 years ago

Will it make it into 3.3? Combined with the wp-cron locking fixes, which are still very much problematic in 3.2, this would make 3.3 the most exciting release I can remember.

#14 @SergeyBiryukov
13 years ago

Refreshed the patch.

#15 @SergeyBiryukov
13 years ago

Need to be careful with \s which can break some UTF-8 characters.

17560.4.diff uses \r\n\t and space.

#16 @SergeyBiryukov
13 years ago

17560.5.diff adds esc_url_raw().

#17 @ryan
13 years ago

  • Keywords early added
  • Milestone changed from 3.3 to Future Release

This looks good, but let's wait until 3.4 early to get more soak time.

#18 @ryan
13 years ago

  • Milestone changed from Future Release to 3.4

#19 @ryan
13 years ago

  • Owner set to ryan
  • Resolution set to fixed
  • Status changed from new to closed

In [19675]:

Introduce sanitize_trackback_urls(). Don't ping bad urls. Don't ping bad urls or save them to the DB. Props xknown, SergeyBiryukov. fixes #17560

Note: See TracTickets for help on using tickets.