Make WordPress Core

Opened 7 years ago

Last modified 3 years ago

#41281 accepted enhancement

attachment_url_to_postid results in very slow query

Reported by: takahashi_fumiki's profile Takahashi_Fumiki Owned by: joemcgill's profile joemcgill
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.9
Component: Database Keywords:
Focuses: performance Cc:

Description

attachment_url_to_postid throws query like this:

# wp-includes/media.php ll.3922
$sql = $wpdb->prepare(
	"SELECT post_id FROM $wpdb->postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s",
	$path
);

But wp_postmeta table has index only of meta_key, this may cause file sort.

Why this is problem

Let's assume that you have 20,000 posts and each post has 10 attachments in your WP news site.
MySQL filters rows with meta_key but still remain 200,000 possible rows.
This causes file sort.

attachment_url_to_postid is used not only in admin screen, but also in public area(e.g. AMP Plugin https://wordpress.org/plugins/amp/ ).

Now twitter changes URL for mobile device if AMP version is available.
So, MySQL CPU raises up to 100% by attachment_url_to_postid if site traffic is high,

Solution

To avoid slow query, I suggest adding another index.

ALTER TABLE $wpdb->postmeta ADD INDEX meta_key_meta_value (meta_key(191), meta_value (64));

Attachments (1)

41281.diff (920 bytes) - added by Takahashi_Fumiki 7 years ago.
Add upgrade query but I don't know numbering policy for $wp_db_version

Download all attachments as: .zip

Change History (12)

@Takahashi_Fumiki
7 years ago

Add upgrade query but I don't know numbering policy for $wp_db_version

#1 @markjaquith
6 years ago

@Takahashi_Fumiki — thanks for this ticket!

I ran into this issue today. Client has a lot of images hotlinked to their CDN that will not resolve to a post ID. A lot of AMP or FaceBook Instant Articles plugins run attachment_url_to_postid() on images found in posts, to optimize them for those formats. When your postmeta table is large, this query is slow.

There's no hook that lets you bail before that query gets run, so I nuked it at a really low level (BE CAREFUL WITH THE query HOOK).

https://gist.github.com/markjaquith/2c8c29a0ccd2916520e5b71fcec626ca

We could solve this in WordPress core a few ways.

  1. Make the query faster, with an index.
  2. Cache the results of these lookups.
  3. Whitelist domains that we do the lookups for.

Or some combination of those.

#2 @Takahashi_Fumiki
6 years ago

@markjaquith thanks for a comment and tip.

In my case, I've tried to solve it with page cache at first.
But "spike" of traffic overwhelmed it, because it was a historical big news.
MySQL upset before creating cache.

We finally solved it by adding index to meta_value as raw query. Forward match may work on other query.
Of course, adding index isn't fundamental resolution in case I have 300,000,000 posts or domain is longer than 64 letters.

I guess this is very rare case and I don't know we should apply this patch to WP core.
Besides that, it's very tough to add index to wp_postmeta. I forgot which ticket, but I guess you should know which ticket I mention:)

Anyway, I believe this ticket is useful knowledge for a few people like "how to piss on top of Mt. Everest".

#3 @w33zy
6 years ago

Just came across this as I noticed that my plugin's usage of attachment_url_to_postid can get a bit slow when running a high number of queries for image attachments.

#4 @birgire
6 years ago

#44580 was marked as a duplicate.

#5 @samvthom16
6 years ago

I have been facing the same issue as I have close to 40k posts and equivalent number of items in the media library. Lot of plugins like Yoast Seo use the function attachment_url_to_postid() quite often. This causes a slow query because of the huge database.

I agree with @markjaquith on caching the results as these URLs don't change that often. The function can use transient variables instead of querying the posts table.

#6 @sjthompson
5 years ago

Also seeing the same issue here through Yoast SEO's use of the function. Forgive my ignorance here, but is there any reason the query in attachment_url_to_postid can't be modified with a "LIMIT 1"?

Patching this on our local environment takes a 0.6s query down to 0.11 - not necessarily solving the whole problem, but helping at least.

#7 @joemcgill
5 years ago

  • Owner set to joemcgill
  • Status changed from new to accepted

Seems to me that adding at least minimal caching of these lookups would be valuable and potentially adding a filter before the query is run to allow for more easily handling specialized cases, like bailing early if you know that the URL won't resolve to a post ID.

Before adding a simple cache wrapper around the query results, I'd like to better understand the use cases where the current situation is a problem. For example, where would a simple cache keyed to the input $url value be inadequate?

#8 @archon810
5 years ago

Ran into this issue as well on a large site where posts with lots of images after we added more calls to attachment_url_to_postid() started taking 60s+ (issue caught on staging, thankfully), and the simplest and fastest solution which I already put together years ago on another site and forgot all about was indeed to add a combined index on meta_key, meta_value.

I went as far as using 191 - the max allowed, for both key and value, and all the perf issues were resolved instantly. It should be good enough for most sites unless you have a lot of urls that share the same exact first 191 characters - highly unlikely.

This should be the solution in the core. If the core team does decide to go with this solution (which I think it should), there's no point creating another index - the stock meta_key index should just be converted into the composite index instead.

#9 @Takahashi_Fumiki
5 years ago

I agree with @archon810 because cache itself is not a fundamental solution.
Without index on meta_value, it may take over 60 seconds just for one query with a large volume of posts.
MySQL will hang up with requests in rapid succession, so there's no chance to hold cache and use it.

Since more than 10 years have been passed from WordPress' first release, WP sites with 100,000 posts are common.

#10 @benjmay
3 years ago

I don't think caching will work either. Firstly, it'd bring the infrastructure to its knees just to build that cache. Which I think would time out before hand. The queries quickly backed up and were processing at 20seconds (with 4 x db.r4.xlarge readers).

Secondly, if that cache was cleared for whatever reason, back to problem 1 again and bring a production platform down unexpectedly.

#11 @homeworker
3 years ago

I have the same problem, I have a website with 2.000.000 rows in wp_postmeta. If I select all rows it take 0.00002s, but with the query generated by attachment_url_to_postid from elementor it take 0.4s (for every image in page) slowing the home generation up to 20s

SELECT post_id, meta_value
FROM wp_postmeta
WHERE meta_key = '_wp_attached_file'
AND meta_value = '2020/03/almo-nature.svg'

The problem is that this table is indexed for post_id, meta_vale and meta_key but NOT for meta_value that is used from this function.

I solved it adding an index(100) to the meta_value and changed from longtext to text, I don't think that this column need 4,294,967,295 characters. Isn't text 65,535 characters enough? Isn't varchar faster?

Thanks

Last edited 3 years ago by homeworker (previous) (diff)
Note: See TracTickets for help on using tickets.