Make WordPress Core

Opened 8 years ago

Last modified 7 months ago

#41281 accepted enhancement

attachment_url_to_postid results in very slow query

Reported by: takahashi_fumiki's profile Takahashi_Fumiki Owned by: joemcgill's profile joemcgill
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.9
Component: Database Keywords:
Focuses: performance Cc:

Description

attachment_url_to_postid throws query like this:

# wp-includes/media.php ll.3922
$sql = $wpdb->prepare(
	"SELECT post_id FROM $wpdb->postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s",
	$path
);

But wp_postmeta table has index only of meta_key, this may cause file sort.

Why this is problem

Let's assume that you have 20,000 posts and each post has 10 attachments in your WP news site.
MySQL filters rows with meta_key but still remain 200,000 possible rows.
This causes file sort.

attachment_url_to_postid is used not only in admin screen, but also in public area(e.g. AMP Plugin https://wordpress.org/plugins/amp/ ).

Now twitter changes URL for mobile device if AMP version is available.
So, MySQL CPU raises up to 100% by attachment_url_to_postid if site traffic is high,

Solution

To avoid slow query, I suggest adding another index.

ALTER TABLE $wpdb->postmeta ADD INDEX meta_key_meta_value (meta_key(191), meta_value (64));

Attachments (1)

41281.diff (920 bytes) - added by Takahashi_Fumiki 8 years ago.
Add upgrade query but I don't know numbering policy for $wp_db_version

Download all attachments as: .zip

Change History (15)

@Takahashi_Fumiki
8 years ago

Add upgrade query but I don't know numbering policy for $wp_db_version

#1 @markjaquith
7 years ago

@Takahashi_Fumiki — thanks for this ticket!

I ran into this issue today. Client has a lot of images hotlinked to their CDN that will not resolve to a post ID. A lot of AMP or FaceBook Instant Articles plugins run attachment_url_to_postid() on images found in posts, to optimize them for those formats. When your postmeta table is large, this query is slow.

There's no hook that lets you bail before that query gets run, so I nuked it at a really low level (BE CAREFUL WITH THE query HOOK).

https://gist.github.com/markjaquith/2c8c29a0ccd2916520e5b71fcec626ca

We could solve this in WordPress core a few ways.

  1. Make the query faster, with an index.
  2. Cache the results of these lookups.
  3. Whitelist domains that we do the lookups for.

Or some combination of those.

#2 @Takahashi_Fumiki
7 years ago

@markjaquith thanks for a comment and tip.

In my case, I've tried to solve it with page cache at first.
But "spike" of traffic overwhelmed it, because it was a historical big news.
MySQL upset before creating cache.

We finally solved it by adding index to meta_value as raw query. Forward match may work on other query.
Of course, adding index isn't fundamental resolution in case I have 300,000,000 posts or domain is longer than 64 letters.

I guess this is very rare case and I don't know we should apply this patch to WP core.
Besides that, it's very tough to add index to wp_postmeta. I forgot which ticket, but I guess you should know which ticket I mention:)

Anyway, I believe this ticket is useful knowledge for a few people like "how to piss on top of Mt. Everest".

#3 @w33zy
7 years ago

Just came across this as I noticed that my plugin's usage of attachment_url_to_postid can get a bit slow when running a high number of queries for image attachments.

#4 @birgire
7 years ago

#44580 was marked as a duplicate.

#5 @samvthom16
7 years ago

I have been facing the same issue as I have close to 40k posts and equivalent number of items in the media library. Lot of plugins like Yoast Seo use the function attachment_url_to_postid() quite often. This causes a slow query because of the huge database.

I agree with @markjaquith on caching the results as these URLs don't change that often. The function can use transient variables instead of querying the posts table.

#6 @sjthompson
6 years ago

Also seeing the same issue here through Yoast SEO's use of the function. Forgive my ignorance here, but is there any reason the query in attachment_url_to_postid can't be modified with a "LIMIT 1"?

Patching this on our local environment takes a 0.6s query down to 0.11 - not necessarily solving the whole problem, but helping at least.

#7 @joemcgill
6 years ago

  • Owner set to joemcgill
  • Status changed from new to accepted

Seems to me that adding at least minimal caching of these lookups would be valuable and potentially adding a filter before the query is run to allow for more easily handling specialized cases, like bailing early if you know that the URL won't resolve to a post ID.

Before adding a simple cache wrapper around the query results, I'd like to better understand the use cases where the current situation is a problem. For example, where would a simple cache keyed to the input $url value be inadequate?

#8 @archon810
6 years ago

Ran into this issue as well on a large site where posts with lots of images after we added more calls to attachment_url_to_postid() started taking 60s+ (issue caught on staging, thankfully), and the simplest and fastest solution which I already put together years ago on another site and forgot all about was indeed to add a combined index on meta_key, meta_value.

I went as far as using 191 - the max allowed, for both key and value, and all the perf issues were resolved instantly. It should be good enough for most sites unless you have a lot of urls that share the same exact first 191 characters - highly unlikely.

This should be the solution in the core. If the core team does decide to go with this solution (which I think it should), there's no point creating another index - the stock meta_key index should just be converted into the composite index instead.

#9 @Takahashi_Fumiki
5 years ago

I agree with @archon810 because cache itself is not a fundamental solution.
Without index on meta_value, it may take over 60 seconds just for one query with a large volume of posts.
MySQL will hang up with requests in rapid succession, so there's no chance to hold cache and use it.

Since more than 10 years have been passed from WordPress' first release, WP sites with 100,000 posts are common.

#10 @benjmay
4 years ago

I don't think caching will work either. Firstly, it'd bring the infrastructure to its knees just to build that cache. Which I think would time out before hand. The queries quickly backed up and were processing at 20seconds (with 4 x db.r4.xlarge readers).

Secondly, if that cache was cleared for whatever reason, back to problem 1 again and bring a production platform down unexpectedly.

#11 @homeworker
4 years ago

I have the same problem, I have a website with 2.000.000 rows in wp_postmeta. If I select all rows it take 0.00002s, but with the query generated by attachment_url_to_postid from elementor it take 0.4s (for every image in page) slowing the home generation up to 20s

SELECT post_id, meta_value
FROM wp_postmeta
WHERE meta_key = '_wp_attached_file'
AND meta_value = '2020/03/almo-nature.svg'

The problem is that this table is indexed for post_id, meta_vale and meta_key but NOT for meta_value that is used from this function.

I solved it adding an index(100) to the meta_value and changed from longtext to text, I don't think that this column need 4,294,967,295 characters. Isn't text 65,535 characters enough? Isn't varchar faster?

Thanks

Last edited 4 years ago by homeworker (previous) (diff)

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


7 months ago

#13 @adamsilverstein
7 months ago

#61651 was marked as a duplicate.

#14 @apermo
7 months ago

I ran into the exact same issue as stated in #61651 and #61383.

Quote from #61651

The function attachment_url_to_postid() in wp-includes/media.php will perform this query.
SELECT post_id, meta_value FROM wp_postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s;
While this query looks super harmless, it can be really really slow. On a site that I maintain, this query takes a total of 4s (four seconds). Edit: the longest was over 6s
As I already described in #61383 this issue will happen on large scale websites with several 100k attachments.
And even though this is an edge case, I would like to spark an open discussion about it.
In my case, I've verified an index on meta_value, which improved the query from 4s to approx. 5-10ms, I lacked the confidence to do so, so I introduced a custom table for as lookup table, which improved the query down to 2ms. I would have been absolutely happy with the 5-10ms though.
I still need to check wether the extensive calls of attachment_url_to_postid() where caused by WordPress core, or a third Party plugin. I will add this information as soon as possible.
While it may not slow down smaller sites as much as it slowed down my clients site, it will on a larger scale, still burn a noticeable amount of CPU time, that will ultimately consume power and thus produce CO2.
Following a topic on GitHub regarding performance. Otto42 replied to the suggestion to add indexes to wp_postmeta with

Indexing the meta_value is a bad idea, because you should not be searching for anything based on the meta_value in the first place.
---
Otto42 commented on Apr 17, 2022
https://github.com/WordPress/performance/issues/132#issuecomment-1100829520

At WordCamp Europe 2024 in Torino, @luehrsen asked in the Q&A about database improvement. https://www.youtube.com/watch?v=RxQ5yhsxig4&t=429s
I can completely agree with Matts reply, that the WordPress database is extremely flexible and any changes should be made with care.
But since there is a contradiction between "should not be searching for anything base on the meta_value" and what happens inside attachment_url_to_postid(), I think we should discuss and improve this.
I have no clue what the best solution could be.
A few further thoughts to spark the discussion.
wp_posts does have a column post_mime_type.
Why is the mime type in wp_posts and while the path to the attached file is in meta? Could it be the solution to make a column attached_file?
The problem with this is the balance between redundancy and backwards compatibility. Since there will be plugins directly accessing _wp_attached_file in wp_postmeta this would require filters to make get_post_meta() still return _wp_attached_file and set_post_meta() write it. So while I really like this approach, I think this one has a lot special cases and possible side effects.
Indices in MySQL have a length what they index, meta_value for _wp_attached_file will contain something like 2024/07/my-fancy-foto-with-a-speaking-name.jpg (46 characters). So adding even adding a short index, like 10-20 characters long, would already be a gain here. This would likely solve any issues with most queries against wp_postmeta

I hope to restart the discussion in order to improve things for everyone. As mentioned in the other topic, with adding the filter proposed in #61383, there is a way to improve things for people who are aware of the solution. But it would be far from the ideal solution.

Does anyone have a third solution besides adding an index to wp_postmeta or adding the path as column to wp_posts.

In my example I went for adding a custom table, that will only contain the post_id and the attachment_file_path, which speeds things up from 3-6s down to 2ms, while I think this was the best choice for my clients project, I don't think this would be the perfect solution for WordPress.

Last edited 7 months ago by apermo (previous) (diff)
Note: See TracTickets for help on using tickets.