WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 3 years ago

#39603 new enhancement

The more posts with similar names you have, the slower you save the next one

Reported by: bisyarin Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Posts, Post Types Keywords: needs-testing has-patch has-unit-tests
Focuses: performance Cc:
PR Number:

Description

Hello, developers! Thank you for your work on Wordpress!

While using Flamingo for saving Contact Form 7 submissions, I encountered a huge slowdown after more than 11 000 submissions had been saved. I thought it was a Flamingo problem, but it appears that all autogenerated post-types in Wordpress can cause this problem.

Contact Form 7 saves submissions as posts with 'flamingo_inbound' type, and uses 'your-subject' slug when a form doesn't have a subject (which is my case). So, in all saved form submissions I have 'your-subject, your-subject-2, ..., your-subject-N' slugs generated by wp_unique_post_slug(). The problem is that for every new 'your-subject-N' slug wp_unique_post_slug() issues N-1 SELECT queries. In other words, wp_unique_post_slug() bruteforces a new possible slug, generating a lot of SQL queries fetching all previously saved similar posts.

You can see it if you run the attached 'generate-many-posts.php' script from your Wordpress root (more and more queries will be used for saving every new post).

Also, let me suggest a solution. With the attached patch we don't bruteforcing the slug suffix. We just get the last appropriate slug, increment it's suffix, and than use this suffix in a new unique slug.

Attachments (6)

generate-many-posts.php (1.9 KB) - added by bisyarin 3 years ago.
Problem Demo
constant-time-slug-generation.diff (4.3 KB) - added by bisyarin 3 years ago.
Patch enabling constant-time slug generation
constant-time-slug-generation-2.diff (4.7 KB) - added by bisyarin 3 years ago.
The Second Patch of performance improving
wpUniquePostSlug-unit-tests.diff (5.1 KB) - added by bisyarin 3 years ago.
filling-gaps-slug-generation.diff (5.2 KB) - added by bisyarin 3 years ago.
Patch for post.php for faster slug generation with filling gaps approach
filling-gaps-wpUniquePostSlug-unit-tests.diff (6.8 KB) - added by bisyarin 3 years ago.
Patch for wpUniquePostSlug.php with tests for faster slug generation with filling gaps approach

Download all attachments as: .zip

Change History (15)

@bisyarin
3 years ago

Problem Demo

@bisyarin
3 years ago

Patch enabling constant-time slug generation

#1 @johnbillion
3 years ago

  • Keywords has-patch needs-testing needs-unit-tests added
  • Type changed from defect (bug) to enhancement
  • Version 4.7.1 deleted

Thanks for the patch, @bisyarin , and welcome to Trac :-)

This will need some testing (and unit tests) to demonstrate that the functionality continues to work as expected with edge case slugs, such as your-subject-3-2, not-your-subject-7, your-subject-suffix-2, etc.

#2 follow-up: @pento
3 years ago

Another edge case to test - when a slug on an older post has been edited. Ie:

  • Create post with slug foo
  • Create post with slug bar
  • Edit foo to have the slug bar, which should be changed to bar-2.

After that, create another post with slug bar - it should be changed to bar-3, but I think constant-time-slug-generation.diff will return bar-2.

#3 in reply to: ↑ 2 ; follow-up: @rmccue
3 years ago

Replying to pento:

After that, create another post with slug bar - it should be changed to bar-3, but I think constant-time-slug-generation.diff will return bar-2.

This could be fixed with ORDER BY post_name, right?

#4 in reply to: ↑ 3 @pento
3 years ago

Replying to rmccue:

Replying to pento:

After that, create another post with slug bar - it should be changed to bar-3, but I think constant-time-slug-generation.diff will return bar-2.

This could be fixed with ORDER BY post_name, right?

Nope.

mysql> DROP TABLE IF EXISTS foo;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE foo (
    -> bar VARCHAR(255)
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO foo(bar) VALUES('baz'),('baz-1'),('baz-2'),('baz-10');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM foo ORDER BY bar DESC LIMIT 1;
+-------+
| bar   |
+-------+
| baz-2 |
+-------+
1 row in set (0.00 sec)

#5 @rmccue
3 years ago

  • Keywords needs-patch added; has-patch removed

Ouch. I figured MySQL would do a natural sort here, but guess not.

The other solution I can think of would be to do ORDER BY LENGTH(post_name) DESC, post_name, but a) the performance would likely suck, and b) this would fail with foo-a-b and foo-1.

Leaving open, as we should keep exploring how to improve the performance, but switching to a direct query doesn't appear to be a good solution here.

#6 @bisyarin
3 years ago

Thank you for prompt reply, @johnbillion, @pento, and @rmccue!

You are making a good poing, @pento. I didn't even think of this situation. As a quick fix for this I would suggest 'ORDER BY post_modified DESC', but it stops working if all four events (create foo, create bar, rename foo->bar, create bar) happen on the same second. So, I'll try to find another solution.

@bisyarin
3 years ago

The Second Patch of performance improving

#7 @bisyarin
3 years ago

Hello!
I've attached patches for "wp-includes/post.php" and for "tests/post/wpUniquePostSlug.php".

I decided to try @rmccue's suggestion and perform natural sort on MySQL side, selecting needed rows for sorting with RLIKE. I didn't notice serious performance issues with LENGTH or RLIKE while running the tests listed below.

Thousand Posts Creation Tests

Original trunk code:
Fresh database on each run: 329 seconds for 1000 objects (average of three runs)
With 10 000 records already in DB: ~2 hours for 1000 objects (estimation based on several minutes of work)

First patch (that works incorrectly):
Fresh database on each run: 78 seconds for 1000 objects (average of three runs)
With 10 000 records already in DB: 156 seconds for 1000 objects (one run)

Second patch:
Fresh database on each run: 79 seconds for 1000 objects (average of three runs)
With 10 000 records already in DB: 214 seconds for 1000 objects (one run)

Although the second patch works about 40% slower than the first one, it's still a nice result comparing to the original performance.

Unit tests for the issues discussed here, especially with @pento, have been created. Also I added an SQL-queries counting test.

In "phpunit\tests\post\wpUniquePostSlug.php" file, "test_embed_slug_should_be_suffixed_for_pages" test a typo has been found: paage -> page

IMPORTANT QUESTION:
The original code has a feature of filling the gaps between numeric prefixes.
For example, we have 'post', 'post-2', and 'post-4'.
In original code the next prefix is '3'. With the second patch it is '5'.

Should the behavior of filling gaps be in the code with improved performance? It's just requires PHP-side processing of all similar slug names, this is why I'm asking. Of course, this processing can be smart, but still sending all the needed slug names to PHP can slow things down. What do you think?

Last edited 3 years ago by bisyarin (previous) (diff)

#8 @bisyarin
3 years ago

  • Keywords has-patch has-unit-tests added; needs-unit-tests needs-patch removed

@bisyarin
3 years ago

Patch for post.php for faster slug generation with filling gaps approach

@bisyarin
3 years ago

Patch for wpUniquePostSlug.php with tests for faster slug generation with filling gaps approach

#9 @bisyarin
3 years ago

Hello!

If you decide to choose the original way of filling gaps in slugs numbering (it's of course slower than the second patch's way), than I've attached patches for 'src' and 'tests'. These are my tests results for this approach:

Fresh database on each run: 92.5 seconds for 1000 objects (avg of two runs)
With 10 000 records already in DB: 419 seconds for 1000 objects (average of three runs)

Note: See TracTickets for help on using tickets.