Make WordPress Core

Opened 19 months ago

Closed 18 months ago

Last modified 14 months ago

#57683 closed enhancement (fixed)

Improve performance of mysql2date

Reported by: spacedmonkey's profile spacedmonkey Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.3 Priority: normal
Severity: normal Version: 0.71
Component: Date/Time Keywords: has-patch commit
Focuses: performance Cc:

Description

The mysql2date function is used in WP_Query. In the following lines.

public function generate_postdata( $post ) {
  ...
  $currentday   = mysql2date( 'd.m.y', $post->post_date, false );
  $currentmonth = mysql2date( 'm', $post->post_date, false );

However, this results in a call to wp_timezone, which creates a new DateTimeZone and calls to wp_timezone_string which does two calls to get_option.

As mysql2date is called twice every time a post is setup, this can result in hundreds of calls to get_option / wp_timezone. This is very wasteful, as even simple calls to get_option have overhead.

As it is very unlikely for the timezone will change in a single page request, the result of the timezone should be cached somehow to stop reprocessing.

Attachments (4)

Screenshot from 2023-02-09 18-40-22.png (29.8 KB) - added by spacedmonkey 19 months ago.
Screenshot 2023-03-10 at 14.12.22.png (18.1 KB) - added by spacedmonkey 18 months ago.
Trunk
Screenshot 2023-03-10 at 14.12.31.png (17.7 KB) - added by spacedmonkey 18 months ago.
Using get_post_datetime without cache timezone
Screenshot 2023-03-10 at 14.12.41.png (17.6 KB) - added by spacedmonkey 18 months ago.
Use get_post_datetime with cached timezone

Download all attachments as: .zip

Change History (40)

#2 @Rarst
19 months ago

I pondered such before, but I am generally skeptical of "optimizing" options call. It becomes a caching system for a caching system and introduces a lot of uncertainty about getting actual option values over something stuck in a bespoke cache somewhere.

I think the safe 50% win here could be to drop mysql2date calls (the date formats are fixed so there is no appeal to its crazy input edge cases) in favor of plain PHP with a single wp_timezone() call.

Is the trace crop from a complete themed page load or truncated to just post loop or something? Would seem tad high for me if former, but usual trace breakdowns aren't on top of my head right now (Blackfire removed their free plan, not readily set up for profiling :\ ).

This ticket was mentioned in PR #4062 on WordPress/wordpress-develop by @tanjimtc71.


19 months ago
#3

  • Keywords has-patch added

This change adds an optional fourth parameter to the mysql2date function to allow passing in the timezone.
If the timezone is not passed, the function will default to the value returned by wp_timezone.

This modification should improve the performance of the mysql2date function by reducing the number of calls to wp_timezone.

I have tested the changes and they seem to be working fine. However, I would appreciate it if you could take a look and let me know if there are any issues or further changes needed.

Thanks!

Trac ticket: https://core.trac.wordpress.org/ticket/57683

#6 follow-up: @spacedmonkey
19 months ago

Have my own PR at #4169. This caches the timezone as a static variable.

@Rarst Can you take a look and see what you think?

#7 in reply to: ↑ 6 @Rarst
19 months ago

Replying to spacedmonkey:

Have my own PR at #4169. This caches the timezone as a static variable.

@Rarst Can you take a look and see what you think?

As above I am against bespoke caching for options. Options in general and definitely Date/Time options specifically, things are broken enough with it.

And that also exploded a ton of unit tests (probably because it made the function much less testable, rather than functionally broke it).

Your other PR halved option calls inside mysql2date, right? I maintain we can do low hanging optimization here which would further half (so calls down 4x compounded) and re-test. I can doodle a PR if there is a demand to deal with it sooner rather than later. :)

Maybe look at what is going with options caching/autoload after, to my memory this has never been a significant perf outlier.

#8 @Rarst
19 months ago

I was replying late, actually it would still be 2x since if the mysql2date call is dropped here then improvement within it no longer relevant. Anyway I'll tinker on it some.

Last edited 19 months ago by Rarst (previous) (diff)

#9 @spacedmonkey
18 months ago

@Rarst Do we even need to use mysql2date to here? I feel like date_create might do it just fine.

#11 @spacedmonkey
18 months ago

In #4181, I removed mysql2date and use date_create in WP_Query, as we do not care about the timezone in this context.

#12 @Rarst
18 months ago

Yeah, that was my initial suggestion above - to drop mysql2date it from here. :)

The PR looks tad wrong though, because without specifying time zone you would try to parse local time as UTC (or who knows what if end user messes with it, we are trying to proof against that in the long run). Remember - the database field doesn't contain any time zone information, so it can't be parsed out of it.

With some more thinking I am now pondering that we probably need a proper mysql2date replacement. Otherwise everywhere this (parsing post time from database) needs to be done there will be potential for diverging implementations and slip ups.

Last edited 18 months ago by Rarst (previous) (diff)

#13 @spacedmonkey
18 months ago

Why does the timezone matter when you are getting the day.month.year and month? Timezone doesn't effect these right?

#14 @Rarst
18 months ago

Does it matter in this code verbatim? Maybe not.

Does it matter that it introduces ambiguity about what time zone the operation happens in? Yes. Because that's why this component is so broken - it was all coded on the principle of situationally convenient, conceptually wrong.

If the performance preference here is to process this sans time zone, I would rather suggest to tear the input apart into strings (and document why it's done that way). Instantiating a DateTime object into knowingly invalid state is bad.

Last edited 18 months ago by Rarst (previous) (diff)

#15 @Rarst
18 months ago

My thoughts so far, drafted in code:

<?php
// "proper" (timezone dealth with implicitly)
$datetime     = get_post_datetime( $post );
$currentday   = $datetime->format( 'd.m.y' );
$currentmonth = $datetime->format( 'm' );
var_dump( $currentday, $currentmonth );

// "optimized" (handled as string, timezone-unaware)
list( $year, $month, $day ) = explode( '-', substr( $post->post_date, 0, 10 ) );
$currentday   = $day . '.' . $month . '.' . substr( $year, -2 );
$currentmonth = $month;
var_dump( $currentday, $currentmonth );
Last edited 18 months ago by Rarst (previous) (diff)

#17 @spacedmonkey
18 months ago

I have updated #4181.

I don't think that get_post_datetime is a solution, as it still has the same performance problem, of creating a new timezone object and the option lookup attached to that. I put together a PR for get_post_datetime. See #4184.

#18 @Rarst
18 months ago

I feel like this keeps circling back to "let's introduce a lot of complexity and/or side effects to not access this specific option".

Accessing options is normal. It's not particularly slow (still need to double check what is autoload thinking about time zone ones). It's not worth severe issues with testing through static caches or complicating function signatures that are already (out of backwards compat necessity) too complex.

If you are set on eliminating option access here - I suggest the string approach. It's good enough given that formats are fixed and being ancient global vars aren't likely to change around.

#19 @spacedmonkey
18 months ago

@Rarst Sorry, I am really confused here.

Here you say it important we pass handle the timezone. I do that and add caching, you then you suggest here solutions that ignore timezone completely. Which is it?

To be clear the performance issue, is calling wp_timezone so many times on a single page load. it has an overhead, I am trying to avoid here. Caching it and reusing it within the same WP_Query instance, makes sense to me. If you have a filter that runs on the get_option, it will fire the first time and not again. There is no need let the filters run every time, it is unlikely to change and just wasteful.

My preferred solution is still this.

$post_date = date_create( $post->post_date );
if ( ! $post_date ) {
        $currentday   = false;
        $currentmonth = false;
} else {
        $currentday   = $post_date->format( 'd.m.y' );
        $currentmonth = $post_date->format( 'm' );
}

#20 @Rarst
18 months ago

I know that this is confusing (welcome to Date/Time component!), please bear with me. :)

Let's break it down from scratch.

We have input of a local time in MySQL format (that doesn't include time zone information) and we need to produce a couple of formatted outputs. The formats are historically fixed and do not contain time zone information either.

The easiest way to do this semantically is to delegate the operation to a DateTime object, which is very good at both parsing date inputs and formatting date outputs. However DateTime object doesn't have a concept of ambiguous "local" time, it must instantiate with a concrete time zone, provided explicitly or taken from the runtime default.

So the operation date_create( $post->post_date ) takes a local input and doesn't provide time zone, leaving it to runtime default (which would be UTC as set by WordPress core on boot... or whatever user changed it to, because people do that and we can't stop them).

Now you have an object instantiated from local time using a time zone that does not correspond to that local time. That object is in completely invalid state and points to an altogether wrong moment in time, that is different from the original input. This is bad and should not be done.

Now, since our output in this case is coarse to a day, the empirical output will happen to be the same in this case. That does not justify this as an acceptable practice. We spent a lot of time fixing issues stemming from doing exactly this - transforming time inputs in an invalid time zone states. I will die on a hill on not having a code like this in core, because we shouldn't be ever giving this as viable example of doing such.

Given that context I see the options worked out so far are following:

  1. Use a DateTime object with a correct time zone (we can halve the instances, by not calling mysql2date twice).
  2. Use a DateTime object and cache a time zone information to reuse. In my experience bespoke caches increase complexity and decrease testability.
  3. Do not use a DateTime object. Given the specific input and outputs in this case we can do that with a bit of string manipulation.

To be clear and restate my positions:

  1. Using invalid DateTime object is a hard no from me.
  2. Introducing a bespoke cache for time zone - so far I am unconvinced the downsides are worth the performance improvement. Caching options should be generally left to its own cache, not sprinkled randomly through the core.
  3. Not using a DateTime object is fine by me in this case, given relative simplicity and closeness of input/output.

Sanity check - would you go through entire core and cache every single options access manually "for performance"? No? Then why treat this option fetch as special, it's not.

Last edited 18 months ago by Rarst (previous) (diff)

@spacedmonkey
18 months ago

Using get_post_datetime without cache timezone

@spacedmonkey
18 months ago

Use get_post_datetime with cached timezone

#21 @spacedmonkey
18 months ago

See the above screenshots. There is a benefit of using get_post_datetime. however there is even more benefit of using a cached timezone object.

As this issue, is on every post, these numbers mount up. I think we should cache the timezone, it's value will not change and there is no need to re-compute it.

There is an overhead to use wp_timezone, it is not only the cache lookup. I do entend on limiting it's use, as calling the same function over and over again is wasteful and that is what I am trying to fix with this ticket.

#22 @Rarst
18 months ago

Do you have an issue with strings approach, you keep ignoring that option?

It accomplishes what you want (gets rid of time zone look up altogether) and doesn't make life harder for me by worsening code quality and testability here.

#23 @SergeyBiryukov
18 months ago

I would prefer the strings approach from comment:15 here too. It might be a bit less readable at a glance, but achieves the optimization goal without introducing any ambiguous timezone issues, and keeps the testing flexibility.

#25 @spacedmonkey
18 months ago

I have created a PR that use the sub string solution. Seems to work well. See #4230.

Seeing a reduction of -200 calls to timezone.

#26 @Rarst
18 months ago

Cheers, thank you for understanding, I hope that satisfies everyone's concerns here. :)

Could you please add explicit inline comment that DateTime/DateTimeZone are specifically not used in this instance for performance reasons?

Otherwise someone down the road will try to optimize this backwards and we'll have to discuss all over again. :)

#27 @flixos90
18 months ago

  • Milestone changed from Future Release to 6.3

I ran a quick performance benchmark on this PR (added the code manually to WP 6.2-RC1):

Both tests were run on a home page with 2 posts, and the improvements are between 0.4-0.9ms. It is expected that the improvements would be slightly larger the more posts are on a page.

While it may seem like a small improvement, every improvement counts, and their effect adds up eventually. So together with the existing support on the approach, I think it makes sense to commit it as it definitely has a positive performance impact.

#28 @spacedmonkey
18 months ago

If that is 0.5ms per post, a page with say 30 posts, would see a much bigger improvement.

#29 @spacedmonkey
18 months ago

@Rarst @SergeyBiryukov Can I please a review of #4230.

#30 @spacedmonkey
18 months ago

  • Keywords commit added
  • Owner set to spacedmonkey
  • Status changed from new to assigned

#31 @spacedmonkey
18 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 55558:

Date / Time: Remove usage of mysql2date in generate_postdata method.

Remove usage of mysql2date in generate_postdata. mysql2date has a performance overhead, as it creates a DateTimeZone object each time it is called. Use a simple
sub string function to generate the values needed for the currentmonth and currentday global variables.

Props spacedmonkey, Rarst, SergeyBiryukov, flixos90.
Fixes #57683.

@spacedmonkey commented on PR #4230:


18 months ago
#32

Committed.

Rarst commented on PR #4230:


18 months ago
#33

Cheers! I only had sprintf format bit left from review. Not sure if you hadn't seen the last comment (can get lost with reviews) or disagree. Not critical. :)

#34 @milana_cap
14 months ago

  • Keywords add-to-field-guide added

Can be mentioned in Field Guide together with https://core.trac.wordpress.org/ticket/57705

Last edited 14 months ago by milana_cap (previous) (diff)

#35 @johnbillion
14 months ago

This doesn't need to be in the field guide, this is an internal optimisation.

#36 @milana_cap
14 months ago

  • Keywords add-to-field-guide removed
Note: See TracTickets for help on using tickets.