WordPress.org

Make WordPress Core

Opened 12 months ago

Last modified 8 weeks ago

#41305 assigned enhancement

Add lazily evaluated translations

Reported by: schlessera Owned by: jnylen0
Milestone: 5.0 Priority: normal
Severity: normal Version: 4.8
Component: I18N Keywords: has-patch early dev-feedback
Focuses: rest-api, performance Cc:

Description

In the context of #40988, I did a few performance tests and experimented with adding a lazily evaluated translation object.

The general principle is this:

Instead of returning the resulting string of a translation, return an object for which the __toString() and jsonSerialize() methods will fetch the resulting string instead.

I tested by having the __() method return such a proxy object, instead of the actual translated string.

From a quick profiling run on wptrunk.dev/wp-json/wp/v2/posts, I got the following results:

Returning a translate() from __():

Wall Time     162ms
CPU Time      157ms
I/O Time     5.48ms
Memory       16.5MB
Network         n/a     n/a     n/a
SQL          4.41ms    13rq

Returning a TranslationProxy from __():

Wall Time     144ms   -19ms  -14.9%
CPU  Time     138ms   -18ms  -15.4%
I/O  Time    5.33ms  -154µs   -3.0%
Memory       16.6MB +81.6KB     n/s
Network         n/a     n/a     n/a
SQL          4.33ms    13rq

As you can see, this shaved off almost 15% from this simple request.

It saved 2255 calls to translate(), 2157 calls to get_translations_for_domain() and, more importantly still, 2156 calls to apply_filters() (which could involve a lot of additional processing in some cases).

The main problem with this approach is that WordPress does not contain real type-hinting, so BC is broken wherever the proxy is not echoed, but used directly.

As we cannot possibly foresee how plugins might use their localized strings, I suggest adding new lazy variations of the translation functions. To mirror the "echo" variations that prefix the translation functions with an e, I'd suggest using the l prefix for these variations:

// Lazily retrieve the translation of $text.
_l( $text , $domain = 'default' );

// Lazily retrieve the translation of $text and escape it for safe use in an attribute.
esc_attr_l( $text, $domain = 'default' );

// Lazily retrieve the translation of $text and escape it for safe use in HTML output.
esc_html_l( $text, $domain = 'default' );

// Lazily retrieve translated string with gettext context.
_lx( $text, $context, $domain = 'default' );

// Lazily translate string with gettext context, and escape it for safe use in an attribute.
esc_attr_lx( $text, $context, $domain = 'default' );

// Lazily translate string with gettext context, and escape it for safe use in HTML output.
esc_html_lx( $text, $context, $domain = 'default' );

Arbitrary testing has shown that using such lazily evaluated translations strategically can improve the performance by 10-30% for certain scenarios.

Implementing them in this BC fashion allows us to fine-tune Core usage and make it available to plugins, while playing it safe with existing code.

Attachments (10)

41305.implementation (14.6 KB) - added by schlessera 12 months ago.
Initial implementation. This sets everything up, but does not any benefits yet because it is not being consumed by any other code.
41305.optimize-rest-api (169.5 KB) - added by schlessera 12 months ago.
Optimizes all REST API translation to be lazily evaluated.
41305.implementation.diff (14.6 KB) - added by schlessera 11 months ago.
No changes, just added .diff extension for syntax highlighting.
41305.optimize-rest-api.diff (169.5 KB) - added by schlessera 11 months ago.
No changes, just added .diff extension for syntax highlighting.
41305.optimize-rest-api.2.diff (169.3 KB) - added by schlessera 9 months ago.
Refresh of the REST API practical application
41305.all-in.diff (39.9 KB) - added by schlessera 8 weeks ago.
Direct change to default translation mechanism
WP trunk default.png (340.4 KB) - added by schlessera 8 weeks ago.
WP trunk default
WP trunk with 41305.all-in-diff applied.png (339.6 KB) - added by schlessera 8 weeks ago.
WP trunk with 41305.all-in-diff applied
WP trunk default (curl).png (186.0 KB) - added by schlessera 8 weeks ago.
WP trunk default (curl)
WP trunk with 41305.all-in-diff applied (curl).png (186.8 KB) - added by schlessera 8 weeks ago.
WP trunk with 41305.all-in-diff applied (curl)

Download all attachments as: .zip

Change History (32)

@schlessera
12 months ago

Initial implementation. This sets everything up, but does not any benefits yet because it is not being consumed by any other code.

@schlessera
12 months ago

Optimizes all REST API translation to be lazily evaluated.

#1 @schlessera
12 months ago

  • Keywords has-patch added

I've now uploaded an initial implementation, as well as a separate patch that replaces all translated strings in the REST API to make use of these lazy-loading proxies, as it think it should be relatively safe to do so for the REST API.

This needed two small tweaks to the unit tests, where the tests were assert the actual internal type of string on error messages. I don't think that the probability is very high that actual code should hit that issue.

This relatively conservative change reduced the wall time on my benchmarks (with 50 iterations) from 162ms to 152 ms, by eliminating 1236 translations.

Of course there's lots of other potential places where we can apply these optimizations as well. But I don't want to pollute this ticket too much.

#3 @jdgrimes
12 months ago

Ultimately, I see the bigger underlying problem as being a design flaw in many of the APIs where the translation functions are used. The ideal would have been to design the code so that it did not have to worry about the translations at all until it actually needed them. However, that is now water under the bridge, and although we could take this into account when designing future code, introducing these lazy functions is probably worth the immediate performance gains.

A better design of the code to start with would likely also decrease memory, whereas the lazy approach actually increases it overall in some cases, although it appears that the memory increase is reasonably small based on the numbers in the OP. I assume that memory consumption would still be only minimally more than it presently is even if the lazy functions were used by most plugins, but that might also be worth some consideration and investigation. In cases where a text domain did not have to be loaded at all (i.e., the translations were never actually used), memory might actually be decreased.

#4 @schlessera
12 months ago

@jdgrimes Yes, I totally agree that a better upfront design would be preferable. But that would mean ripping everything out and breaking the entire ecosystem, pretty much...

This approach is a sort of "opt-in" improvement, which lets us fine-tune the usage by hand without breaking anything, and lets plugins/themes make use of this where it makes sense.

#5 @rmccue
12 months ago

  • Focuses rest-api added

Rather than being too clever with objects, we could potentially just introduce _noop() and translate_noop functions instead, to match the existing _n_noop and translate_nooped_plural. The __noop() is just a static analysis hint, and the actual translation can be deferred until later; we can implement this for the API (the only place we really need it) pretty easily, it just requires us to add the functions to the static analysis.

This would look something like:

public function get_item_schema() {
	// ...
	'translation_domain' => '',
	'properties' => array(
		// ...
		'description' => _noop( "The date the object was published, in the site's timezone." ),
	)
	// ...
}

Later, we then call translate_noop( $description, $translation_domain ) when needed.

This is less surprising to consumers of the function, since it's still actually a string. One of the problems with PHP is that string-like objects aren't equivalent to first-class strings, so introducing an object with __toString() might introduce problems.

In an ideal world, we would have implemented the API so that things like this could either be a string or a function, allowing us to defer the actual runtime of translation:

'description' => function () {
	return __( "The date the object was published, in the site's timezone." );
}

However PHP doesn't fit well with this pattern generally, and we also don't have proper closures in PHP 5.2, so we can't really do this.

#6 @schlessera
12 months ago

I don't consider this "too clever", as I specifically introduced new methods instead of changing old ones, and they are documented to return a proxy object. Using proxy objects is a standard practice, and developers are free to not use these new methods if they don't understand what that means.

What you propose means that you split the translation code up into two different locations for strings translated in this way. For more complex code, you'd basically need to run your strings through a check to make sure that translatable strings was indeed translated. I don't understand how this would be preferable.

There are probably use cases where this makes sense, but as a general mechanism, I think this makes the code even worse, and introduces lots of bugs where we forgot to actually translate a translatable strings. This won't even be detected by any of our automated tests. With the proxy object above, you at least get an immediate error if you handle it incorrectly.

#7 @swissspidy
11 months ago

#38643 is an area where something like this would have been useful. For #38218 specifically I suggested using a new class.

One thing that bugs me about this patch here is that it would make transitioning to a new translation system (still dreaming of MessageFormat) much harder because of all the new functions and classes.

I like keeping it simple with _noop() and translate_noop().


@schlessera Would you mind using .diff file extensions next time so we have syntax highlighting here in Trac? Thanks :-)

#8 @schlessera
11 months ago

One thing that bugs me about this patch here is that it would make transitioning to a new translation system (still dreaming of MessageFormat) much harder because of all the new functions and classes.

I think that this should not be an issue, because the added functions are basically lazy wrappers around translate(). They don't do any translation themselves, and when you would change the backend implementation for translate(), theses lazy variations are taken care of automatically.

I like keeping it simple with _noop() and translate_noop().

I don't think that's simpler. It is simpler code in the implementation, but then makes every single usage of it more complex as a result, as you are splitting up the usage into two separate steps, and are making this much more error-prone.

Would you mind using .diff file extensions next time so we have syntax highlighting here in Trac? Thanks :-)

Uh, sure, I wasn't aware of that distinction. Had mostly avoided Trac before... ;)

@schlessera
11 months ago

No changes, just added .diff extension for syntax highlighting.

@schlessera
11 months ago

No changes, just added .diff extension for syntax highlighting.

#9 @jnylen0
11 months ago

  • Milestone changed from Awaiting Review to 4.9
  • Owner set to jnylen0
  • Status changed from new to assigned

I think something like this is badly needed (/wp/v2?context=help likely has even worse performance), and the approach in the patches so far is reasonable.

I'll plan to take a closer look for 4.9.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


9 months ago

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


9 months ago

This ticket was mentioned in Slack in #core-restapi by schlessera. View the logs.


9 months ago

#13 @JPry
9 months ago

  • Keywords needs-refresh added

I think the entire premise of this ticket is great. Translations are the kind of thing that are ubiquitous within WordPress, and yet they definitely need to be more performant. I'm +1 for adding the lazy functions and translation proxies that @schlessera proposed. I could foresee using these functions within personal projects to write more performant code.

When looking at this patch locally, it seems that it no longer applies cleanly to trunk, so it should be refreshed.

This ticket was mentioned in Slack in #core-restapi by kadamwhite. View the logs.


9 months ago

@schlessera
9 months ago

Refresh of the REST API practical application

#15 @schlessera
9 months ago

  • Keywords needs-refresh removed

I refreshed the patch that applies the lazily evaluated translations to the REST API, so that it applies cleanly again.

The actual tranlsation code is still good and doesn't need a refresh.

All tests are passing.

Thanks @JPry for testing the patches!

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


9 months ago

#17 @kadamwhite
9 months ago

  • Keywords early added
  • Milestone changed from 4.9 to Future Release

As much as I'd like this to land, after discussion with @swissspidy and @rmccue we'd like more review from @ocean90 than we have time for today—punting to early in the next release cycle.

@schlessera
8 weeks ago

Direct change to default translation mechanism

#18 @schlessera
8 weeks ago

As a follow-up to the discussions I had with some of the core folks at WCUS, I produced a new patch 41305.all-in.diff that skips producing alternate translation functions, and immediately changes the default translation functions to return lazily-translated proxies instead.

Why not to play it safe (as the first patch does)?

The reasoning behind this is that this change has such a big impact on performance that we shouldn't start with a compromise that only makes it opt-in. We should properly investigate the expected amount of problems this might cause to plugins and mitigate any breakage to the best of our abilities, as this will benefit all WordPress users in a substantial way in the long term. The discussions at WCUS led to the conclusion that these are probably only edge cases and that the change is important enough to warrant contacting a few plugin authors that might be impacted.

I added some minimal edge case handling code to make sure unit tests still pass and plugins would just work. However, right now, there are still 7 unit tests that are breaking. For these, I think the unit tests might need to be adapted instead, though.

What are the breaking changes?

The translations are not real strings anymore, but proxy objects that behave like strings whenever being cast to a string (like when being echoed, concatenated, etc...). For most of the intended usage of translations (sending them to templates to get rendered into HTML responses), this just works without issues. However, three scenarios can cause issues with this:

1. Directly checking the type of the translation

This mainly happens in unit tests. That's why I created an override in the WP_Unit_TestCase for the assertEquals(), assertSame() and assertInternalType() methods. Creating such overrides also fixes breaking tests for any plugins that base their tests on WP_UnitTestCase without needing further adaptations.

2. Using the translation as an array index

This is actually done by Core in places like the locale setup. I think that there are not that many use cases for doing so, as you'd prefer deterministic indexes most of the time. The solution to this is to just cast to (string) before actually using the translation as the index.

3. Making the result of the translation dependent on the timing of its instantiation, instead of its usage

I would be surprised to learn that plugins would ever use translations in this way. However, Core unit tests currently do, and this also includes the 7 unit tests that are currently still failing.

The failing unit tests do something like this: add_filter(), __(), remove_filter(), assert(). So, they immediately remove the change they want to test before asserting. They are all part of the Tests_L10n_loadTextdomainJustInTime suite, and I'm not sure they need fixing. If I understand correctly what they are testing, I think the tests should just be adapted to move the remove_filter() calls after the assert() instead of in front of it. I would love to get more insight into this from @ocean90 & @swissspidy, as I think they had collaborated on this feature & tests.

Current results

What follows is a couple of screenshots to show the current results. The tests were done as a call to <domain>/wp-json/wp/v2/posts/1 to retrieve the JSON representation of the Hello World default post. They have been produced on my local system running PHP 7.1.17 with xdebug enabled. I kept xdebug enabled because it makes everything slower, so the change is more obvious. The setup was a fresh and default WP setup with fr_FR as frontend language and de_DE as user language.

Also, please note that the screenshots represent 1 random run, chosen to be somewhat representative. I will add real benchmark results (with averaged values) as soon as I manage to properly configure my system to run them again, as I had to find out this currently does not work.

Wall time without this patch: ~1.6s

Wall time with this patch: ~1.1s

You can also see that the query time is practically unchanged, so this is pure processing logic to make the translations happen (which are not even being used in this case).

@schlessera
8 weeks ago

WP trunk default

@schlessera
8 weeks ago

WP trunk with 41305.all-in-diff applied

#19 @schlessera
8 weeks ago

  • Keywords dev-feedback added
  • Milestone changed from Future Release to 5.0

#20 @swissspidy
8 weeks ago

They are all part of the Tests_L10n_loadTextdomainJustInTime suite, and I'm not sure they need fixing. If I understand correctly what they are testing, I think the tests should just be adapted to move the remove_filter() calls after the assert() instead of in front of it.

The statements are ordered that way so everything gets cleaned up immediately when not being used anymore (e.g. in case of errors) and assertions are nicely grouped together to make things easier to grasp. Shouldn't be a big deal to re-order those.

#21 @schlessera
8 weeks ago

I think the way I profiled the command above does mess somewhat with the processing.

The following screenshots show another way of timing the changes, using curl to retrieve the same REST API request.

Wall time without this patch: ~0.49s

Wall time with this patch: ~0.35s

@schlessera
8 weeks ago

WP trunk default (curl)

@schlessera
8 weeks ago

WP trunk with 41305.all-in-diff applied (curl)

#22 @schlessera
8 weeks ago

@swissspidy : Ah, thanks, I already thought so but wanted to make sure I didn't just misunderstand the point of the tests.

Note: See TracTickets for help on using tickets.