WordPress.org

Make WordPress Core

Opened 7 years ago

Last modified 2 years ago

#22951 assigned defect (bug)

Performance enhancements for esc_url()

Reported by: markjaquith Owned by: schlessera
Milestone: Future Release Priority: normal
Severity: normal Version: 2.8
Component: Formatting Keywords: reporter-feedback has-patch 2nd-opinion
Focuses: performance Cc:
PR Number:

Description

esc_url() gets used a lot on WordPress admin pages. Sometimes 100 times or more. Nacin did some KcacheGrind measurements that had it as 7% of some pages. We can speed it up.

Most of the grind comes from wp_kses_bad_protocol().

I had a thought that we're sort of going about things backwards. We're being very careful to exclude anything harmful — bad characters, bad protocols, duplicate fake-out protocols, etc. But almost 100% of the time, the URL going through it is a http/https URL that has completely normal characters in it. We can detect that really early, and bail.

I did some tests with this approach that showed a good time savings.

Quite obviously, there's no room to compromise on security, so we'll need to be watching unit test, and maybe even writing some new ones for good measure.

Attachments (2)

improve-wp-kses-bad-protocol-performance.22951.diff (1.8 KB) - added by schlessera 3 years ago.
22951.2.diff (2.2 KB) - added by schlessera 2 years ago.
Refreshed the patch and made changes to fix broken tests

Download all attachments as: .zip

Change History (21)

#1 @nacin
7 years ago

  • Milestone changed from Awaiting Review to 3.6

I think this would be great for 3.6.

#2 @adamsilverstein
7 years ago

  • Cc ADAMSILVERSTEIN@… added

#3 @mboynes
7 years ago

  • Cc mboynes added

#4 @alex-ye
6 years ago

  • Cc nashwan.doaqan@… added

#5 @sunnyratilal
6 years ago

  • Cc sunnyratilal5@… added

#6 @ryan
6 years ago

  • Milestone changed from 3.6 to Future Release

#7 @nacin
6 years ago

  • Component changed from Performance to Formatting
  • Focuses performance added

#8 @johnbillion
5 years ago

  • Keywords needs-patch reporter-feedback added
  • Version set to 2.8

Did you ever write a patch for this Mark?

#9 @johnbillion
4 years ago

  • Milestone changed from Future Release to Awaiting Review
  • Owner set to markjaquith
  • Status changed from new to assigned

This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.


3 years ago

#11 @peterwilsoncc
3 years ago

  • Milestone changed from Awaiting Review to 4.7
  • Owner changed from markjaquith to schlessera

@schlessera assigning to you as requested ands putting on the 4.7 milestone. It can always be moved off if the security stuff requires additional testing.

This ticket was mentioned in Slack in #core by desrosj. View the logs.


3 years ago

#13 @desrosj
3 years ago

  • Keywords needs-unit-tests added

This has a solid plan from @markjaquith, needs a patch and some unit tests.

#14 @peterwilsoncc
3 years ago

@schlessera did you get a chance to look at this? It's been sitting around for a while so we can postpone if not.

thanks.

#15 @peterwilsoncc
3 years ago

  • Milestone changed from 4.7 to Future Release

#16 @schlessera
3 years ago

  • Keywords has-patch 2nd-opinion added; needs-patch needs-unit-tests removed

@peterwilsoncc Sorry for not getting back to you sooner.

I had some time during a contributor day and made a first attempt at improving performance. Basically, I just add a regex in front of the bad protocol detection to return early for http(s) URLs. I'm sure there might be other performance enhancements at other (earlier) points in the code execution, but these will also have more potential impact on the security implications than what I've currently done.

With the change in place, the existing unit tests in group kses still pass successfully.

The regex I've taken is the @diegoperini one (with a slight modification to remove ftp) from the comparison here: https://mathiasbynens.be/demo/url-regex . It is pretty long and feature-complete, but due to the way the regex is evaluated, it is actually fast enough to provide a real improvement.

I've done some (not 100% scientific) benchmarks to compare the current version of esc_url from trunk against my modified version. For all benchmarks, both versions run on the same machine from within the same process, and only the relative change is stored, to get rid of fluctuations due to allocations of server resources as good as possible. You can see a table with the benchmark results here: https://docs.google.com/spreadsheets/d/1XK5lgAIDMlj6lQSTrelXdD5eh6wAqJuLMS1605iM618/edit?usp=sharing

The benchmark summary: In a normal use case, this change should speed up the esc_url function by 25-30%. In a best case scenario, the speed-up can be as high as 40%, while the absolute worst case scenario (basically, dealing with 100% bad protocol links) produces a slow-down of 2%, which I deem is acceptable given the very low probability of this ever being the case.

The test script I've used to run the benchmarks can be found here: https://gist.github.com/schlessera/d4dd4db71d59dfee6ee6c5b9a0ca8a33

This ticket was mentioned in Slack in #core by schlessera. View the logs.


3 years ago

#18 @desrosj
3 years ago

This patch still applies cleanly, but I am getting some unit test failures while testing.

@schlessera
2 years ago

Refreshed the patch and made changes to fix broken tests

#19 @schlessera
2 years ago

The test failures were due to two issues:

  • the protocol is normalized to lowercase by the full protocol check
  • http needs to be rejected when only https is allowed, and vice versa.

Both problems should be solved.

Note: See TracTickets for help on using tickets.