Make WordPress Core

Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#55069 closed enhancement (fixed)

Optimize fread() calls using file_get_contents() or stream_get_contents()

Reported by: maxkellermann's profile maxkellermann Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: 6.0 Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch
Focuses: performance Cc:

Description (last modified by SergeyBiryukov)

The POMO_FileReader::read_all() method uses fread() with 4 kB blocks, but stream_get_contents() would be much faster.

Change History (28)

This ticket was mentioned in PR #2267 on WordPress/wordpress-develop by MaxKellermann.


3 years ago
#1

  • Keywords has-patch added

stream_get_contents() is faster, because the PHP core can decide how
to best read the reamining file; it could decide to issue just one
read() call or mmap() the file first.

The old chunk size of 4 kB was rather clumsy, because it is smaller
than PHP's default stream buffer size of 8 kB.

Trac ticket: https://core.trac.wordpress.org/ticket/55069

#2 @SergeyBiryukov
3 years ago

  • Focuses performance added
  • Milestone changed from Awaiting Review to 6.0

Hi there, welcome to WordPress Trac! Interesting, thanks for the patch.

Introduced in [12174] / #10165 for WordPress 2.9. At the time, WordPress still needed to support PHP 4.3+. I guess that was the reason for not using stream_get_contents() here, as it's only available in PHP 5+.

Last edited 3 years ago by SergeyBiryukov (previous) (diff)

#3 @maxkellermann
3 years ago

I just submitted a related PR to the PHP project: https://github.com/php/php-src/pull/8032
This PHP PR optimizes the stream_get_contents() implementation to skip buffering in some cases, which allows slurping the whole file in just one read() system call instead of refilling the read buffer in 8 kB steps.

#4 @maxkellermann
3 years ago

I have added 4 more patches with micro-optimizations.

#5 @johnbillion
3 years ago

Thanks for the patch @maxkellermann . Do you have some before and after numbers on the performance improvement?

#6 @maxkellermann
3 years ago

No, I don't have numbers for these specific patches, but the improvements should be similar to the numbers I posted on https://github.com/php/php-src/pull/8032 because they allow WordPress to take advantage of these optimizations.

However, the WordPress improvement will be very small until my PHP patches are merged, because PHP's own internal functions are just as unoptimized as WordPress's custom implementation. Only the combination of the two is much faster.

#7 @SergeyBiryukov
3 years ago

  • Owner set to SergeyBiryukov
  • Status changed from new to accepted

#8 @SergeyBiryukov
3 years ago

In 52696:

Code Modernization: Use stream_get_contents() in POMO_FileReader::read_all().

stream_get_contents() is faster than fread(), because the PHP core can decide how to best read the remaining file; it could decide to issue just one read() call or mmap() the file first.

Per the PHP manual, file_get_contents() or stream_get_contents() is the preferred way to read the contents of a file into a string. It will use memory mapping techniques if supported by the OS to enhance performance.

Reference: PHP Manual: file_get_contents().

Follow-up to [12174].

Props maxkellermann.
See #55069.

#9 @SergeyBiryukov
3 years ago

In 52698:

Code Modernization: Use file_get_contents() in get_file_data().

file_get_contents() is faster than fread(), because the PHP core can decide how to best read the remaining file; it could decide to issue just one read() call or mmap() the file first.

Per the PHP manual, file_get_contents() or stream_get_contents() is the preferred way to read the contents of a file into a string. It will use memory mapping techniques if supported by the OS to enhance performance.

Reference: PHP Manual: file_get_contents().

Follow-up to [12044], [49073], [52696].

Props maxkellermann.
See #55069.

#10 @SergeyBiryukov
3 years ago

In 52701:

Code Modernization: Use file_get_contents() in wp_get_image_mime().

file_get_contents() is faster than fread(), because the PHP core can decide how to best read the remaining file; it could decide to issue just one read() call or mmap() the file first.

Per the PHP manual, file_get_contents() or stream_get_contents() is the preferred way to read the contents of a file into a string. It will use memory mapping techniques if supported by the OS to enhance performance.

Reference: PHP Manual: file_get_contents().

Follow-up to [50810], [52696], [52698].

Props maxkellermann.
See #55069.

#11 @SergeyBiryukov
3 years ago

In 52718:

Code Modernization: Use file_get_contents() in wp_get_webp_info().

file_get_contents() is faster than fread(), because the PHP core can decide how to best read the remaining file; it could decide to issue just one read() call or mmap() the file first.

Per the PHP manual, file_get_contents() or stream_get_contents() is the preferred way to read the contents of a file into a string. It will use memory mapping techniques if supported by the OS to enhance performance.

Reference: PHP Manual: file_get_contents().

Follow-up to [50810], [52696], [52698], [52701].

Props maxkellermann.
See #55069.

#12 @SergeyBiryukov
3 years ago

Thanks for the patches!

Looks like 0002-class-wp-filesystem-direct-remove-unnecessary-is_dir.patch is the only one left. My concerns here:

  • The ::is_dir() and ::is_readable() methods call the is_dir() and is_readable() PHP functions, respectively, with the error suppression operator (@). The subsequent dir() call does not have it. If we remove these checks, would that lead to PHP warnings on some installs which had none previously, for example due to open_basedir restrictrion? Seems like this could use some testing.
  • The WP_Filesystem_SSH2::dirlist() method has a similar block, does it need the same change?

#13 follow-up: @maxkellermann
3 years ago

would that lead to PHP warnings on some installs which had none previously

You are right. It seems PHP people very much like the idea of sprinkling unnecessary access() and stat() system calls everywhere.

I'm not a PHP programmer; I do lower-level code, and one very simple idea to make code faster is to omit stuff that's not necessary. For example, what's the point of doing access() to check if a file exists before opening it; just open() it and then you can still handle the same ENOENT. Doing system calls has become more expensive recently due to the Meltdown/Spectre mitigations, and avoiding them does make a difference.

Additionally, the extra access()/stat() is buggy; it's a TOCTOU bug. It's not only slower, but also unreliable.

Knowing that, it is surprising that PHP punishes fast and reliable code with a warning, which is a very visible signal to their users: this software is bad. That's sad.

I can understand if you don't want to merge my remaining patch for that reason. But if you do want it, I'll investigate what else can be optimized that way and send more patches.

Maybe this kind of warning can be suppressed? I don't know, I'm not a PHP expert.

#14 in reply to: ↑ 13 ; follow-up: @SergeyBiryukov
3 years ago

Replying to maxkellermann:

But if you do want it, I'll investigate what else can be optimized that way and send more patches.

Please do :)

Maybe this kind of warning can be suppressed?

Yes, on the second thought, we can just add error suppression for the dir() call.

Version 0, edited 3 years ago by SergeyBiryukov (next)

#15 in reply to: ↑ 14 @SergeyBiryukov
3 years ago

Replying to SergeyBiryukov:

Yes, on second thought, we can just add error suppression for the dir() call.

Hmm, it looks like is_readable() was added in [45611/trunk/src/wp-admin/includes/class-wp-filesystem-direct.php] specifically to remove error suppression from the dir() call, so the patch reverts that part of the changeset. It also causes a WPCS warning:

FILE: src\wp-admin\includes\class-wp-filesystem-direct.php
----------------------------------------------------------------------
FOUND 0 ERRORS AND 1 WARNING AFFECTING 1 LINE
----------------------------------------------------------------------
 609 | WARNING | Silencing errors is strongly discouraged. Use proper
     |         | error checking instead. Found: @dir( $path ...
     |         | (WordPress.PHP.NoSilencedErrors.Discouraged)
----------------------------------------------------------------------

So we would need a // phpcs:ignore comment here.

See 0002-class-wp-filesystem-direct-remove-unnecessary-is_dir.2.patch.

#16 @SergeyBiryukov
3 years ago

  • Component changed from I18N to General
  • Description modified (diff)
  • Summary changed from Optimize POMO_FileReader.read_all() using stream_get_contents() to Optimize fread() calls using file_get_contents() or stream_get_contents()

Just renaming the ticket as the focus has moved to other parts of core as well.

#17 @peterwilsoncc
2 years ago

Given it's a microoptimisation, I'd rather avoid additional sniff suppression and maintain the is_readable():

  • if the sniff is wrong/too strict that ought to be discussed
  • with recent PHP changes, avoiding the STHU operator seems wise

If anything it's probably the methods being called that needs to change but that's a discussion for another ticket.

#18 @costdev
2 years ago

  • Milestone changed from 6.0 to 6.1

#19 @SergeyBiryukov
2 years ago

  • Milestone changed from 6.1 to 6.0
  • Resolution set to fixed
  • Status changed from accepted to closed

Given that 0002-class-wp-filesystem-direct-remove-unnecessary-is_dir.2.patch is the only patch left here and it's not going to be addressed at this time per comment:17, closing this as fixed for 6.0.

Please create a new ticket for any follow-up changes. Thanks!

Note: See TracTickets for help on using tickets.