Make WordPress Core

Opened 4 years ago

Closed 23 months ago

Last modified 23 months ago

#49412 closed feature request (fixed)

Store filesize media metadata

Reported by: cybr's profile Cybr Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.0 Priority: normal
Severity: normal Version:
Component: Media Keywords: has-dev-note
Focuses: performance Cc:

Description

Previous: #33214, comment:6:ticket:47459

When storing media files, I think it'd be helpful that the result of filesize( $file ) is stored with the media's metadata. Whereafter we can retrieve it via wp_get_attachment_metadata().

This way, we no longer have to touch the file and can work from the database-provided metadata instead. There's performance overhead involved when media is stored on slow/shared storage or even on another server.

Attachments (3)

49412.diff (12.2 KB) - added by johnwatkins0 4 years ago.
49412.2.diff (12.2 KB) - added by johnwatkins0 4 years ago.
Removed a line of debug code
49412.3.diff (22.3 KB) - added by adamsilverstein 2 years ago.

Download all attachments as: .zip

Change History (69)

#1 @SergeyBiryukov
4 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release

#2 @SergeyBiryukov
4 years ago

For some context, WordPress does read the size from the filesize meta value in a few places, but it's only set for audio and video files from the data extracted by getID3() library. Related: comment:4:ticket:33214.

Some history:

It would indeed be beneficial to store the file size in metadata for any attachment.

#3 @johnwatkins0
4 years ago

I've been working on this. I agree it's a useful idea. I've seen and created plenty of workarounds for the fact that filesize data isn't already saved with images. It's especially tricky where files are stored remotely.

The only question I have is whether we think it would be useful to have the file size for all intermediate image sizes as well as the full-sized image, or just for the full-sized image?

I think it makes sense to store all of them, since the main reasons I can think of for having this data involve comparing filesizes -- e.g., when looping through images to find the largest one under a limit.

The drawbacks to storing all are (1) more filesize calls at the time an image is uploaded, and (2) inflated metadata.

I'm going ahead with the assumption that we want all sizes.

Last edited 4 years ago by johnwatkins0 (previous) (diff)

#4 @johnwatkins0
4 years ago

Attaching a diff that adds "filesize" to the metadata for all attachments. For images, the field is at the top level as well as in all of the sizes in the sizes array -- e.g.:

Array
(
    [width] => 426
    [height] => 640
    [file] => 2020/02/2004-07-22-DSC_0007.jpg
    [filesize] => 87348
    [sizes] => Array
        (
            [medium] => Array
                (
                    [file] => 2004-07-22-DSC_0007-200x300.jpg
                    [filesize] => 26505
                    [width] => 200
                    [height] => 300
                    [mime-type] => image/jpeg
                )

            [thumbnail] => Array
                (
                    [file] => 2004-07-22-DSC_0007-150x150.jpg
                    [filesize] => 19138
                    [width] => 150
                    [height] => 150
                    [mime-type] => image/jpeg
                )

        )

    [image_meta] => Array
        (
            ... etc.
        )

)

_wp_attachment_metadata will be created as needed -- e.g.:

Array
(
    [_wp_attached_file] => Array
        (
            [0] => 2020/02/2019-my-file.pptx
        )

    [_wp_attachment_metadata] => Array
        (
            [0] => a:1:{s:8:"filesize";i:8419858;}
        )

    [_edit_lock] => Array
        (
            [0] => 1582035593:1
        )

)

Video and audio files will still have their filesize read by getID3() if possible, but there is a fallback if for whatever reason getID3() doesn't provide the filesize.

Question

The key in the meta array should be carefully considered. I'm using filesize because it matches the PHP function and the convention I've seen in plugins and themes that store file size in meta. There's no reason it has to be this, though. If there's a more semantic key -- or perhaps one that is sure to be unique and not conflict with plugins that may hook into attachment metadata -- then it could easily be changed. Any thoughts?

Last edited 4 years ago by johnwatkins0 (previous) (diff)

@johnwatkins0
4 years ago

@johnwatkins0
4 years ago

Removed a line of debug code

#5 @johnwatkins0
4 years ago

  • Keywords has-patch added; needs-patch removed

#6 @SergeyBiryukov
3 years ago

#51650 was marked as a duplicate.

#7 @spacedmonkey
3 years ago

I just reviewed @johnwatkins0 patch. It look great to me.

The place that I feel was missed was the _wp_image_meta_replace_original function.

Otherwise, this patch is good to me.

As I noted in #51650

Many popular plugins like Yoast, ACF and Jetpack, use the function filesize to get the file size of a attachment.

So I would love to see this land in core.

#8 @swissspidy
3 years ago

since filesize can return false, any objections to casting the return value to an int (so false becomes 0)?

#9 @spacedmonkey
3 years ago

since filesize can return false, any objections to casting the return value to an int (so false becomes 0)?

+1 to this.

Also, this change takes effect on new / updated attachments, should we try to backfill this data on existing attachments?

#10 @johnwatkins0
3 years ago

Good suggestions. I'll make an update this week.

This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.


3 years ago

This ticket was mentioned in PR #671 on WordPress/wordpress-develop by johnwatkins0.


3 years ago
#12

  • Keywords has-unit-tests added

johnwatkins0 commented on PR #671:


3 years ago
#13

The remaining test failures seem to be unrelated to my change. Will check back tomorrow.

#14 @johnwatkins0
3 years ago

@spacedmonkey @swissspidy I have updated via the PR linked above. Unexpectedly, I discovered that filesize was returning slightly different results in the different GHA test environments. I struggled with this a bit and ended up creating a test utility function to round file sizes in the tests to the nearest thousand bytes. I think this is okay in principle because we don't care about the exact sizes of the test files as long as the actual and expected results generally match, but I still wonder if there might be a simpler approach. LMK if you have any suggestions.

There remain some test failures on my PR, but I see the same failures on the latest commit to master. I'll check back on this tomorrow.

#15 @Cybr
3 years ago

Unexpectedly, I discovered that filesize was returning slightly different results in the different GHA test environments.

Perhaps clearstatcache will do the trick before obtaining the filesize; although, I'm not sure if this is needed outside a testing environment.

#16 @johnwatkins0
3 years ago

On revisiting, I was able to work around that whole issue by simply recalculating all the file sizes on the fly within the tests: https://github.com/WordPress/wordpress-develop/pull/671/commits/1773075b444655be65022d6371c9a2beafa180c0
No need for rounding.

Last edited 3 years ago by johnwatkins0 (previous) (diff)

spacedmonkey commented on PR #671:


3 years ago
#17

💡 Idea

How about we add a new function called wp_filesize

function wp_filesize( $path ) {
     $size = (int) filesize( $path );
     return apply_filters( 'wp_filesize', $size, $path );
}

This has a number of benefits.

  • Saves doing casting to int everywhere.
  • Has a filter.
  • Improves support for plugins like S3 plugins that outsource files to third party file store.
  • We can add a file_exists check in this function to stop errors.

Props for the original idea @garretthyder

### Existing uses of key in core.

It is also worth noting that a couple of places in core already expect this key to exists.
See
https://github.com/WordPress/wordpress-develop/blob/7303470ecd4f736d13e354998ec885689fe5ec2e/src/wp-admin/includes/media.php#L3100
https://github.com/WordPress/wordpress-develop/blob/33caf61b8ba988da73c2e210309566f9fdb039be/src/wp-includes/media.php#L3271

These are two places we are going to have to test. But I wonder if we should trigger a meta data regeneration if the meta key doesn't exist in these places, so the next time, this screen is called, it has a "cached" version in post meta.

#18 @spacedmonkey
3 years ago

@johnwatkins0 PR is looking great. Added my thoughts here.

#19 @spacedmonkey
3 years ago

  • Owner set to johnwatkins0
  • Status changed from new to assigned

johnwatkins0 commented on PR #671:


3 years ago
#20

How about we add a new function called wp_filesize

Great idea, @spacedmonkey. I'll add this later today.

a couple of places in core already expect this key to exists.

Interesting -- this makes me curious why those are there. I'll take a closer look.

johnwatkins0 commented on PR #671:


3 years ago
#21

Hey, @spacedmonkey, I've pushed an update implementing your suggestion, let me know what you think.

This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.


3 years ago

#23 @desrosj
3 years ago

Would it be beneficial to add caching for wp_filesize()? I don't think the metadata will be backfilled to attachments unless wp_generate_attachment_metadata() is called for that file, or they are run through thumbnail regeneration. having wp_cache_get()/wp_cache_add() could prevent the need for filesize() calls there.

#24 follow-up: @joemcgill
3 years ago

Good seeing this effort move forward. For context, this idea has come up before as part of #36477 and also as a part of @azaozz's general optimization efforts (see Slack conversation).

One of the questions that has come up in the past (which I can't find record of, for some reason) is how to sites that rely on plugins or services that optimize images
in the background after they're uploaded so that WordPress isn't relying on inaccurate filesize data saved in metadata.

Reviewing progress here, I like the idea of a wp_fileszie() wrapper and agree with @desrosj that adding caching to this function could help reduce the load on direct reads of the filesystem. For the benefit of sites that don't have a persistent object cache in place, I wonder if using transients for this would be useful and if doing so would reduce the need for explicit attachment metadata since wp_filesize() could check for a cached value before executing a filesize read?

Related: https://core.trac.wordpress.org/ticket/36477

#25 in reply to: ↑ 24 @azaozz
3 years ago

  • Keywords needs-refresh added

Replying to joemcgill:

One of the questions that has come up in the past (which I can't find record of, for some reason) is how to sites that rely on plugins or services that optimize images in the background after they're uploaded so that WordPress isn't relying on inaccurate filesize data saved in metadata.

Yeah that's the main drawback. Seems the best option is to rely on these plugins to do-the-right-thing and update the file size after post-processing.

I wonder if using transients for this (caching of filesize) would be useful and if doing so would reduce the need for explicit attachment metadata since wp_filesize() could check for a cached value before executing a filesize read?

That will solve the problem with plugins not updating image meta after post-processing images, and with missing filesize. However transients should not be saved/updated from the front-end. There is the rule to never write to the DB from a front-end request for unidentified users. That can bring down busy sites quite easily.

There are other cases where images are stored remotely. Then updating the meta after post-processing or trying to get the file size at runtime may not work.

As far as I see the "assumptions" for the filesize meta would be that:

  1. It's optional, does not exist for older files.
  2. May be incorrect.

Would it be useful to have it? Perhaps yes, depends... :) One use might be to drop image sub-sizes when generating srcset. Another is to speed up loading of the media modal/library.

Any other usage that would still be acceptable after considering the above two points?

#26 @joemcgill
3 years ago

However transients should not be saved/updated from the front-end

Agreed. My assumption was that we would only be priming this cache during image manipulation or via a background task, but good to make this expectation explicit.

#27 @spacedmonkey
3 years ago

It is worth noting my personal interest in this ticket.

I noticed that in my logs, I was getting lots of calls S3 api. It turns out when looking at assets in the media library, the filesize lookup, resulted in an API request for every file. As filesize is displayed in media library in multiple places, these request rack up. It is also worth noting, if assets on being offloaded in other ways like jetpack, it is likely that this will also be an issue. Even assets stored "Local" on a network share, but have a slow response to get filesize call.

Add a filter to get filesize would be extremely useful for anyone loading assets. I think that wp_fileszie() maybe useful in other parts of core, even if it was just used to help mocking in unit tests.

As for if the data is incorrect, that is up to developer to fix, but if they not fix filesize after processing, it likely that other metadata is wrong as well.

#29 @azaozz
3 years ago

After more discussion in Slack (https://wordpress.slack.com/archives/C02SX62S6/p1620500571129800), the requirements/expectations for filesize seem to be:

  • It will only be added for the "full" image size. This may not be the uploaded/original image file if the image was resized on uploading.
  • It may be missing or unreliable (if the image was optimized by a plugin after uploading), and will be used primarily as cache when displaying image info in wp_admin.
  • May be good to have some sort of a "backfill" functionality when filesize is missing. However that should happen "very sensibly". DB writes should be optimized (single write per page load) and happen only from wp-admin.
Last edited 3 years ago by azaozz (previous) (diff)

This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.


3 years ago

#32 @antpb
3 years ago

  • Milestone changed from Future Release to 5.9

5.9 seems a great target to work towards this change. Adding to the milestone so we keep track of it in regular meetings.

This ticket was mentioned in Slack in #core-restapi by spacedmonkey. View the logs.


2 years ago

This ticket was mentioned in Slack in #core-media by antpb. View the logs.


2 years ago

#35 @antpb
2 years ago

  • Milestone changed from 5.9 to Future Release

Hey hi! Looks like this will need to be moved to a future release considering the 5.9 deadline on Nov 9th. If anyone feels strongly otherwise, feel free to move this back in the milestone. :D

This ticket was mentioned in PR #2075 on WordPress/wordpress-develop by adamsilverstein.


2 years ago
#36

  • Keywords needs-refresh removed

Trac ticket:

#37 @adamsilverstein
2 years ago

49412.3.diff is a refresh against truck, resolving conflicts - hopefully correctly!

I also pushed to a draft PR in order to run all tests. Looks like there is still quiet a bit of feedback on the original PR to address.

@azaozz summarized the goals agreed upon for this ticket above.

This ticket was mentioned in Slack in #core-media by adamsilverstein. View the logs.


2 years ago

This ticket was mentioned in Slack in #core-media by joedolson. View the logs.


2 years ago

This ticket was mentioned in PR #2367 on WordPress/wordpress-develop by spacedmonkey.


2 years ago
#40

Trac ticket:

This ticket was mentioned in PR #2388 on WordPress/wordpress-develop by spacedmonkey.


2 years ago
#41

Trac ticket:

#42 @spacedmonkey
2 years ago

  • Owner changed from johnwatkins0 to spacedmonkey

#43 @spacedmonkey
2 years ago

I spent some time on this ticket today. Updated the patch #2367. I think the patch is good to merge. I will wait a day for feedback, other, I will merged.

This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.


2 years ago

#45 @spacedmonkey
2 years ago

  • Milestone changed from Future Release to 6.0

#46 @adamsilverstein
2 years ago

  • Keywords commit added

+1 looks good!

spacedmonkey commented on PR #2367:


2 years ago
#47

The return value of wp_get_attachment_metadata() will also need to be updated to include the filesize attribute and a warning to developers that it may not be present and that a falsey value could mean either zero or null.

For audio and video, filesize can be set, meaning it was always the case that filesize could be present or not. This PR is about ensuring it always exists, making things less confusing. I think there should be a dev note for this one maybe.

Are any changes needed to WP_REST_Attachments_Controller::prepare_item_for_response()'s source and/or documentation? Same for the XML RPC API?

Not that I know of. There is a ticket for this already here. Not in scope of what we are doing here.

peterwilsoncc commented on PR #2367:


2 years ago
#48

For audio and video, filesize can be set, meaning it was always the case that filesize could be present or not. This PR is about ensuring it always exists, making things less confusing. I think there should be a dev note for this one maybe.

Cool, are you able to add it to the return type of wp_get_attachment_metadata(). It's outside the diff so I can't add a suggestion.

spacedmonkey commented on PR #2367:


2 years ago
#49

Cool, are you able to add it to the return type of wp_get_attachment_metadata(). It's outside the diff so I can't add a suggestion.

@peterwilsoncc I don't know what you mean here. Can you provide some more details.

This is what the comment looks like at the moment. https://github.com/WordPress/wordpress-develop/blob/ee201606fb31bf5bcd667155ab1ae29824699ddd/src/wp-includes/post.php#L6540 .

There are lots of other fields in metadata that are not documented. See these examples.
https://i0.wp.com/user-images.githubusercontent.com/237508/157556045-54b45a4c-cffa-4bd5-9378-a5e9e5f50bb9.png
https://i0.wp.com/user-images.githubusercontent.com/237508/157556038-a1cf0236-4d72-4abe-a4ed-3128355859ce.png

#50 @spacedmonkey
2 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 52837:

Media: Store attachment’s file size in metadata.

Store the file size of all newly uploaded attachments, as part of the metadata stored in post meta. Storing file size means, developers will not have to resort to doing filesize function calls, that can be time consuming on assets on offloaded to services like Amazon’s S3.

This change also introduces a new helper function called, wp_filesize. This is a wrapper around the filesize php function, that adds some helpful filters and ensures the return value is an integer.

Props Cybr, Spacedmonkey, SergeyBiryukov, johnwatkins0, swissspidy, desrosj, joemcgill, azaozz, antpb, adamsilverstein, uday17035.
Fixes #49412.

This ticket was mentioned in Slack in #core-test by spacedmonkey. View the logs.


2 years ago

hellofromtonya commented on PR #671:


2 years ago
#52

Closing in favor of PR #2367 which was committed via changeset https://core.trac.wordpress.org/changeset/52837.

hellofromtonya commented on PR #2075:


2 years ago
#53

Closing in favor of PR #2367 which was committed via changeset https://core.trac.wordpress.org/changeset/52837.

#55 @dlh
2 years ago

Apologies if I missed it, but I'm curious what the reason is for using the error suppression operator with filesize() in [52837]? The two calls to filesize() that were replaced with wp_filesize() didn't use it, and it seems to go against other efforts such as #24780.

#56 @desrosj
23 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening to get some clarity on comment:55:ticket:49412.

#57 @peterwilsoncc
23 months ago

  • Keywords has-patch has-unit-tests commit removed

This ticket was mentioned in Slack in #core by chaion07. View the logs.


23 months ago

#60 @chaion07
23 months ago

  • Keywords close added

Thanks @Cybr for reporting this. We reviewed this ticket during a recent bug-scrub session for WP 6.0 and based on the feedback we're making the following changes:

  1. Add keyword 'close' since the ticket has had keywords removed but has commits that haven't been reverted
  2. Add keyword 'dev-feedback' so that we can have more eyes on the ticket for a clarification on where the ticket currently stands with patch and all.

Props to @costdev

Cheers!

#61 @Cybr
23 months ago

filesize() generates an E_WARNING in case of an error (source). The @ operator would suppress that. I see that operator is primarily used when handling files in WP; whether or not justified is concerned in #24780, as mentioned in comment:55.

If we agree not using the Error Suppression operator, it'd be better to use
$size = file_exists( $path ) ? (int) filesize( $path ) : 0;
Casting to (int) is still required because PHP returns false on error, for whatever reason that might happen.

#62 @costdev
23 months ago

  • Keywords close removed
  • Resolution set to fixed
  • Status changed from reopened to closed

With 6.0 RC1 starting soon and this ticket needing discussion about removing the error suppression, I'm going to close this one out and suggest the discussion takes place in a follow-up ticket.

This ticket was mentioned in Slack in #core-media by antpb. View the logs.


23 months ago

#65 @peterwilsoncc
23 months ago

In 53372:

Media: Remove error suppression in wp_filesize().

Replace error suppressing in wp_filesize() with a file_exists() check before calling the native PHP filesize() function.

Follow up to [52837].

Props Cybr, johnbillion, spacedmonkey, antpb, azouamauriac, ironprogrammer, mukesh27, costdev, audrasjb, dlh.
Fixes #55678.
See #49412.

#66 @SergeyBiryukov
23 months ago

In 53380:

Media: Remove error suppression in wp_filesize().

Replace error suppressing in wp_filesize() with a file_exists() check before calling the native PHP filesize() function.

Follow up to [52837].

Props Cybr, johnbillion, spacedmonkey, antpb, azouamauriac, ironprogrammer, mukesh27, costdev, audrasjb, dlh, peterwilsoncc.
Merges [53372] to the 6.0 branch.
Fixes #55678.
See #49412.

Note: See TracTickets for help on using tickets.