WordPress.org

Make WordPress Core

Opened 10 months ago

Last modified 4 weeks ago

#49412 assigned feature request

Store filesize media metadata

Reported by: Cybr Owned by: johnwatkins0
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Media Keywords: has-patch has-unit-tests
Focuses: performance Cc:

Description

Previous: #33214, comment:6:ticket:47459

When storing media files, I think it'd be helpful that the result of filesize( $file ) is stored with the media's metadata. Whereafter we can retrieve it via wp_get_attachment_metadata().

This way, we no longer have to touch the file and can work from the database-provided metadata instead. There's performance overhead involved when media is stored on slow/shared storage or even on another server.

Attachments (2)

49412.diff (12.2 KB) - added by johnwatkins0 10 months ago.
49412.2.diff (12.2 KB) - added by johnwatkins0 10 months ago.
Removed a line of debug code

Download all attachments as: .zip

Change History (23)

#1 @SergeyBiryukov
10 months ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release

#2 @SergeyBiryukov
10 months ago

For some context, WordPress does read the size from the filesize meta value in a few places, but it's only set for audio and video files from the data extracted by getID3() library. Related: comment:4:ticket:33214.

Some history:

It would indeed be beneficial to store the file size in metadata for any attachment.

#3 @johnwatkins0
10 months ago

I've been working on this. I agree it's a useful idea. I've seen and created plenty of workarounds for the fact that filesize data isn't already saved with images. It's especially tricky where files are stored remotely.

The only question I have is whether we think it would be useful to have the file size for all intermediate image sizes as well as the full-sized image, or just for the full-sized image?

I think it makes sense to store all of them, since the main reasons I can think of for having this data involve comparing filesizes -- e.g., when looping through images to find the largest one under a limit.

The drawbacks to storing all are (1) more filesize calls at the time an image is uploaded, and (2) inflated metadata.

I'm going ahead with the assumption that we want all sizes.

Last edited 10 months ago by johnwatkins0 (previous) (diff)

#4 @johnwatkins0
10 months ago

Attaching a diff that adds "filesize" to the metadata for all attachments. For images, the field is at the top level as well as in all of the sizes in the sizes array -- e.g.:

Array
(
    [width] => 426
    [height] => 640
    [file] => 2020/02/2004-07-22-DSC_0007.jpg
    [filesize] => 87348
    [sizes] => Array
        (
            [medium] => Array
                (
                    [file] => 2004-07-22-DSC_0007-200x300.jpg
                    [filesize] => 26505
                    [width] => 200
                    [height] => 300
                    [mime-type] => image/jpeg
                )

            [thumbnail] => Array
                (
                    [file] => 2004-07-22-DSC_0007-150x150.jpg
                    [filesize] => 19138
                    [width] => 150
                    [height] => 150
                    [mime-type] => image/jpeg
                )

        )

    [image_meta] => Array
        (
            ... etc.
        )

)

_wp_attachment_metadata will be created as needed -- e.g.:

Array
(
    [_wp_attached_file] => Array
        (
            [0] => 2020/02/2019-my-file.pptx
        )

    [_wp_attachment_metadata] => Array
        (
            [0] => a:1:{s:8:"filesize";i:8419858;}
        )

    [_edit_lock] => Array
        (
            [0] => 1582035593:1
        )

)

Video and audio files will still have their filesize read by getID3() if possible, but there is a fallback if for whatever reason getID3() doesn't provide the filesize.

Question

The key in the meta array should be carefully considered. I'm using filesize because it matches the PHP function and the convention I've seen in plugins and themes that store file size in meta. There's no reason it has to be this, though. If there's a more semantic key -- or perhaps one that is sure to be unique and not conflict with plugins that may hook into attachment metadata -- then it could easily be changed. Any thoughts?

Last edited 9 months ago by johnwatkins0 (previous) (diff)

@johnwatkins0
10 months ago

@johnwatkins0
10 months ago

Removed a line of debug code

#5 @johnwatkins0
10 months ago

  • Keywords has-patch added; needs-patch removed

#6 @SergeyBiryukov
5 weeks ago

#51650 was marked as a duplicate.

#7 @spacedmonkey
5 weeks ago

I just reviewed @johnwatkins0 patch. It look great to me.

The place that I feel was missed was the _wp_image_meta_replace_original function.

Otherwise, this patch is good to me.

As I noted in #51650

Many popular plugins like Yoast, ACF and Jetpack, use the function filesize to get the file size of a attachment.

So I would love to see this land in core.

#8 @swissspidy
5 weeks ago

since filesize can return false, any objections to casting the return value to an int (so false becomes 0)?

#9 @spacedmonkey
5 weeks ago

since filesize can return false, any objections to casting the return value to an int (so false becomes 0)?

+1 to this.

Also, this change takes effect on new / updated attachments, should we try to backfill this data on existing attachments?

#10 @johnwatkins0
5 weeks ago

Good suggestions. I'll make an update this week.

This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.


5 weeks ago

This ticket was mentioned in PR #671 on WordPress/wordpress-develop by johnwatkins0.


5 weeks ago

  • Keywords has-unit-tests added

#13 @prbot
4 weeks ago

johnwatkins0 commented on PR #671:

The remaining test failures seem to be unrelated to my change. Will check back tomorrow.

#14 @johnwatkins0
4 weeks ago

@spacedmonkey @swissspidy I have updated via the PR linked above. Unexpectedly, I discovered that filesize was returning slightly different results in the different GHA test environments. I struggled with this a bit and ended up creating a test utility function to round file sizes in the tests to the nearest thousand bytes. I think this is okay in principle because we don't care about the exact sizes of the test files as long as the actual and expected results generally match, but I still wonder if there might be a simpler approach. LMK if you have any suggestions.

There remain some test failures on my PR, but I see the same failures on the latest commit to master. I'll check back on this tomorrow.

#15 @Cybr
4 weeks ago

Unexpectedly, I discovered that filesize was returning slightly different results in the different GHA test environments.

Perhaps clearstatcache will do the trick before obtaining the filesize; although, I'm not sure if this is needed outside a testing environment.

#16 @johnwatkins0
4 weeks ago

On revisiting, I was able to work around that whole issue by simply recalculating all the file sizes on the fly within the tests: https://github.com/WordPress/wordpress-develop/pull/671/commits/1773075b444655be65022d6371c9a2beafa180c0
No need for rounding.

Last edited 4 weeks ago by johnwatkins0 (previous) (diff)

#17 @prbot
4 weeks ago

spacedmonkey commented on PR #671:

💡 Idea

How about we add a new function called wp_filesize

function wp_filesize( $path ) {
     $size = (int) filesize( $path );
     return apply_filters( 'wp_filesize', $size, $path );
}

This has a number of benefits.

  • Saves doing casting to int everywhere.
  • Has a filter.
  • Improves support for plugins like S3 plugins that outsource files to third party file store.
  • We can add a file_exists check in this function to stop errors.

Props for the original idea @garretthyder

### Existing uses of key in core.

It is also worth noting that a couple of places in core already expect this key to exists.
See
https://github.com/WordPress/wordpress-develop/blob/7303470ecd4f736d13e354998ec885689fe5ec2e/src/wp-admin/includes/media.php#L3100
https://github.com/WordPress/wordpress-develop/blob/33caf61b8ba988da73c2e210309566f9fdb039be/src/wp-includes/media.php#L3271

These are two places we are going to have to test. But I wonder if we should trigger a meta data regeneration if the meta key doesn't exist in these places, so the next time, this screen is called, it has a "cached" version in post meta.

#18 @spacedmonkey
4 weeks ago

@johnwatkins0 PR is looking great. Added my thoughts here.

#19 @spacedmonkey
4 weeks ago

  • Owner set to johnwatkins0
  • Status changed from new to assigned

#20 @prbot
4 weeks ago

johnwatkins0 commented on PR #671:

How about we add a new function called wp_filesize

Great idea, @spacedmonkey. I'll add this later today.

a couple of places in core already expect this key to exists.

Interesting -- this makes me curious why those are there. I'll take a closer look.

#21 @prbot
4 weeks ago

johnwatkins0 commented on PR #671:

Hey, @spacedmonkey, I've pushed an update implementing your suggestion, let me know what you think.

Note: See TracTickets for help on using tickets.