#49412 closed feature request (fixed)
Store filesize media metadata
Reported by: | Cybr | Owned by: | spacedmonkey |
---|---|---|---|
Milestone: | 6.0 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Media | Keywords: | has-dev-note |
Focuses: | performance | Cc: |
Description
Previous: #33214, comment:6:ticket:47459
When storing media files, I think it'd be helpful that the result of filesize( $file )
is stored with the media's metadata. Whereafter we can retrieve it via wp_get_attachment_metadata().
This way, we no longer have to touch the file and can work from the database-provided metadata instead. There's performance overhead involved when media is stored on slow/shared storage or even on another server.
Attachments (3)
Change History (69)
#1
@
5 years ago
- Keywords needs-patch added
- Milestone changed from Awaiting Review to Future Release
#3
@
5 years ago
I've been working on this. I agree it's a useful idea. I've seen and created plenty of workarounds for the fact that filesize data isn't already saved with images. It's especially tricky where files are stored remotely.
The only question I have is whether we think it would be useful to have the file size for all intermediate image sizes as well as the full-sized image, or just for the full-sized image?
I think it makes sense to store all of them, since the main reasons I can think of for having this data involve comparing filesizes -- e.g., when looping through images to find the largest one under a limit.
The drawbacks to storing all are (1) more filesize calls at the time an image is uploaded, and (2) inflated metadata.
I'm going ahead with the assumption that we want all sizes.
#4
@
5 years ago
Attaching a diff that adds "filesize" to the metadata for all attachments. For images, the field is at the top level as well as in all of the sizes in the sizes
array -- e.g.:
Array ( [width] => 426 [height] => 640 [file] => 2020/02/2004-07-22-DSC_0007.jpg [filesize] => 87348 [sizes] => Array ( [medium] => Array ( [file] => 2004-07-22-DSC_0007-200x300.jpg [filesize] => 26505 [width] => 200 [height] => 300 [mime-type] => image/jpeg ) [thumbnail] => Array ( [file] => 2004-07-22-DSC_0007-150x150.jpg [filesize] => 19138 [width] => 150 [height] => 150 [mime-type] => image/jpeg ) ) [image_meta] => Array ( ... etc. ) )
_wp_attachment_metadata
will be created as needed -- e.g.:
Array ( [_wp_attached_file] => Array ( [0] => 2020/02/2019-my-file.pptx ) [_wp_attachment_metadata] => Array ( [0] => a:1:{s:8:"filesize";i:8419858;} ) [_edit_lock] => Array ( [0] => 1582035593:1 ) )
Video and audio files will still have their filesize read by getID3()
if possible, but there is a fallback if for whatever reason getID3()
doesn't provide the filesize.
Question
The key in the meta array should be carefully considered. I'm using filesize
because it matches the PHP function and the convention I've seen in plugins and themes that store file size in meta. There's no reason it has to be this, though. If there's a more semantic key -- or perhaps one that is sure to be unique and not conflict with plugins that may hook into attachment metadata -- then it could easily be changed. Any thoughts?
#7
@
4 years ago
I just reviewed @johnwatkins0 patch. It look great to me.
The place that I feel was missed was the _wp_image_meta_replace_original function.
Otherwise, this patch is good to me.
As I noted in #51650
Many popular plugins like Yoast, ACF and Jetpack, use the function filesize to get the file size of a attachment.
So I would love to see this land in core.
#8
@
4 years ago
since filesize
can return false, any objections to casting the return value to an int
(so false
becomes 0
)?
#9
@
4 years ago
since filesize can return false, any objections to casting the return value to an int (so false becomes 0)?
+1 to this.
Also, this change takes effect on new / updated attachments, should we try to backfill this data on existing attachments?
This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.
4 years ago
This ticket was mentioned in PR #671 on WordPress/wordpress-develop by johnwatkins0.
4 years ago
#12
- Keywords has-unit-tests added
Trac ticket: https://core.trac.wordpress.org/ticket/49412
johnwatkins0 commented on PR #671:
4 years ago
#13
The remaining test failures seem to be unrelated to my change. Will check back tomorrow.
#14
@
4 years ago
@spacedmonkey @swissspidy I have updated via the PR linked above. Unexpectedly, I discovered that filesize
was returning slightly different results in the different GHA test environments. I struggled with this a bit and ended up creating a test utility function to round file sizes in the tests to the nearest thousand bytes. I think this is okay in principle because we don't care about the exact sizes of the test files as long as the actual and expected results generally match, but I still wonder if there might be a simpler approach. LMK if you have any suggestions.
There remain some test failures on my PR, but I see the same failures on the latest commit to master
. I'll check back on this tomorrow.
#15
@
4 years ago
Unexpectedly, I discovered that
filesize
was returning slightly different results in the different GHA test environments.
Perhaps clearstatcache will do the trick before obtaining the filesize; although, I'm not sure if this is needed outside a testing environment.
#16
@
4 years ago
On revisiting, I was able to work around that whole issue by simply recalculating all the file sizes on the fly within the tests: https://github.com/WordPress/wordpress-develop/pull/671/commits/1773075b444655be65022d6371c9a2beafa180c0
No need for rounding.
spacedmonkey commented on PR #671:
4 years ago
#17
💡 Idea
How about we add a new function called wp_filesize
function wp_filesize( $path ) { $size = (int) filesize( $path ); return apply_filters( 'wp_filesize', $size, $path ); }
This has a number of benefits.
- Saves doing casting to int everywhere.
- Has a filter.
- Improves support for plugins like S3 plugins that outsource files to third party file store.
- We can add a
file_exists
check in this function to stop errors.
Props for the original idea @garretthyder
### Existing uses of key in core.
It is also worth noting that a couple of places in core already expect this key to exists.
See
https://github.com/WordPress/wordpress-develop/blob/7303470ecd4f736d13e354998ec885689fe5ec2e/src/wp-admin/includes/media.php#L3100
https://github.com/WordPress/wordpress-develop/blob/33caf61b8ba988da73c2e210309566f9fdb039be/src/wp-includes/media.php#L3271
These are two places we are going to have to test. But I wonder if we should trigger a meta data regeneration if the meta key doesn't exist in these places, so the next time, this screen is called, it has a "cached" version in post meta.
johnwatkins0 commented on PR #671:
4 years ago
#20
How about we add a new function called
wp_filesize
Great idea, @spacedmonkey. I'll add this later today.
a couple of places in core already expect this key to exists.
Interesting -- this makes me curious why those are there. I'll take a closer look.
johnwatkins0 commented on PR #671:
4 years ago
#21
Hey, @spacedmonkey, I've pushed an update implementing your suggestion, let me know what you think.
This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.
3 years ago
#23
@
3 years ago
Would it be beneficial to add caching for wp_filesize()
? I don't think the metadata will be backfilled to attachments unless wp_generate_attachment_metadata()
is called for that file, or they are run through thumbnail regeneration. having wp_cache_get()
/wp_cache_add()
could prevent the need for filesize()
calls there.
#24
follow-up:
↓ 25
@
3 years ago
Good seeing this effort move forward. For context, this idea has come up before as part of #36477 and also as a part of @azaozz's general optimization efforts (see Slack conversation).
One of the questions that has come up in the past (which I can't find record of, for some reason) is how to sites that rely on plugins or services that optimize images
in the background after they're uploaded so that WordPress isn't relying on inaccurate filesize data saved in metadata.
Reviewing progress here, I like the idea of a wp_fileszie()
wrapper and agree with @desrosj that adding caching to this function could help reduce the load on direct reads of the filesystem. For the benefit of sites that don't have a persistent object cache in place, I wonder if using transients for this would be useful and if doing so would reduce the need for explicit attachment metadata since wp_filesize()
could check for a cached value before executing a filesize read?
#25
in reply to:
↑ 24
@
3 years ago
- Keywords needs-refresh added
Replying to joemcgill:
One of the questions that has come up in the past (which I can't find record of, for some reason) is how to sites that rely on plugins or services that optimize images in the background after they're uploaded so that WordPress isn't relying on inaccurate
filesize
data saved in metadata.
Yeah that's the main drawback. Seems the best option is to rely on these plugins to do-the-right-thing and update the file size after post-processing.
I wonder if using transients for this (caching of filesize) would be useful and if doing so would reduce the need for explicit attachment metadata since
wp_filesize()
could check for a cached value before executing a filesize read?
That will solve the problem with plugins not updating image meta after post-processing images, and with missing filesize
. However transients should not be saved/updated from the front-end. There is the rule to never write to the DB from a front-end request for unidentified users. That can bring down busy sites quite easily.
There are other cases where images are stored remotely. Then updating the meta after post-processing or trying to get the file size at runtime may not work.
As far as I see the "assumptions" for the filesize
meta would be that:
- It's optional, does not exist for older files.
- May be incorrect.
Would it be useful to have it? Perhaps yes, depends... :) One use might be to drop image sub-sizes when generating srcset
. Another is to speed up loading of the media modal/library.
Any other usage that would still be acceptable after considering the above two points?
#26
@
3 years ago
However transients should not be saved/updated from the front-end
Agreed. My assumption was that we would only be priming this cache during image manipulation or via a background task, but good to make this expectation explicit.
#27
@
3 years ago
It is worth noting my personal interest in this ticket.
I noticed that in my logs, I was getting lots of calls S3 api. It turns out when looking at assets in the media library, the filesize lookup, resulted in an API request for every file. As filesize is displayed in media library in multiple places, these request rack up. It is also worth noting, if assets on being offloaded in other ways like jetpack, it is likely that this will also be an issue. Even assets stored "Local" on a network share, but have a slow response to get filesize call.
Add a filter to get filesize would be extremely useful for anyone loading assets. I think that wp_fileszie()
maybe useful in other parts of core, even if it was just used to help mocking in unit tests.
As for if the data is incorrect, that is up to developer to fix, but if they not fix filesize after processing, it likely that other metadata is wrong as well.
#29
@
3 years ago
After more discussion in Slack (https://wordpress.slack.com/archives/C02SX62S6/p1620500571129800), the requirements/expectations for filesize
seem to be:
- It will only be added for the "full" image size. This may not be the uploaded/original image file if the image was resized on uploading.
- It may be missing or unreliable (if the image was optimized by a plugin after uploading), and will be used primarily as cache when displaying image info in wp_admin.
- May be good to have some sort of a "backfill" functionality when
filesize
is missing. However that should happen "very sensibly". DB writes should be optimized (single write per page load) and happen only from wp-admin.
This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.
3 years ago
#32
@
3 years ago
- Milestone changed from Future Release to 5.9
5.9 seems a great target to work towards this change. Adding to the milestone so we keep track of it in regular meetings.
This ticket was mentioned in Slack in #core-restapi by spacedmonkey. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-media by antpb. View the logs.
3 years ago
#35
@
3 years ago
- Milestone changed from 5.9 to Future Release
Hey hi! Looks like this will need to be moved to a future release considering the 5.9 deadline on Nov 9th. If anyone feels strongly otherwise, feel free to move this back in the milestone. :D
This ticket was mentioned in PR #2075 on WordPress/wordpress-develop by adamsilverstein.
3 years ago
#36
- Keywords needs-refresh removed
Trac ticket:
#37
@
3 years ago
49412.3.diff is a refresh against truck, resolving conflicts - hopefully correctly!
I also pushed to a draft PR in order to run all tests. Looks like there is still quiet a bit of feedback on the original PR to address.
@azaozz summarized the goals agreed upon for this ticket above.
This ticket was mentioned in Slack in #core-media by adamsilverstein. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-media by joedolson. View the logs.
3 years ago
This ticket was mentioned in PR #2367 on WordPress/wordpress-develop by spacedmonkey.
3 years ago
#40
Trac ticket:
This ticket was mentioned in PR #2388 on WordPress/wordpress-develop by spacedmonkey.
3 years ago
#41
Trac ticket:
#43
@
3 years ago
I spent some time on this ticket today. Updated the patch #2367. I think the patch is good to merge. I will wait a day for feedback, other, I will merged.
This ticket was mentioned in Slack in #core-media by spacedmonkey. View the logs.
3 years ago
spacedmonkey commented on PR #2367:
3 years ago
#47
The return value of wp_get_attachment_metadata() will also need to be updated to include the filesize attribute and a warning to developers that it may not be present and that a falsey value could mean either zero or null.
For audio and video, filesize can be set, meaning it was always the case that filesize could be present or not. This PR is about ensuring it always exists, making things less confusing. I think there should be a dev note for this one maybe.
Are any changes needed to WP_REST_Attachments_Controller::prepare_item_for_response()'s source and/or documentation? Same for the XML RPC API?
Not that I know of. There is a ticket for this already here. Not in scope of what we are doing here.
peterwilsoncc commented on PR #2367:
3 years ago
#48
For audio and video, filesize can be set, meaning it was always the case that filesize could be present or not. This PR is about ensuring it always exists, making things less confusing. I think there should be a dev note for this one maybe.
Cool, are you able to add it to the return type of wp_get_attachment_metadata()
. It's outside the diff so I can't add a suggestion.
spacedmonkey commented on PR #2367:
3 years ago
#49
Cool, are you able to add it to the return type of wp_get_attachment_metadata(). It's outside the diff so I can't add a suggestion.
@peterwilsoncc I don't know what you mean here. Can you provide some more details.
This is what the comment looks like at the moment. https://github.com/WordPress/wordpress-develop/blob/ee201606fb31bf5bcd667155ab1ae29824699ddd/src/wp-includes/post.php#L6540 .
There are lots of other fields in metadata that are not documented. See these examples.
This ticket was mentioned in Slack in #core-test by spacedmonkey. View the logs.
3 years ago
hellofromtonya commented on PR #671:
3 years ago
#52
Closing in favor of PR #2367 which was committed via changeset https://core.trac.wordpress.org/changeset/52837.
hellofromtonya commented on PR #2075:
3 years ago
#53
Closing in favor of PR #2367 which was committed via changeset https://core.trac.wordpress.org/changeset/52837.
hellofromtonya commented on PR #2367:
3 years ago
#54
Committed via changeset https://core.trac.wordpress.org/changeset/52837.
#56
@
2 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening to get some clarity on comment:55:ticket:49412.
This ticket was mentioned in Slack in #core by chaion07. View the logs.
2 years ago
#60
@
2 years ago
- Keywords close added
Thanks @Cybr for reporting this. We reviewed this ticket during a recent bug-scrub session for WP 6.0 and based on the feedback we're making the following changes:
- Add keyword 'close' since the ticket has had keywords removed but has commits that haven't been reverted
- Add keyword 'dev-feedback' so that we can have more eyes on the ticket for a clarification on where the ticket currently stands with patch and all.
Props to @costdev
Cheers!
#61
@
2 years ago
filesize()
generates an E_WARNING
in case of an error (source). The @
operator would suppress that. I see that operator is primarily used when handling files in WP; whether or not justified is concerned in #24780, as mentioned in comment:55.
If we agree not using the Error Suppression operator, it'd be better to use
$size = file_exists( $path ) ? (int) filesize( $path ) : 0;
Casting to (int)
is still required because PHP returns false
on error, for whatever reason that might happen.
#62
@
2 years ago
- Keywords close removed
- Resolution set to fixed
- Status changed from reopened to closed
With 6.0 RC1 starting soon and this ticket needing discussion about removing the error suppression, I'm going to close this one out and suggest the discussion takes place in a follow-up ticket.
For some context, WordPress does read the size from the
filesize
meta value in a few places, but it's only set for audio and video files from the data extracted by getID3() library. Related: comment:4:ticket:33214.Some history:
It would indeed be beneficial to store the file size in metadata for any attachment.